
SPSS Interview Questions


If you're looking for SPSS interview questions and answers for experienced candidates or freshers, you are in the right place. Many reputed companies around the world offer opportunities in this area. According to research, SPSS has a market share of about 29.5%, so there is still room to move ahead in your career in SPSS analytics. Mindmajix offers advanced SPSS interview questions for 2024 to help you crack your interview and land your dream career as an IBM SPSS analyst.


Top SPSS Interview Questions 

1. What is SPSS?

SPSS is IBM's software package for statistical analysis. It offers advanced statistical analysis, text analysis, open-source extensibility, a large library of machine-learning algorithms, and integration with big data.

2. What are the advantages and disadvantages of SPSS?

Advantages

  • Easy to learn and use
  • Handles big datasets
  • Can save datasets into numerous file extensions
  • Facilitates import and export of different files 

Disadvantages

  • Expensive

3. What is SPSS REPLACE Function? 

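The REPLACE function replaces occurrences of a substring within a string with a different (possibly empty) substring. Its form is REPLACE(a1, a2, a3[, a4]): instances of a2 in a1 are replaced with a3, and the optional a4 sets how many occurrences to replace (all of them if a4 is omitted).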
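As an illustration of these semantics (in Python rather than SPSS syntax), the sketch below mirrors REPLACE(a1, a2, a3[, a4]), with the optional count argument of Python's str.replace standing in for a4. The function name spss_replace is hypothetical, and the sketch does not try to reproduce how SPSS treats edge cases such as an empty search string.

```python
from typing import Optional


def spss_replace(a1: str, a2: str, a3: str, a4: Optional[int] = None) -> str:
    """Sketch of REPLACE(a1, a2, a3[, a4]) semantics using str.replace.

    Occurrences of a2 in a1 are replaced with a3, scanning left to right.
    If a4 is None, all occurrences are replaced; otherwise at most a4.
    """
    if a4 is None:
        return a1.replace(a2, a3)
    if a4 < 0:
        raise ValueError("a4 must be a non-negative integer")
    return a1.replace(a2, a3, a4)


print(spss_replace("abcabc", "a", "x"))     # xbcxbc
print(spss_replace("abcabc", "a", "x", 1))  # xbcabc
```

Replacing with an empty string deletes the substring, for example spss_replace("123-45-6789", "-", "") gives "123456789".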

4. What are the data view and variable view in SPSS?

Data View: In Data View, we inspect the actual data. Each row is a case (observation) and each column is a variable.

Variable View: In Variable View, we see additional information about the data. Each row is a variable, and each column is an attribute of that variable, such as its name, type, label, value labels, missing values, or measurement level.

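5. How do you create a scatterplot with a regression line in SPSS?

It can be done in two ways: with the GRAPH command, adding the regression (fit) line afterwards in the Chart Editor, or with the GGRAPH command, whose GPL specification can draw the regression line directly.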
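The line added to the scatterplot is the ordinary least-squares regression of the y variable on the x variable. As a language-neutral illustration of what that line is (not of the SPSS graphing syntax), here is a small Python sketch that computes its intercept and slope; the function name is hypothetical.

```python
from statistics import mean


def least_squares_line(xs, ys):
    """Return (intercept, slope) of the OLS regression of ys on xs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope = sum of cross-products / sum of squared x deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has zero variance; the slope is undefined")
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = y_bar - slope * x_bar
    return intercept, slope


print(least_squares_line([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
```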


6. How do I use SAS data files in SPSS?

If you are using SPSS version 14 or later, you can open a SAS data file directly.

Select File > Open > Data..., then under Files of type select the appropriate SAS data file format; next, select the file from the list and click Open. That is all there is to it.

In SPSS syntax, use the GET SAS command to read in a SAS data file:

get sas data='C:\data\states.sas7bdat'.

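7. How do you create a variable that stores the number of reciprocal friends?

Two people are reciprocal friends when each names the other as a friend.

Step 1: Restructure the data to long format, with one row per focal person and named friend.

Step 2: Merge the long data with the original data, matching the variable friend in the current data to id in the original data, so that each row also carries the friend's own list of friends. Rename the variables and sort the data by id.

Step 3: Check whether focal and id are a pair of reciprocal friends.

Step 4: Aggregate the long data back to one row per focal person and merge the result back into the original data set.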
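The counting logic in these steps can be sketched in Python, assuming the data are held as a mapping from each person's id to the list of ids that person named as friends (this layout and the function name are assumptions for illustration). Looking up each friend's own list plays the role of the merge in Step 2, and the sum plays the role of the aggregate in Step 4.

```python
def count_reciprocal_friends(friends):
    """For each focal id, count named friends who also name the focal id.

    `friends` maps a person's id to the ids that person listed as friends
    (the wide friend1, friend2, ... columns collapsed into one list).
    """
    counts = {}
    for focal, named in friends.items():
        # One (focal, friend) pair per distinct named friend = long format.
        # A pair is reciprocal if the friend's own list contains focal.
        counts[focal] = sum(
            1
            for f in set(named)
            if f != focal and focal in friends.get(f, ())
        )
    return counts


data = {1: [2, 3], 2: [1], 3: [4], 4: [3, 1], 5: [1, 99]}
print(count_reciprocal_friends(data))  # {1: 1, 2: 1, 3: 1, 4: 1, 5: 0}
```

Friends who do not appear as a focal person themselves (id 99 here) simply contribute nothing, much like an unmatched case in the merge.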


8. How can I calculate the time at dropout?

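When working with longitudinal data, some participants may drop out. To find out when a participant drops out, we create a variable that records the last consecutive time point, starting from t1, at which the participant has data. In the syntax below, v stays at 1 until the first missing value among t1 to t5 and is 0 from then on, so dropout counts the time points observed before the first gap; a value of 5 means the participant has no missing time points.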

* Count consecutive non-missing time points from t1 (dropout = last time point observed).
compute v=1.
compute dropout = 0.
do repeat x = t1 to t5.
compute v = v * ~missing(x).
compute dropout = dropout + v.
end repeat.
execute.
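To make the logic of the loop explicit, here is a Python sketch of the same computation for a single participant, with None standing in for a missing value (an assumption of the sketch; the function name is hypothetical). Because v drops to 0 at the first missing value and stays there, a participant who returns at a later time point does not get a higher value.

```python
def dropout_time(scores):
    """Mirror the DO REPEAT loop for one participant.

    `scores` holds the values at t1, t2, ... with None for a missing value.
    Returns the number of consecutive observed time points from t1 onward;
    len(scores) means the participant never dropped out.
    """
    v = 1          # stays 1 until the first missing value is seen
    dropout = 0
    for x in scores:
        v = v * (x is not None)   # like ~missing(x): 1 if present, 0 if missing
        dropout = dropout + v
    return dropout


print(dropout_time([5, 6, None, 8, 9]))  # 2
```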

9. How do I convert among SAS, Stata, and SPSS files?

This answer discusses conversions between these data formats. In general, the strategies should work with SAS 9.*, SPSS 14+, and Stata 11. If you have Stata 11 and need to convert your data to other formats, use the saveold command within Stata to save the data in Stata version 10 format before you convert the data set.

             | To SAS                                  | To SPSS                                  | To Stata
From SAS     | -                                       | How do I use a SAS data file in SPSS?    | How do I use a SAS data file in Stata?
From SPSS    | How do I use an SPSS data file in SAS?  | -                                        | How do I use an SPSS data file in Stata?
From Stata   | How do I use a Stata data file in SAS?  | How do I use a Stata data file in SPSS?  | -

Another way to convert data files between SAS, Stata, and SPSS is to use programs such as Stat/Transfer or DBMS Copy. Stat/Transfer can transfer SAS 9.*, Stata 11, and SPSS 19 files, and it can also convert data files to many other formats, including Statistica, Systat, S-Plus, R, Excel, Access, Minitab, Matlab, LIMDEP and JMP. You may need to update your copy of Stat/Transfer to transfer data sets created by the latest version of the software. To update Stat/Transfer, click the “About” tab (in the upper right corner), open the “Check for Updates” pull-down menu, and select “Right Now”.

10. How can I read hierarchical data into SPSS?

Suppose that your data file has two kinds of records: family records and person records. How do you read the data so that the family information is included for each person?

Here is an example dataset with the two kinds of records. The data are organized so that each family record comes first and all of the person records for that family follow it. The family records are the shorter data lines and the person records are the longer ones.

06470 1 1
32161 232 0
19082 230 1
07470 1 0
11231 240 1
08470 1 0
43711 227 0
09470 1 0
40221 213 1
41162 222 0
16173 224 1
10470 1 1
30111 220 0
36222 211 1
11470 1 0
21751 217 0
33962 210 1
32143 226 1

Here are the codebooks for the family and person records.

Family record:
column 1-5   family id
column 7     record type (1 = family)
column 9     group

Person record:
column 1-4   person id
column 5     person number
column 7     record type (2 = person)
column 8-9   age
column 11    male

The following syntax reads in and displays the data from the two types of records. The file type is nested, and the number in column 7 indicates which record type each line belongs to. There is a separate data list command for each record type.

file type nested record=7.
record type 1.
data list / famid 1-5 group 9.
record type 2.
data list / personid 1-4 pernum 5 age 8-9 male 11.
end file type.
begin data
06470 1 1
32161 232 0
19082 230 1
07470 1 0
11231 240 1
08470 1 0
43711 227 0
09470 1 0
40221 213 1
41162 222 0
16173 224 1
10470 1 1
30111 220 0
36222 211 1
11470 1 0
21751 217 0
33962 210 1
32143 226 1
end data.
list.

Here is what the final dataset looks like.

FAMID  GROUP  PERSONID  PERNUM  AGE  MALE
 6470      1      3216       1   32     0
 6470      1      1908       2   30     1
 7470      0      1123       1   40     1
 8470      0      4371       1   27     0
 9470      0      4022       1   13     1
 9470      0      4116       2   22     0
 9470      0      1617       3   24     1
10470      1      3011       1   20     0
10470      1      3622       2   11     1
11470      0      2175       1   17     0
11470      0      3396       2   10     1
11470      0      3214       3   26     1

Number of cases read:  12    Number of cases listed:  12
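The FILE TYPE NESTED approach amounts to carrying the most recent family record down onto each person record that follows it. Here is a minimal Python sketch of that idea (an illustration, not a substitute for the SPSS syntax), assuming the fixed column positions from the codebook; the 1-based columns become 0-based slices, and the function name is hypothetical.

```python
def read_nested(lines):
    """Attach the most recent family record to each person record."""
    rows = []
    family = None
    for line in lines:
        if not line.strip():
            continue                          # skip blank lines
        rec_type = line[6]                    # column 7: record type
        if rec_type == "1":                   # family record: remember it
            family = {"famid": int(line[0:5]), "group": int(line[8])}
        elif rec_type == "2":                 # person record: copy family info down
            if family is None:
                raise ValueError("person record before any family record")
            rows.append({
                **family,
                "personid": int(line[0:4]),   # columns 1-4
                "pernum": int(line[4]),       # column 5
                "age": int(line[7:9]),        # columns 8-9
                "male": int(line[10]),        # column 11
            })
        else:
            raise ValueError(f"unknown record type {rec_type!r}")
    return rows


sample = ["06470 1 1", "32161 232 0", "19082 230 1", "07470 1 0", "11231 240 1"]
for row in read_nested(sample):
    print(row)
```

Each family record is only remembered until the next one arrives, which is why the data must keep every family's person records directly after its family record.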

11. How can I compare two data sets in SPSS, or check that the same data entered by two people were entered consistently?

NOTE: The method shown on this page works for all versions of SPSS, but if you have SPSS version 21 or later, you can use the compare datasets command. 

There are times when you would like to compare two data sets to see if they are exactly the same.  For example, if two people enter the same data (double data entry), you would want to know if any discrepancies exist between the two datasets (the rationale of double data entry), and if so, where those discrepancies are. We start by reading in the two datasets, one entered by person 1 and the second by person 2.  The two data sets are identical, except that we created a missing value in the ninth row, the second variable, in the first data set, and we changed the very last entry from 51 to 52 in the second data set.

After entering each data set, we need to sort the data set.  In our example, we will sort the data set on all variables, starting with the first variable in the data set.  We use the SPSS keyword all to do this.  We use this method because it is very general and will work in many situations.  (However, if you want to compare the files on only a few variables in the data set, you will need to list the variables in the same order in both sorts and on the by subcommand of the update command.)  After sorting the data set, we save it.  We do this for both data sets.

data list list /id female race ses * schtype (A3) prog read write math science socst.
begin data.
147 1 1 3 pub 1 47 62 53 53 61
108 0 1 2 pub 2 34 33 41 36 36
18 0 3 2 pub 3 50 33 49 44 36
153 0 1 2 pub 3 39 31 40 39 51
50 0 2 2 pub 2 50 59 42 53 61
51 1 2 1 pub 2 42 36 42 31 39
102 0 1 1 pub 1 52 41 51 53 56
57 1 1 2 pub 1 71 65 72 66 56
160 . 1 2 pub 1 55 65 55 50 61
136 0 1 2 pub 1 65 59 70 63 51
end data.
sort cases by all.
save outfile "D:\person1.sav".

data list list /id female race ses * schtype (A3) prog read write math science socst.
begin data.
147 1 1 3 pub 1 47 62 53 53 61
108 0 1 2 pub 2 34 33 41 36 36
18 0 3 2 pub 3 50 33 49 44 36
153 0 1 2 pub 3 39 31 40 39 51
50 0 2 2 pub 2 50 59 42 53 61
51 1 2 1 pub 2 42 36 42 31 39
102 0 1 1 pub 1 52 41 51 53 56
57 1 1 2 pub 1 71 65 72 66 56
160 1 1 2 pub 1 55 65 55 50 61
136 0 1 2 pub 1 65 59 70 63 52
end data.
sort cases by all.
save outfile "D:\person2.sav".

Now we can use the update command to compare the two data files. We need to use the SPSS keyword all on the by subcommand because that is how we sorted the data sets. We also use the in subcommand to create a flag variable, which we call flag1, that indicates which rows match and which do not. We use the value labels command to add value labels to flag1, and finally we run frequencies on flag1. The frequency table shows two mismatches.

update file = "D:\person1.sav"
  /in = flag1
  /file = "D:\person2.sav"
  /by all.
exe.
save outfile "D:\combo.sav".
value labels flag1 0 'mismatch' 1 'match'.
freq var = flag1.

Finally, if we look at our new data set, combo, we see that we now have 12 rows of data instead of the original 10.  A new row is added to the data set for each mismatched row so that you can see where the mismatch is.  If there are two mismatches in a row, the row is listed only once, so you will need to compare the values for each variable to find all of the mismatches.

     id   female   race    ses  schtype  prog   read  write  socst  flag1
  18.00      .00   3.00   2.00  pub      3.00  50.00  33.00  36.00      1
  50.00      .00   2.00   2.00  pub      2.00  50.00  59.00  61.00      1
  51.00     1.00   2.00   1.00  pub      2.00  42.00  36.00  39.00      1
  57.00     1.00   1.00   2.00  pub      1.00  71.00  65.00  56.00      1
 102.00      .00   1.00   1.00  pub      1.00  52.00  41.00  56.00      1
 108.00      .00   1.00   2.00  pub      2.00  34.00  33.00  36.00      1
 136.00      .00   1.00   2.00  pub      1.00  65.00  59.00  51.00      1
 136.00      .00   1.00   2.00  pub      1.00  65.00  59.00  52.00      0
 147.00     1.00   1.00   3.00  pub      1.00  47.00  62.00  61.00      1
 153.00      .00   1.00   2.00  pub      3.00  39.00  31.00  51.00      1
 160.00        .   1.00   2.00  pub      1.00  55.00  65.00  61.00      1
 160.00     1.00   1.00   2.00  pub      1.00  55.00  65.00  61.00      0

Number of cases read:  12    Number of cases listed:  12
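The idea behind sorting and matching on all variables can be sketched in Python: a row from the second entry counts as a match only if an identical row exists in the first. This illustrates the logic, not how UPDATE works internally; representing each case as a tuple of all its values, using None for a missing value, and the function name are assumptions of the sketch.

```python
from collections import Counter


def flag_mismatches(entry1, entry2):
    """Return the rows of entry2 that have no identical row in entry1.

    Rows are tuples of all variables, so two rows match only if every
    value agrees (like matching BY ALL). None stands in for a missing value.
    """
    remaining = Counter(entry1)
    mismatches = []
    for row in entry2:
        if remaining[row] > 0:
            remaining[row] -= 1      # matched: use up one copy from entry1
        else:
            mismatches.append(row)   # corresponds to flag1 = 0 above
    return mismatches


p1 = [(160, None, 1, 2, "pub", 1, 55, 65, 55, 50, 61),
      (136, 0, 1, 2, "pub", 1, 65, 59, 70, 63, 51)]
p2 = [(160, 1, 1, 2, "pub", 1, 55, 65, 55, 50, 61),
      (136, 0, 1, 2, "pub", 1, 65, 59, 70, 63, 52)]
print(flag_mismatches(p1, p2))
```

A Counter is used instead of a set so that two legitimately identical cases in one file are not both matched against a single case in the other.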

12. How can I compare two data sets in SPSS version 21 or later?

NOTE: The method shown here works with SPSS version 21 and later. If you are using an earlier version of SPSS, use the update-based method described in the previous question.

We use the same two data sets as in the previous question, entered by person 1 and person 2. They are identical except for a missing value for the second variable (female) in the ninth row of the first data set and a last entry changed from 51 to 52 in the second data set. This time we sort each data set by id before saving it.

data list list /id female race ses * schtype (A3) prog read write math science socst.
begin data.
147 1 1 3 pub 1 47 62 53 53 61
108 0 1 2 pub 2 34 33 41 36 36
18 0 3 2 pub 3 50 33 49 44 36
153 0 1 2 pub 3 39 31 40 39 51
50 0 2 2 pub 2 50 59 42 53 61
51 1 2 1 pub 2 42 36 42 31 39
102 0 1 1 pub 1 52 41 51 53 56
57 1 1 2 pub 1 71 65 72 66 56
160 . 1 2 pub 1 55 65 55 50 61
136 0 1 2 pub 1 65 59 70 63 51
end data.
sort cases by id.
save outfile "D:\temp\person1.sav".

data list list /id female race ses * schtype (A3) prog read write math science socst.
begin data.
147 1 1 3 pub 1 47 62 53 53 61
108 0 1 2 pub 2 34 33 41 36 36
18 0 3 2 pub 3 50 33 49 44 36
153 0 1 2 pub 3 39 31 40 39 51
50 0 2 2 pub 2 50 59 42 53 61
51 1 2 1 pub 2 42 36 42 31 39
102 0 1 1 pub 1 52 41 51 53 56
57 1 1 2 pub 1 71 65 72 66 56
160 1 1 2 pub 1 55 65 55 50 61
136 0 1 2 pub 1 65 59 70 63 52
end data.
sort cases by id.
save outfile "D:\temp\person2.sav".

Now we can use the compare datasets command to compare the two data files. We start with the person2 data file open (it is the active dataset because it was read in last), and we compare it to the person1 file, which we specify on the compdataset subcommand. The variables subcommand is required; here we use the keyword all so that all variables in the data files are compared. We use the save subcommand to create a new variable called mismatchflag in the person2 data set. This variable has a value of 0 for cases that match and 1 for cases that do not match; unmatched cases (cases with no counterpart in the other file) would get a value of -1. While it is easy to see which cases do not match in this tiny example, it might not be so easy in a larger data set, so we use the frequencies command to show how many cases matched and did not match.

compare datasets
  /compdataset "D:\temp\person1.sav"
  /variables all
  /save flagmismatches = yes varname = mismatchflag.
freq var = mismatchflag.

Sometimes it is helpful to have the cases that match saved to one data file and the cases that do not match saved to a different data file. In the next example, we create two new data files using the mismatchdataset and matchdataset keywords on the save subcommand. The mismatchname keyword names the new file of mismatched cases, and the matchname keyword names the file of matched cases. We also add the caseid subcommand so that cases are paired by id. Before running the comparison, we use the delete variables command to remove the mismatchflag variable created above from the person2 data file.

delete variables mismatchflag.
compare datasets
  /compdataset "D:\temp\person1.sav"
  /caseid id
  /variables all
  /save flagmismatches = yes varname = mismatchflag
     mismatchdataset = yes mismatchname = "d:\temp\mismatch.sav"
     matchdataset = yes matchname = "d:\temp\match.sav".
get file "d:\temp\mismatch.sav".
list.

     id   female   race    ses  schtype  prog   read  write   math  science  socst
 136.00      .00   1.00   2.00  pub      1.00  65.00  59.00  70.00    63.00  52.00
 160.00     1.00   1.00   2.00  pub      1.00  55.00  65.00  55.00    50.00  61.00

Number of cases read:  2    Number of cases listed:  2

get file "d:\temp\match.sav".
list.

     id   female   race    ses  schtype  prog   read  write   math  science  socst
  18.00      .00   3.00   2.00  pub      3.00  50.00  33.00  49.00    44.00  36.00
  50.00      .00   2.00   2.00  pub      2.00  50.00  59.00  42.00    53.00  61.00
  51.00     1.00   2.00   1.00  pub      2.00  42.00  36.00  42.00    31.00  39.00
  57.00     1.00   1.00   2.00  pub      1.00  71.00  65.00  72.00    66.00  56.00
 102.00      .00   1.00   1.00  pub      1.00  52.00  41.00  51.00    53.00  56.00
 108.00      .00   1.00   2.00  pub      2.00  34.00  33.00  41.00    36.00  36.00
 147.00     1.00   1.00   3.00  pub      1.00  47.00  62.00  53.00    53.00  61.00
 153.00      .00   1.00   2.00  pub      3.00  39.00  31.00  40.00    39.00  51.00

Number of cases read:  8    Number of cases listed:  8
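For readers who want the matching rule spelled out, here is a Python sketch of flagging cases by case ID with the 0/1/-1 coding described above. It assumes each data set is a list of dicts with unique ids and that None marks a missing value; the function name is hypothetical, and this is not how SPSS implements COMPARE DATASETS.

```python
def compare_by_id(active, other, key="id"):
    """Flag each case in `active`: 0 = match, 1 = mismatch, -1 = unmatched.

    Cases are paired by the value of `key` (in the spirit of /CASEID);
    a case whose id does not appear in `other` is flagged -1.
    """
    lookup = {case[key]: case for case in other}   # assumes unique ids
    flags = []
    for case in active:
        partner = lookup.get(case[key])
        if partner is None:
            flags.append(-1)
        elif all(case.get(var) == partner.get(var) for var in case):
            flags.append(0)
        else:
            flags.append(1)
    return flags


person1 = [{"id": 136, "socst": 51}, {"id": 147, "socst": 61}, {"id": 160, "female": None}]
person2 = [{"id": 136, "socst": 52}, {"id": 147, "socst": 61}, {"id": 160, "female": 1},
           {"id": 200, "socst": 40}]
print(compare_by_id(person2, person1))  # [1, 0, 1, -1]
```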

13. How can SPSS help me document my data?

The codebook command was introduced in SPSS version 17.  It provides information about the variables in a dataset, such as the type, variable labels, value labels, as well as the number of cases in each level of categorical variables and means and standard deviations of continuous variables.  This information can be as important as the data themselves, because it helps to give meaning to the data.  Also, this information can help you distinguish between two similar datasets.  

The examples below use the hsb1.sav dataset. Let’s start by looking at the Variable View.

get file "D:\data\hsb1.sav".

You can access the codebook command via the point-and-click interface by clicking on Analyze -> Reports -> Codebook.

Let’s consider the syntax below. Although it may look complicated, only the command name itself is required. If you issue the codebook command by itself, you will get the variable information for all of the variables in the dataset; counts and percents for all categories of nominal and ordinal variables; and means, standard deviations and quartiles for scale variables. This may be more output than you want, so you may prefer to select which variables, and what information about them, you would like to see. In the example below, we have selected five variables from our dataset. In square brackets ( [] ) after a variable name, you can specify its measurement level: s for scale (continuous) variables, o for ordinal variables, and n for nominal variables. The measurement level specified in the command may or may not match the one shown in the Variable View. For example, the Variable View lists socst as nominal, but in the codebook command below we specify it as a scale variable. The measurement level determines what the output shows for each variable: counts and percents for all categories of nominal and ordinal variables; means, standard deviations and quartiles for scale variables.

On the varinfo subcommand, we request some of the information that we see in the Variable View.  On the fileinfo subcommand, we request information on the data file itself, such as the name of the data file, its location, the file label, any documents attached to the data file and a count of the number of cases in the dataset.  On the statistics subcommand, we request the count and percent, which gives the number of cases and percent of cases in each level of nominal and ordinal variables.  We also request the mean and standard deviation of scale variables.

codebook ses [o] prgtype write [s] science [s] socst [s]
  /varinfo position label type format measure valuelabels missing
  /fileinfo name location label documents casecount
  /statistics count percent mean stddev.

In the example below, we show how to get minimal output (by using the keyword none on the statistics subcommand) and how to order the output alphabetically (by specifying varorder = alpha on the options subcommand).

codebook ses prgtype science socst
  /varinfo label type valuelabels
  /options varorder = alpha
  /statistics none.
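To make the distinction between measurement levels concrete, the Python sketch below produces the two kinds of summaries described above: counts and percents for nominal or ordinal variables, and the mean and standard deviation for scale variables. It is only a loose analogue of the codebook output; the function name is hypothetical, missing values are passed as None, and computing percents over non-missing values is an assumption of the sketch.

```python
from collections import Counter
from statistics import mean, stdev


def codebook_stats(values, level):
    """Summarize one variable according to its measurement level.

    level "n" (nominal) or "o" (ordinal) -> {category: (count, percent)};
    level "s" (scale) -> {"mean": ..., "stddev": ...}.
    None values are treated as missing and left out.
    """
    valid = [v for v in values if v is not None]
    if level in ("n", "o"):
        total = len(valid)
        if total == 0:
            return {}
        return {k: (c, 100.0 * c / total) for k, c in sorted(Counter(valid).items())}
    if level == "s":
        return {"mean": mean(valid), "stddev": stdev(valid)}   # sample SD
    raise ValueError("level must be 'n', 'o' or 's'")


print(codebook_stats([1, 2, 2, 3, None], "o"))
print(codebook_stats([2, 4, 4, 4, 5, 5, 7, 9], "s"))
```

Declaring the same variable as [o] or [s] therefore changes which summary you get, which is why socst can be described as a scale variable in the codebook command even though the Variable View lists it as nominal.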

14. How can I convert string variables into date variables?

Sometimes dates have been entered as string variables, and these variables need to be converted into numeric variables. Date variables are numeric variables in SPSS, so they can be added, subtracted, and so on. Specifically, SPSS stores a date as the number of seconds since midnight, October 14, 1582, a reference point tied to the introduction of the Gregorian calendar.
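To see where numbers such as 13353206400 in Example 1 come from, the arithmetic can be reproduced in Python: count the days from October 14, 1582 to the date and multiply by 86,400 seconds per day. This sketch assumes whole-day values and relies on Python's proleptic Gregorian calendar; the function names are hypothetical.

```python
from datetime import date

# Reference point for SPSS date values: midnight, 14 October 1582.
SPSS_EPOCH = date(1582, 10, 14)
SECONDS_PER_DAY = 86_400


def to_spss_seconds(d: date) -> int:
    """Seconds between SPSS_EPOCH and d (whole days only, no time of day)."""
    return (d - SPSS_EPOCH).days * SECONDS_PER_DAY


def from_spss_seconds(seconds: int) -> date:
    """Inverse of to_spss_seconds; any partial day is dropped."""
    return date.fromordinal(SPSS_EPOCH.toordinal() + seconds // SECONDS_PER_DAY)


print(to_spss_seconds(date(2005, 12, 6)))   # 13353206400
print(from_spss_seconds(13102992000))       # 1998-01-01
```

Because a date is just a count of seconds, subtracting two date variables gives an elapsed time in seconds, which you can divide by 86,400 to get days.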

Let’s look at an example data set below.  We see that we have date data entered as string variables in three different ways.  The examples below will show how to convert each of these into date (numeric) variables that can be used in calculations.

data list list/day1 (a2) month1 (a2) year1 (a4) date2 (a12) date3 (a12).
begin data.
12 06 2005 06/12/2005 06-Dec-2005
14 05 2004 05/14/2004 05-May-2004
01 01 1998 01/01/1998 01-Jan-1998
end data.
list.

day1 month1 year1 date2        date3
12   06     2005  06/12/2005   06-Dec-2005
14   05     2004  05/14/2004   05-May-2004
01   01     1998  01/01/1998   01-Jan-1998

Number of cases read:  3    Number of cases listed:  3

Example 1

In the example below, we work with the variable date3.  If your date variable is entered exactly like this, then you can use the numeric function to convert it into a numeric variable.  We use the compute command with the numeric function to create a new variable called your_date that is a numeric version of the string variable date3.  We then create a copy of your_date (called your_date1).  We use the formats command to give your_date and your_date1 different formats, so that you can see what number is associated with each of the dates displayed in your_date1.  To be clear:  your_date and your_date1 are the same variables formatted differently.

compute your_date = numeric(date3, date11).
compute your_date1 = your_date.
formats your_date (f14.0) your_date1 (date12).
exe.
list.

day1 month1 year1 date2        date3             your_date   your_date1
12   06     2005  06/12/2005   06-Dec-2005     13353206400 06-DEC-2005
14   05     2004  05/14/2004   05-May-2004     13303094400 05-MAY-2004
01   01     1998  01/01/1998   01-Jan-1998     13102992000 01-JAN-1998

Number of cases read:  3    Number of cases listed:  3

Example 2

In this example, we work with the three variables day1, month1 and year1. First, we make numeric versions of these variables using the compute command with the numeric function. Next, we use the compute command with the date.dmy function to combine the numeric variables into a single date variable. The date.dmy function requires numeric arguments, so we must make numeric versions of the string variables for use with this function. We then use the formats command to format the new date variable (called my_date). Note that the execute command (abbreviated exe.) is needed after the formats command so that the pending transformations are carried out; otherwise the next command, delete variables, will not run. The delete variables command removes the variables we no longer need from the data set.

compute dn = numeric(day1, f4.0).
compute mn = numeric(month1, f4.0).
compute yn = numeric(year1, f4.0).
exe.
compute my_date = date.dmy(dn, mn, yn).
formats my_date (date11).
exe.
delete variables dn mn yn.
list.

day1 month1 year1 date2        date3             your_date   your_date1     my_date
12   06     2005  06/12/2005   06-Dec-2005     13353206400 06-DEC-2005  12-JUN-2005
14   05     2004  05/14/2004   05-May-2004     13303094400 05-MAY-2004  14-MAY-2004
01   01     1998  01/01/1998   01-Jan-1998     13102992000 01-JAN-1998  01-JAN-1998

Number of cases read:  3    Number of cases listed:  3

Example 3

This example is very similar to Example 2, except that the whole date is contained in a single string variable, date2, so we start by breaking date2 into three parts: month, day, and year. First, we create three string variables (called m1, d1 and y1). Next, we populate them using the compute command with the substr function. In the third step, we create numeric versions of these string variables using the compute command with the numeric function. After that, the compute command with the date.dmy function combines the numeric day, month and year variables (dn, mn and yn); date.dmy accepts only numeric arguments, so we could not use the string versions. In the last step, we format the new date variable, called new_date, with the formats command. We selected the adate11 format, but you could use any date format that you like. As in Example 2, the execute command (abbreviated exe.) is needed after the formats command so that the pending transformations are carried out before delete variables runs. We use the delete variables command to remove the variables we no longer need from the data set.

string m1 (a2).
string d1 (a2).
string y1 (a4).
exe.
compute m1 = substr(date2, 1, 2).
compute d1 = substr(date2, 4, 2).
compute y1 = substr(date2, 7, 4).
compute mn = numeric(m1, f4.0).
compute dn = numeric(d1, f4.0).
compute yn = numeric(y1, f4.0).
compute new_date = date.dmy(dn, mn, yn).
formats new_date (adate11).
exe.
delete variables d1 m1 y1 dn mn yn.
exe.
list.

day1 month1 year1 date2        date3             your_date   your_date1    new_date
12   06     2005  06/12/2005   06-Dec-2005     13353206400 06-DEC-2005  06/12/2005
14   05     2004  05/14/2004   05-May-2004     13303094400 05-MAY-2004  05/14/2004
01   01     1998  01/01/1998   01-Jan-1998     13102992000 01-JAN-1998  01/01/1998

Number of cases read:  3    Number of cases listed:  3
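The three conversions can also be mirrored in Python to make the string slicing explicit. This is an illustration only: the function names are hypothetical, SUBSTR's 1-based positions become 0-based slices, and strptime's %b directive assumes English month abbreviations (Python's default C locale).

```python
from datetime import date, datetime


def from_parts(day: str, month: str, year: str) -> date:
    """Example 2: convert each part to a number, then combine like DATE.DMY."""
    return date(int(year), int(month), int(day))


def from_mmddyyyy(s: str) -> date:
    """Example 3: split mm/dd/yyyy; substr(s,1,2), (s,4,2), (s,7,4) as slices."""
    m, d, y = s[0:2], s[3:5], s[6:10]
    return from_parts(d, m, y)


def from_dd_mon_yyyy(s: str) -> date:
    """Example 1: a dd-Mon-yyyy string, the layout read with the DATE11 format."""
    return datetime.strptime(s, "%d-%b-%Y").date()


print(from_parts("12", "06", "2005"))     # 2005-06-12
print(from_mmddyyyy("05/14/2004"))        # 2004-05-14
print(from_dd_mon_yyyy("06-Dec-2005"))    # 2005-12-06
```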

15. How do I create and modify string (character) variables?

There are at least two ways to create a string variable in SPSS.  In our first example, we show how to input string variables into a new data set.  In the next example, we show how to create a string variable in an existing data set.  In the last example, we will show how to remove unwanted characters from a string variable.

Example 1:  Inputting string variables into a new data set

In this example, we will enter an id number, the first and last name, age, and weight for nine folks.  All of the variables will be numeric, except of course, the names. We will also save the file.

data list list / id * fname (A5) lname (A10) age wt.
begin data.
1 "Beth" "Jones" 20 .
2 "Bob" "Jensen" 23 210
3 "Barb" "Andersen" 25 125
4 "Andy" "Smith" 26 160
5 "Al" "Peterson" 21 190
6 "Ann" "Glenn" 22 115
7 "Pete" "." 29 175
8 "Pam" "Wright" 21 145
9 "Phil" "Brown" 29 200
end data.
save outfile 'c:\names.sav'.

The (A5) and (A10) after fname and lname tell SPSS that these are string variables with lengths of five and ten, respectively. A format in parentheses applies to all of the variables listed before it, back to the previous format specification, so if one or more numeric variables are listed before the first string variable, you need to put an asterisk after them to tell SPSS that the variables before the asterisk are numeric. Hence, the asterisk (*) after id is necessary; without it, SPSS would read id with the (A5) string format as well.

You may also notice that SPSS produced an error message, shown below, while reading in the data.  It was caused by the missing data value for wt in case 1.  Despite this error message, the data were read in correctly, as we can see by using the list command.  An error message was not generated for the missing value in lname in case 7 because “.” is a valid value in a string variable.  In other words, SPSS does not consider it a missing value.  We will return to this issue shortly.

>Warning # 1111
>A numeric field contained no digits.  The result has been set to the
>system-missing value.
>Command line: 978  Current case: 1  Current splitfile group: 1
>Field contents: '.'
>Record number: 1  Starting column: 21  Record length: 21

list.

      ID FNAME LNAME          AGE       WT
    1.00 Beth  Jones        20.00        .
    2.00 Bob   Jensen       23.00   210.00
    3.00 Barb  Andersen     25.00   125.00
    4.00 Andy  Smith        26.00   160.00
    5.00 Al    Peterson     21.00   190.00
    6.00 Ann   Glenn        22.00   115.00
    7.00 Pete  .            29.00   175.00
    8.00 Pam   Wright       21.00   145.00
    9.00 Phil  Brown        29.00   200.00

Number of cases read:  9    Number of cases listed:  9

Example 2:  Adding a string variable to an existing data set

Suppose that we would like to add a string variable called gender.  First, we need to create the new variable using the string command.  Then we will assign values to the variable.

string gender (A6).
execute.

Let’s look at the frequency of a few variables to see how gender is different from the variables that we entered with the data list command.

freq var=lname wt gender /format=notable.

Statistics (N)   LNAME   WT   GENDER
Valid                9    8        9
Missing              0    1        0

Notice that although gender has no values yet, there are also no missing values. (This is also why the nmiss function in aggregate will not count blank string values as missing.) In other words, SPSS considers a blank to be a valid value for a string variable.

Now let’s assign values to gender. We will use the compute and if commands to do this. Remember that while you can modify a string variable with compute and if, you cannot create a string variable with these commands. (However, you can create a numeric variable with either command.) Note that the value of a string variable must always be enclosed in quotation marks.

compute gender = 'female'.
execute.

Of course, not everyone in our data set is female, so we need to change some of the values of gender. To make the value of gender depend on the value of another variable, we use the if command. In this example, the vertical bar (|) means or.

if (id = 2 | id = 4 | id = 5 | id = 7 | id = 9) gender = 'male'.
execute.

We can also use numeric values in string variables.  Remember that even if numeric values are used, SPSS still considers those values to be strings.

We can assign variable labels and value labels to string variables in the same way that we can assign them to numeric variables.

variable labels gender 'This is the gender of the subject'.
value labels gender 'male' 'm' 'female' 'f'.
execute.

Example 3: Combining string variables

In our current data set, the first name (called fname) and the last name (called lname) are two different variables.  Suppose that we wanted to combine them into a single variable.  To do this, we will create a new variable called name1 with a length of 10.  Next, we will use the concat function (short for “concatenate”) to combine the first and last name into a single variable.

string name1 (A10).
execute.
compute name1 = concat(fname, lname).
execute.
list name1.

NAME1
Beth Jones
Bob  Jense
Barb Ander
Andy Smith
Al   Peter
Ann  Glenn
Pete .
Pam  Wrigh
Phil Brown

Number of cases read:  9    Number of cases listed:  9

As you can see, name1 is too short. Although you can use the alter type command (available in SPSS version 16 and higher) to make name1 longer, we have already lost the information at the end of some of the values (in other words, some of the letters at the end have already been cut off). Hence, simply making name1 longer isn’t helpful. Instead, we need to create a new string variable (which we will call fn) with a longer length and concatenate fname and lname into it.

string fn (A15).
compute fn = concat(fname, lname).
execute.
list fn.

FN
Beth Jones
Bob  Jensen
Barb Andersen
Andy Smith
Al   Peterson
Ann  Glenn
Pete .
Pam  Wright
Phil Brown

Number of cases read:  9    Number of cases listed:  9

While this worked, it does not look exactly as we would like.  (The unequal number of spaces between the first and last name does not look good.)  Therefore, let’s create another string variable and call it fullname.  We will use the rtrim function, which will trim off any extra blanks on the right of fname, and use the concat function to combine fname, a space, and lname.

string fullname (A15).
compute fullname = concat(rtrim(fname), " ", lname).
execute.
list fullname.

FULLNAME
Beth Jones
Bob Jensen
Barb Andersen
Andy Smith
Al Peterson
Ann Glenn
Pete .
Pam Wright
Phil Brown

Number of cases read:  9    Number of cases listed:  9
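SPSS string variables have a fixed width: values are padded with blanks on the right and cut off when they are too long. The Python sketch below models that behavior to show why name1 loses letters, why fn has uneven spacing, and why rtrim fixes it. The helper names are hypothetical, and the sketch ignores encoding details of how SPSS measures string width.

```python
def assign(value: str, width: int) -> str:
    """Store value in a fixed-width string variable: pad with blanks or truncate."""
    return value.ljust(width)[:width]


def concat(*parts: str) -> str:
    """Join the arguments as they are, trailing blanks included."""
    return "".join(parts)


fname = assign("Bob", 5)       # fname is (A5)  -> "Bob  "
lname = assign("Jensen", 10)   # lname is (A10) -> "Jensen    "

name1 = assign(concat(fname, lname), 10)                   # (A10): cut off
fn = assign(concat(fname, lname), 15)                      # (A15): two blanks after Bob
fullname = assign(concat(fname.rstrip(), " ", lname), 15)  # rtrim, then one blank

print(repr(name1))     # 'Bob  Jense'
print(repr(fn))        # 'Bob  Jensen    '
print(repr(fullname))  # 'Bob Jensen     '
```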

Example 4:  Deleting unwanted characters from a string variable

Sometimes you need to remove unwanted characters from a string variable. For example, social security numbers are often written with hyphens. The code below removes the hyphens. First, we input a small data set and use the list command to check that the data were read in properly. Next, we create a string variable called strvar with a length of nine (a9). We use the compute command with the concat function (short for “concatenate”) and the substr function (short for “substring”) to assign values to strvar. Finally, we use the list command again to see the results. The substr function breaks apart each value of ssn: the first number (argument) gives the position in the string where SPSS should begin, and the second number tells SPSS how many characters to take. Hence, substr(ssn, 1, 3) tells SPSS to use the variable ssn, start at the first position and take three characters. For the first row of data, that gives 123.

data list list / ssn(a11).
begin data.
123-45-6789
987-65-4321
132-54-9687
798-65-4213
end data.
list.

SSN
123-45-6789
987-65-4321
132-54-9687
798-65-4213

Number of cases read:  4    Number of cases listed:  4

string strvar (a9).
compute strvar = concat(substr(ssn, 1, 3), substr(ssn, 5, 2), substr(ssn, 8, 4)).
list.

SSN          STRVAR
123-45-6789  123456789
987-65-4321  987654321
132-54-9687  132549687
798-65-4213  798654213

Number of cases read:  4    Number of cases listed:  4
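The position arithmetic is easy to get wrong, because SPSS positions start at 1. Below is a small Python sketch of the same substring logic. spss_substr is a hypothetical helper, not part of SPSS. The sketch assumes, as the SPSS code does, that every value follows the exact pattern ddd-dd-dddd.

```python
def spss_substr(s: str, start: int, length: int) -> str:
    """Mimic SPSS substr(s, start, length), where start is 1-based."""
    return s[start - 1:start - 1 + length]


def strip_ssn_hyphens(ssn: str) -> str:
    """Same positions as the SPSS code: characters 1-3, 5-6 and 8-11."""
    return spss_substr(ssn, 1, 3) + spss_substr(ssn, 5, 2) + spss_substr(ssn, 8, 4)


for ssn in ["123-45-6789", "987-65-4321", "132-54-9687", "798-65-4213"]:
    # For well-formed values, extracting by position agrees with deleting every hyphen.
    print(ssn, strip_ssn_hyphens(ssn), strip_ssn_hyphens(ssn) == ssn.replace("-", ""))
```

If some values were entered without hyphens or with extra spaces, the position-based version would silently cut the wrong characters. A "delete every hyphen" approach does not depend on the layout.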

16. How can I see the number of missing values and patterns of missing values in my data file?

Sometimes, a data set may have “holes” in it, i.e., missing values. Some statistical procedures, such as regression analysis, will not work as well, or at all, on a data set with missing values. Either the observations with missing values have to be deleted or the missing values have to be substituted for a statistical procedure to produce meaningful results. Thus we may want to know the number of missing values and how they are distributed, so we have a better idea of what to do with the observations that contain them. Let’s look at the following data set.

   LANDVAL  IMPROVAL    TOTVAL  SALEPRIC  SALTOAPR
     30000     64831     94831    118500      1.25
     30000     50765     80765     93900         .
     46651     18573     65224         .      1.16
     45990     91402         .    184000      1.34
     42394         .     40575    168000      1.43
         .      3351     51102    169000      1.12
     63596      2182     65778         .      1.26
     56658     53806     10464    255000      1.21
     51428     72451         .         .      1.18
     93200         .      4321    422000      1.04
     76125     78172     54297    290000      1.14
         .     61934     16294    237000      1.10
     65376     34458         .    286500      1.43
     42400         .     57446         .         .
     40800     92606     33406    168000      1.26

1. Number of missing values vs. the number of non-missing values

The first thing we are going to look at is which variables have a lot of missing values. We use the FREQUENCIES command with the /FORMAT=NOTABLE option, which suppresses the frequency tables so that only the counts of valid and missing values are shown.

FREQUENCIES VARIABLES=landval improval totval salepric saltoapr
  /FORMAT=NOTABLE
  /ORDER=ANALYSIS.

So we know the number of missing values in each variable. For instance, variable salepric has four missing values and saltoapr has two. This helps us identify variables that have a large number of missing values and that we may want to exclude from the analysis.

2. Number of missing values in each observation and its distribution

We can also look at the distribution of missing values across observations. For example, we use the COUNT command to create a new variable cmiss that counts the number of missing values in each observation. Its frequency table shows four observations with no missing values, nine with one missing value, one with two missing values and one with three missing values. If we are willing to substitute one missing value per observation, we can reclaim those nine observations and get a valid data set that is 13/15 = 87% of the size of the original one.

COUNT cmiss = landval improval totval salepric saltoapr (MISSING).
FREQUENCIES VARIABLES=cmiss
  /ORDER=ANALYSIS.
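To double-check the counts quoted above, here is a minimal Python sketch that tallies missing values per variable (the FREQUENCIES step) and per observation (the COUNT step). It assumes the 15 observations listed above, with "." standing for a missing value. It is a cross-check of the arithmetic, not a replacement for the SPSS commands.

```python
from collections import Counter

# The 15 observations listed above; "." marks a missing value.
DATA = """
30000 64831 94831 118500 1.25
30000 50765 80765 93900 .
46651 18573 65224 . 1.16
45990 91402 . 184000 1.34
42394 . 40575 168000 1.43
. 3351 51102 169000 1.12
63596 2182 65778 . 1.26
56658 53806 10464 255000 1.21
51428 72451 . . 1.18
93200 . 4321 422000 1.04
76125 78172 54297 290000 1.14
. 61934 16294 237000 1.10
65376 34458 . 286500 1.43
42400 . 57446 . .
40800 92606 33406 168000 1.26
"""
NAMES = ["landval", "improval", "totval", "salepric", "saltoapr"]


def parse(text):
    """Turn the listing into rows of floats, with None for missing values."""
    return [[None if tok == "." else float(tok) for tok in line.split()]
            for line in text.strip().splitlines()]


rows = parse(DATA)

# Like FREQUENCIES ... /FORMAT=NOTABLE: number of missing values per variable.
missing_per_var = {name: sum(row[j] is None for row in rows) for j, name in enumerate(NAMES)}
print(missing_per_var)  # {'landval': 2, 'improval': 3, 'totval': 3, 'salepric': 4, 'saltoapr': 2}

# Like COUNT cmiss = ... (MISSING): number of missing values per observation.
cmiss = [sum(v is None for v in row) for row in rows]
print(dict(sorted(Counter(cmiss).items())))  # {0: 4, 1: 9, 2: 1, 3: 1}

# Share of observations with at most one missing value.
print(round(sum(c <= 1 for c in cmiss) / len(rows), 2))  # 0.87
```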

Distribution of missing values

We can also look at the patterns of missing values. We can recode each variable into a dummy variable such that 1 is missing and 0 is non-missing.  Then we use the aggregate command to compute the frequency for each pattern of missing data. 

RECODE landval improval totval salepric saltoapr (MISSING=1) (ELSE=0) INTO land1 impr1 totv1 sale1 salt1.
EXECUTE.
AGGREGATE
  /OUTFILE='AGGR.SAV'
  /BREAK=land1 impr1 totv1 sale1 salt1
  /N_BREAK=N.

File AGGR.SAV has the following variables and observations.

   LAND1    IMPR1    TOTV1    SALE1    SALT1  N_BREAK
     .00      .00      .00      .00      .00        4
     .00      .00      .00      .00     1.00        1
     .00      .00      .00     1.00      .00        2
     .00      .00     1.00      .00      .00        2
     .00      .00     1.00     1.00      .00        1
     .00     1.00      .00      .00      .00        2
     .00     1.00      .00     1.00     1.00        1
    1.00      .00      .00      .00      .00        2

Now we see that there are four observations with no missing values, one observation with a missing value only in variable saltoapr, two observations with a missing value only in variable salepric, one observation with missing values in both totval and salepric, and so on. If we want to delete some observations from the original data set, we now have a better idea of which ones to delete. For example, we might delete the observation corresponding to the 7th row of the table, which is missing improval, salepric and saltoapr.
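The RECODE plus AGGREGATE combination amounts to turning each observation into a tuple of 0/1 missing indicators and counting identical tuples. The Python sketch below reproduces that tally for the 15 observations listed earlier ("." as missing). It is only a cross-check of the pattern table. AGGREGATE writes its break groups in sorted order, so sorting the tuples should give the same row order as the AGGR.SAV listing.

```python
from collections import Counter

# The 15 observations listed earlier; "." marks a missing value.
DATA = """
30000 64831 94831 118500 1.25
30000 50765 80765 93900 .
46651 18573 65224 . 1.16
45990 91402 . 184000 1.34
42394 . 40575 168000 1.43
. 3351 51102 169000 1.12
63596 2182 65778 . 1.26
56658 53806 10464 255000 1.21
51428 72451 . . 1.18
93200 . 4321 422000 1.04
76125 78172 54297 290000 1.14
. 61934 16294 237000 1.10
65376 34458 . 286500 1.43
42400 . 57446 . .
40800 92606 33406 168000 1.26
"""

rows = [[None if tok == "." else float(tok) for tok in line.split()]
        for line in DATA.strip().splitlines()]

# Like RECODE ... (MISSING=1) (ELSE=0): one 0/1 indicator per variable.
indicators = [tuple(int(v is None) for v in row) for row in rows]

# Like AGGREGATE /BREAK=land1 impr1 totv1 sale1 salt1 /N_BREAK=N.
patterns = Counter(indicators)
print("LAND1 IMPR1 TOTV1 SALE1 SALT1 N_BREAK")
for pattern, n in sorted(patterns.items()):
    print(*pattern, n)
```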

17. Compare SAS, Stata, and SPSS.

Each package offers its own unique strengths and weaknesses. As a whole, SAS, Stata, and SPSS form a set of tools that can be used for a wide variety of statistical analysis. With Stat/Transfer it is very easy to convert data files from one package to another in just a matter of seconds or minutes. Therefore, there can be quite an advantage to switching from one analysis package to another depending on the nature of your problem. For example, if you were performing analysis using mixed models you might choose SAS, but if you were doing logistic regression you might choose Stata, and if you were doing analysis of variance you might choose SPSS. If you are frequently performing statistical analysis, we would strongly urge you to consider making each one of these packages part of your toolkit for data analysis.

18. How to create a variable that contains the number of reciprocal friends?

When studying social networks, we might need to create a variable that contains the number of reciprocal friends for each person. We show a step-by-step example using data in wide format. Here is how our data set is structured:

id     friend1  friend2  friend3  friend4  friend5
44006  45611    55610    74158    55898    -
45611  44006    45623    45621    74158    55898
45621  71643    45611    -        -        -
45623  59642    71643    45611    73452    55610
55610  45623    45611    44006    55898    71643
55898  74158    55610    45621    -        -
59642  55898    71643    45621    45611    74158
71643  55898    73452    59642    45611    44006
73452  71643    45623    -        -        -
74158  55898    45611    59642    45621    44006

Each person can nominate up to 5 friends. For example, a focal person with id = 44006 has nominated 4 friends and we want to know how many of these 4 friends have also nominated 44006, that is the number of reciprocal friends initiated by 44006.

One way to accomplish this task is to first turn the data into long format, so that each focal person has as many rows as the number of friends he or she nominated. Then we merge the original data back in, matching each nominated friend with that friend's own row in the original data. At this point, we can identify row by row whether the focal person and the friend are reciprocal friends. Last, we aggregate the data back to get the total number of reciprocal friends for each focal person.
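Before the SPSS steps, it may help to see the whole idea in a few lines. Below is a minimal Python sketch, assuming the ten-person friend lists from the table above, with empty slots simply left out. The dictionary lookup stands in for the merge, and the running sum stands in for the final aggregation. It mirrors the logic only and is not how SPSS executes it.

```python
# Friend nominations from the wide table above (empty slots omitted).
WIDE = {
    44006: [45611, 55610, 74158, 55898],
    45611: [44006, 45623, 45621, 74158, 55898],
    45621: [71643, 45611],
    45623: [59642, 71643, 45611, 73452, 55610],
    55610: [45623, 45611, 44006, 55898, 71643],
    55898: [74158, 55610, 45621],
    59642: [55898, 71643, 45621, 45611, 74158],
    71643: [55898, 73452, 59642, 45611, 44006],
    73452: [71643, 45623],
    74158: [55898, 45611, 59642, 45621, 44006],
}


def count_reciprocal_ties(wide):
    """Return {focal id: number of nominated friends who nominated focal back}."""
    # Step 1: wide -> long, one (focal, friend) pair per nomination.
    long_rows = [(focal, friend) for focal, friends in wide.items() for friend in friends]
    nrties = {focal: 0 for focal in wide}
    for focal, friend in long_rows:
        # Step 2: "merge" by looking up the friend's own nomination list.
        friends_of_friend = wide.get(friend, [])
        # Step 3: the tie is reciprocal if focal appears in that list.
        rtie = 1 if focal in friends_of_friend else 0
        # Step 4: aggregate (sum) over each focal person's rows.
        nrties[focal] += rtie
    return nrties


for person, n in count_reciprocal_ties(WIDE).items():
    print(person, n)
```

Because every reciprocal pair is counted once from each side, the total across all people should always be even. This is a quick sanity check on the result.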

Now let’s go through the steps.

Step 1.

Turning the data into long format.

get file='D:\work\spss\friends_wide.sav'.

list.

varstocases

/make friend from friend1 to friend5

/index = i.

* list the first 15 observations to see the structure.

list /cases = from 1 to 15.
id     i  friend
44006  1  45611
44006  2  55610
44006  3  74158
44006  4  55898
45611  1  44006
45611  2  45623
45611  3  45621
45611  4  74158
45611  5  55898
45621  1  71643
45621  2  45611
45623  1  59642
45623  2  71643
45623  3  45611
45623  4  73452
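The reshaping in Step 1 can be pictured with a short Python sketch. varstocases here is a hypothetical helper, not the SPSS command, and the two input rows are copied from the wide table. Empty friend slots are skipped. This matches the listing above, where 44006 gets only four rows. In SPSS, the NULL subcommand of VARSTOCASES controls whether such empty slots are kept or dropped.

```python
def varstocases(rows, stem="friend", n=5):
    """Reshape wide rows into long records (id, i, friend), skipping empty slots."""
    long_rows = []
    for row in rows:
        for i in range(1, n + 1):
            value = row.get(f"{stem}{i}")
            if value is None:  # empty slot: no long record, as in the listing above
                continue
            long_rows.append({"id": row["id"], "i": i, stem: value})
    return long_rows


wide = [
    {"id": 44006, "friend1": 45611, "friend2": 55610, "friend3": 74158, "friend4": 55898, "friend5": None},
    {"id": 45621, "friend1": 71643, "friend2": 45611, "friend3": None, "friend4": None, "friend5": None},
]
for rec in varstocases(wide):
    print(rec["id"], rec["i"], rec["friend"])
```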

Step 2.

Merging with the original data, matching the variable friend in the current data with the variable id in the original data. To this end, we need to rename variables and make sure that both data sets are sorted by id.

rename variables id = focal.

rename variables friend = id.

sort cases by id(A).

dataset name long.

get file ='D:\work\spss\friends_wide.sav'.

sort cases by id(A).

dataset name friend_wide.

dataset activate long.

match files /file=*

 /table='friend_wide'

 /by id.

exe.

list /cases = from 1 to 15.
focal  i  id     friend1  friend2  friend3  friend4  friend5
45611  1  44006  45611    55610    74158    55898    -
55610  3  44006  45611    55610    74158    55898    -
71643  5  44006  45611    55610    74158    55898    -
74158  5  44006  45611    55610    74158    55898    -
44006  1  45611  44006    45623    45621    74158    55898
45621  2  45611  44006    45623    45621    74158    55898
45623  3  45611  44006    45623    45621    74158    55898
55610  2  45611  44006    45623    45621    74158    55898
59642  4  45611  44006    45623    45621    74158    55898
71643  4  45611  44006    45623    45621    74158    55898
74158  2  45611  44006    45623    45621    74158    55898
45611  3  45621  71643    45611    -        -        -
55898  3  45621  71643    45611    -        -        -
59642  3  45621  71643    45611    -        -        -
74158  4  45621  71643    45611    -        -        -

What do we have here? Let’s look at the first row. Focal person 45611 has nominated 44006 as a friend and 44006 has nominated 4 friends: 45611, 55610,  74158 and 55898.  Since 45611 nominated 44006 and 44006 nominated 45611 in return, they form a reciprocal pair. So we can simply check by row if each pair of focal and id is a reciprocal friend by checking if the focal appears in the list of friends. This leads to our next step.

Step 3.

Checking if focal and id are a pair of reciprocal friends. To this end, we use DO REPEAT to loop through the friend list.

compute rtie = 0.

exe.

do repeat f = friend1 to friend5.

if (focal = f) rtie = 1.

end repeat.

exe.

sort cases by focal(A).

list /cases = from 1 to 15.

 

focal  i  id     friend1  friend2  friend3  friend4  friend5  rtie
44006  1  45611  44006    45623    45621    74158    55898    1.00
44006  2  55610  45623    45611    44006    55898    71643    1.00
44006  4  55898  74158    55610    45621    -        -         .00
44006  3  74158  55898    45611    59642    45621    44006    1.00
45611  1  44006  45611    55610    74158    55898    -        1.00
45611  3  45621  71643    45611    -        -        -        1.00
45611  2  45623  59642    71643    45611    73452    55610    1.00
45611  5  55898  74158    55610    45621    -        -         .00
45611  4  74158  55898    45611    59642    45621    44006    1.00
45621  2  45611  44006    45623    45621    74158    55898    1.00
45621  1  71643  55898    73452    59642    45611    44006     .00
45623  3  45611  44006    45623    45621    74158    55898    1.00
45623  5  55610  45623    45611    44006    55898    71643    1.00
45623  1  59642  55898    71643    45621    45611    74158     .00
45623  2  71643  55898    73452    59642    45611    44006     .00
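The DO REPEAT loop in Step 3 checks one row at a time. The Python sketch below does the same: reciprocal_tie is a hypothetical helper that sets rtie to 1 when focal appears anywhere among friend1 to friend5. The two example rows are taken from the listing above.

```python
def reciprocal_tie(focal, friend_list):
    """Mimic the DO REPEAT loop: rtie is 1 if focal appears among friend1-friend5."""
    rtie = 0.0                   # compute rtie = 0.
    for f in friend_list:        # do repeat f = friend1 to friend5.
        if f == focal:           # if (focal = f) rtie = 1.
            rtie = 1.0
    return rtie


# Focal 44006 paired with id 45611 (reciprocal) and with id 55898 (not reciprocal).
print(reciprocal_tie(44006, [44006, 45623, 45621, 74158, 55898]))  # 1.0
print(reciprocal_tie(44006, [74158, 55610, 45621, None, None]))    # 0.0
```

Empty slots (None here, system-missing in SPSS) never match, so they leave rtie at 0 in both versions.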

Step 4.

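Aggregating the data back to one row per focal person. Summing rtie within each focal person gives the number of reciprocal friends (nrties). We then rename focal back to id and match the result with the original wide data to bring back the friend list.

aggregate outfile=*

 /break=focal

 /nrties=sum(rtie).

rename variables focal = id.

match files /file=*

 /table='friend_wide'

 /by id.

exe.

list /cases = from 1 to 5.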

 

id     nrties  friend1  friend2  friend3  friend4  friend5
44006  3.00    45611    55610    74158    55898    -
45611  4.00    44006    45623    45621    74158    55898
45621  1.00    71643    45611    -        -        -
45623  3.00    59642    71643    45611    73452    55610
55610  3.00    45623    45611    44006    55898    71643

19. How can I calculate the time at dropout?

Version info: Code for this page was tested in IBM SPSS 21.
When working with longitudinal data, there is often participant dropout. To examine when dropout occurs and to see if any variables predict dropout, we need to create a variable indicating when each person drops out of the study.

To start, here is a small example dataset with five time points.

data list list

 /t1 t2 t3 t4 t5.

begin data.

5 . . . .

5 5 . . .

5 5 5 . .

5 5 5 5 .

5 5 5 5 5

. 5 . . .

5 . 5 . .

5 . 5 5 .

5 . . 5 5

end data.

Dropout is defined by the last wave of a study at which a particular person still provides data; after that wave, there are no more data for that person. This is different from merely missing data: someone can be missing a wave, but if they have non-missing data at a later wave, they did not drop out.

In SPSS, we can use a series of logical statements and the MISSING function to determine at which wave a participant drops out of the study. Below, we do this by creating an indicator variable “v” that is 1 if someone has not yet dropped out at that wave and 0 otherwise. This differs from a person simply missing data at a given wave, because a true dropout is also missing at all later time points. Then we accumulate “v” over all waves in the “dropout” variable.

* compute the last wave with data, looping backward from the final wave.

compute v = 0.

compute dropout = 0.

do repeat x=t5 t4 t3 t2 t1.

 if (~missing(x)) v = 1.

 compute dropout = dropout + v.

end repeat.

execute.

Now we have a variable with the wave each participant dropped out of the study. If we had other nonmissing variables (e.g., demographics or from questionnaires at baseline), we could use these as predictors of when someone drops out to see if dropout appears random or is related to something (e.g., in an intervention, perhaps participants in the treatment or control group are more likely to drop out).
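Dropout logic is easy to get subtly wrong, so here is a Python sketch for checking it on the nine example rows (None stands for "."). dropout_wave follows the definition in which intermittent missing waves do not count as dropout: it returns the last wave with any data. For contrast, waves_before_first_gap is the forward variant that multiplies v by "not missing" at each wave. That variant counts only the waves before the first gap, so a person who skips a wave and returns is scored as an early dropout. Both helpers are hypothetical names for illustration.

```python
def dropout_wave(waves):
    """Last wave (1-based) with non-missing data; 0 if the person has no data at all."""
    v = 0         # becomes 1 once data are seen at this wave or any later wave
    dropout = 0
    for x in reversed(waves):      # walk from the last wave back to the first
        if x is not None:
            v = 1
        dropout += v
    return dropout


def waves_before_first_gap(waves):
    """Forward variant: number of waves completed before the first missing one."""
    v, count = 1, 0
    for x in waves:
        v = v * (x is not None)
        count += v
    return count


# The nine example rows from the data list above.
rows = [
    [5, None, None, None, None],
    [5, 5, None, None, None],
    [5, 5, 5, None, None],
    [5, 5, 5, 5, None],
    [5, 5, 5, 5, 5],
    [None, 5, None, None, None],
    [5, None, 5, None, None],
    [5, None, 5, 5, None],
    [5, None, None, 5, 5],
]
print([dropout_wave(r) for r in rows])            # [1, 2, 3, 4, 5, 2, 3, 4, 5]
print([waves_before_first_gap(r) for r in rows])  # [1, 2, 3, 4, 5, 0, 1, 1, 1]
```

The last four rows show where the two definitions disagree. Which one is appropriate depends on whether returning participants should count as still enrolled.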

Just to see what the variable looks like, here is a histogram of dropout.

graph /histogram=dropout.

 

Time at Dropout

Analyzing Data

Analyzing data when observations are missing or there is dropout can be a complex topic. There are many possibilities and techniques to try to use all available data and minimize bias from non-random dropout or missingness. Applied Missing Data Analysis by Craig Enders is a nice book for beginners to learn more about what options exist.


20. How do I use a SAS data file in SPSS?

Using SPSS software:

If you are an SPSS user and you are using SPSS version 14 or later, you can simply open it as a data file, since SPSS supports SAS data files in several formats such as .sas7bdat, .sd7, .sd2, .ssd01 and .xpt. These files can be read directly into SPSS either through the pull-down menus or with syntax.

Using the pull-down menus, select File -> Open -> Data… and then, for Files of Type, select the appropriate SAS data file type; then select the file from the list and click Open. That is all there is to it.

With SPSS syntax we can use the get sas command to read in a SAS data file.

get sas data='C:\data\states.sas7bdat'.

Using SAS software:

Sometimes, there is a need for converting a SAS file to an SPSS file outside of SPSS. For example, your colleague is an SPSS user who uses an older version of SPSS and you are a SAS user working with SAS version 9.x.  For your colleague to use the same data in SPSS that you have worked on in SAS, you can simply convert your data in SAS to an SPSS data file for your colleague.

In SAS, we can also save a SAS data file as an SPSS data file using proc export. For example, suppose we have a SAS data set called mydata in the working directory and want to convert it to an SPSS data file called newdata3.sav; we can do the following. By specifying the file extension as .sav, SAS understands that we want our data file to be converted to SPSS. In the process of conversion, SAS will automatically convert the variable labels and value labels as well.

proc export data=mydata outfile="C:\data\newdata3.sav";

run;

Stat/Transfer:

There might be situations where neither option above works, for example, when someone has a SAS data file, works with an older version of SPSS, and does not have access to SAS 9.x. Probably the easiest solution in this type of situation is to use Stat/Transfer.

Last updated: 02 Jan 2024
About Author

Ravindra Savaram is a Technical Lead at Mindmajix.com. His passion lies in writing articles on the most popular IT platforms including Machine learning, DevOps, Data Science, Artificial Intelligence, RPA, Deep Learning, and so on. You can stay up to date on all these technologies by following him on LinkedIn and Twitter.
