Mindmajix

IBM SPSS Interview Questions

IBM SPSS Interview Question And Answers:

Q.How do I convert among SAS, Stata and SPSS files?

On this page, conversions of different data formats are discussed.  In general, the strategies should work with SAS 9.*, SPSS 14+ and Stata 11.  If you have Stata 11 and you need to convert your data to other formats, you need to use the saveold command within Stata for saving the data in Stata version 10 format before you convert the data set.

             To SAS                                   To SPSS                                   To Stata
From SAS     -                                        How do I use a SAS data file in SPSS?     How do I use a SAS data file in Stata?
From SPSS    How do I use a SPSS data file in SAS?    -                                         How do I use a SPSS data file in Stata?
From Stata   How do I use a Stata data file in SAS?   How do I use a Stata data file in SPSS?   -

Another way to convert data files between SAS, Stata and SPSS is to use programs such as Stat/Transfer or DBMS Copy.  For more information on Stat/Transfer, please see our Stat/Transfer page.  You can transfer the SAS version 9.*, Stata 11 and SPSS 19 files.  Stat/Transfer allows you to transfer data files to many other file formats, including Statistica, Systat, S-Plus, R, Excel, Access, Minitab, Matlab, LIMDEP and JMP.  You may need to update your copy of Stat/Transfer to be able to transfer data sets created by the latest version of the software.  To update Stat/Transfer, click on the “About” tab (in the upper right corner), and click on the “Check for Updates” pull-down menu and select “Right Now”.


Q.How can I read hierarchical data into SPSS?

Suppose that your data file has two different kinds of records, family records and person records.  How do you read the data so that the family information is included for each person?

Here is an example dataset with two kinds of records:  Family records and person records.  The data are organized such that the family record comes first and all the person records for that family follow it.  The family records are the shorter data lines and the person records are the longer ones.

06470 1 1
32161 232 0
19082 230 1
07470 1 0
11231 240 1
08470 1 0
43711 227 0
09470 1 0
40221 213 1
41162 222 0
16173 224 1
10470 1 1
30111 220 0
36222 211 1
11470 1 0
21751 217 0
33962 210 1
32143 226 1

Here are the codebooks for the family and person records.

family record:                      person record:
column 1-5   family id              column 1-4   person id
column 7     record type            column 5     person number
             (1 = family)           column 7     record type
column 9     group                               (2 = person)
                                    column 8-9   age
                                    column 11    male

The following syntax example reads and displays the data from the two different types of records.  The file type is nested, and the number in column seven indicates which record type each line belongs to.  There is a separate data list for each record type.

file type nested record=7.
record type 1.
data list / famid 1-5 group 9.
record type 2.
data list / personid 1-4 pernum 5 age 8-9 male 11.
end file type.

begin data
06470 1 1
32161 232 0
19082 230 1
07470 1 0
11231 240 1
08470 1 0
43711 227 0
09470 1 0
40221 213 1
41162 222 0
16173 224 1
10470 1 1
30111 220 0
36222 211 1
11470 1 0
21751 217 0
33962 210 1
32143 226 1
end data.

list.

Here is what the final dataset looks like.

FAMID GROUP PERSONID PERNUM AGE MALE
 6470   1     3216      1    32   0
 6470   1     1908      2    30   1
 7470   0     1123      1    40   1
 8470   0     4371      1    27   0
 9470   0     4022      1    13   1
 9470   0     4116      2    22   0
 9470   0     1617      3    24   1
10470   1     3011      1    20   0
10470   1     3622      2    11   1
11470   0     2175      1    17   0
11470   0     3396      2    10   1
11470   0     3214      3    26   1

Number of cases read:  12    Number of cases listed:  12
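
For readers who want to see the nested-record logic spelled out, here is a minimal sketch in Python rather than SPSS syntax.  It is only an illustration under our own assumptions: the function name read_nested is hypothetical, and only the first five data lines from the example are used.  It keeps the most recent family record in memory and attaches its fields to every person record that follows, which is what the nested file type does for us above.

```python
# Illustration only (Python, not SPSS): spread family fields onto person records.
RAW = """\
06470 1 1
32161 232 0
19082 230 1
07470 1 0
11231 240 1
"""


def read_nested(text):
    """Return one dict per person record, with the preceding family's fields attached."""
    cases = []
    family = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record_type = line[6]  # column 7 (1-based) holds the record type
        if record_type == "1":
            # family record: columns 1-5 = family id, column 9 = group
            family = {"famid": int(line[0:5]), "group": int(line[8])}
        elif record_type == "2":
            if family is None:
                raise ValueError("person record found before any family record")
            # person record: columns 1-4 id, 5 person number, 8-9 age, 11 male
            cases.append({
                **family,
                "personid": int(line[0:4]),
                "pernum": int(line[4]),
                "age": int(line[7:9]),
                "male": int(line[10]),
            })
    return cases


cases = read_nested(RAW)
for case in cases:
    print(case)
```

The first case printed corresponds to the first row of the SPSS listing above (family 6470, person 3216, age 32).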


Q.How can I compare two data sets in SPSS? or How do I check that the same data input by two people are consistently entered?

NOTE: The method shown on this page works for all versions of SPSS, but if you have SPSS version 21 or later, you can use the compare datasets command. Please see this page for examples.

There are times when you would like to compare two data sets to see if they are exactly the same.  For example, if two people enter the same data (double data entry), you would want to know if any discrepancies exist between the two datasets (the rationale of double data entry), and if so, where those discrepancies are. We start by reading in the two datasets, one entered by person 1 and the second by person 2.  The two data sets are identical, except that we created a missing value in the ninth row, second variable, in the first data set, and we changed the very last entry from 51 to 52 in the second data set.

After entering each data set, we need to sort the data set.  In our example, we will sort the data set on all variables, starting with the first variable in the data set.  We use the SPSS keyword all to do this.  We use this method because it is very general and will work in many situations.  (However, if you want to compare the files on only a few variables in the data set, you will need to list the variables in the same order in both sorts and on the by subcommand of the update command.)  After sorting the data set, we save it.  We do this for both data sets.

data list list /id female race ses * schtype (A3) prog read write math science socst.
begin data.
147 1 1 3 pub 1 47  62  53  53  61
108 0 1 2 pub 2 34  33  41  36  36
 18 0 3 2 pub 3 50  33  49  44  36
153 0 1 2 pub 3 39  31  40  39  51
 50 0 2 2 pub 2 50  59  42  53  61
 51 1 2 1 pub 2 42  36  42  31  39
102 0 1 1 pub 1 52  41  51  53  56
 57 1 1 2 pub 1 71  65  72  66  56
160 . 1 2 pub 1 55  65  55  50  61
136 0 1 2 pub 1 65  59  70  63  51
end data.
sort cases by all.
save outfile "D:\person1.sav".

data list list /id female race ses * schtype (A3) prog read write math science socst.
begin data.
147 1 1 3 pub 1 47  62  53  53  61
108 0 1 2 pub 2 34  33  41  36  36
 18 0 3 2 pub 3 50  33  49  44  36
153 0 1 2 pub 3 39  31  40  39  51
 50 0 2 2 pub 2 50  59  42  53  61
 51 1 2 1 pub 2 42  36  42  31  39
102 0 1 1 pub 1 52  41  51  53  56
 57 1 1 2 pub 1 71  65  72  66  56
160 1 1 2 pub 1 55  65  55  50  61
136 0 1 2 pub 1 65  59  70  63  52
end data.
sort cases by all.
save outfile "D:\person2.sav".

Now we can use the update command to compare the two data files.  We need to use the SPSS keyword all on the by subcommand, because that is how we sorted the data sets.  Also, we use the in subcommand to create a flag variable, which we called flag1, to indicate which rows match and which rows do not match.  We use the value labels command to add value labels to flag1, and finally we run a frequency on flag1.  As we can see, there are two mismatches.

update file = "D:\person1.sav"
  /in = flag1
  /file = "D:\person2.sav"
  /by all.
exe.

save outfile "D:\combo.sav".

value labels flag1 0 'mismatch' 1 'match'.
freq var = flag1.

Finally, if we look at our new data set, combo, we see that we now have 12 rows of data instead of the original 10.  A new row is added to the data set for each mismatched row, so that you can see where the mismatch is.  If there are two mismatches in a row, the row is listed only once, so you will need to compare the values for each variable to find all of the mismatches.

      id   female     race      ses schtype     prog     read    write    socst flag1
   18.00      .00     3.00     2.00 pub         3.00    50.00    33.00    36.00     1
   50.00      .00     2.00     2.00 pub         2.00    50.00    59.00    61.00     1
   51.00     1.00     2.00     1.00 pub         2.00    42.00    36.00    39.00     1
   57.00     1.00     1.00     2.00 pub         1.00    71.00    65.00    56.00     1
  102.00      .00     1.00     1.00 pub         1.00    52.00    41.00    56.00     1
  108.00      .00     1.00     2.00 pub         2.00    34.00    33.00    36.00     1
  136.00      .00     1.00     2.00 pub         1.00    65.00    59.00    51.00     1
  136.00      .00     1.00     2.00 pub         1.00    65.00    59.00    52.00     0
  147.00     1.00     1.00     3.00 pub         1.00    47.00    62.00    61.00     1
  153.00      .00     1.00     2.00 pub         3.00    39.00    31.00    51.00     1
  160.00      .       1.00     2.00 pub         1.00    55.00    65.00    61.00     1
  160.00     1.00     1.00     2.00 pub         1.00    55.00    65.00    61.00     0

Number of cases read:  12    Number of cases listed:  12
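
To make the whole-row matching idea concrete, here is a small sketch in Python rather than SPSS syntax.  It is an illustration under our own assumptions: the function name compare_rows is hypothetical, and each row is abbreviated to three fields (id, female, socst) with None standing in for the missing value.  A row that appears in both entries counts as a match, and a row that appears in only one entry is a discrepancy, so each disagreement contributes two rows, just as in the listing above.

```python
# Illustration only (Python, not SPSS): whole-row comparison of double-entered data.
# Each tuple is an abbreviated row: (id, female, socst); None marks a missing value.
person1 = [(147, 1, 61), (160, None, 61), (136, 0, 51)]
person2 = [(147, 1, 61), (160, 1, 61), (136, 0, 52)]


def compare_rows(first, second):
    """Return (rows found in both inputs, rows found in only one input)."""
    a, b = set(first), set(second)

    def order(row):
        # sort by id; repr() avoids comparing None with numbers
        return (row[0], repr(row))

    return sorted(a & b, key=order), sorted(a ^ b, key=order)


matched, unmatched = compare_rows(person1, person2)
print("matched:", matched)
print("unmatched:", unmatched)
```

Here the rows for id 136 and id 160 each show up twice in the unmatched list, once per data entry, so you still need to compare the fields to see which value differs.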


Q.How can I compare two data sets in SPSS? or How do I check that the same data input by two people are consistently entered?

NOTE: The methods shown on this page work with SPSS versions 21 and later. If you are using an earlier version of SPSS, please see this page for examples.

There are times when you would like to compare two data sets to see if they are exactly the same.  For example, if two people enter the same data (double data entry), you would want to know if any discrepancies exist between the two datasets (the rationale of double data entry), and if so, where those discrepancies are. We start by reading in the two datasets, one entered by person 1 and the second by person 2.  The two data sets are identical, except that we created a missing value in the ninth row, second variable, in the first data set, and we changed the very last entry from 51 to 52 in the second data set.

data list list /id female race ses * schtype (A3) prog read write math science socst.
begin data.
147 1 1 3 pub 1 47  62  53  53  61
108 0 1 2 pub 2 34  33  41  36  36
 18 0 3 2 pub 3 50  33  49  44  36
153 0 1 2 pub 3 39  31  40  39  51
 50 0 2 2 pub 2 50  59  42  53  61
 51 1 2 1 pub 2 42  36  42  31  39
102 0 1 1 pub 1 52  41  51  53  56
 57 1 1 2 pub 1 71  65  72  66  56
160 . 1 2 pub 1 55  65  55  50  61
136 0 1 2 pub 1 65  59  70  63  51
end data.
sort cases by id.
save outfile "D:\temp\person1.sav".

data list list /id female race ses * schtype (A3) prog read write math science socst.
begin data.
147 1 1 3 pub 1 47  62  53  53  61
108 0 1 2 pub 2 34  33  41  36  36
 18 0 3 2 pub 3 50  33  49  44  36
153 0 1 2 pub 3 39  31  40  39  51
 50 0 2 2 pub 2 50  59  42  53  61
 51 1 2 1 pub 2 42  36  42  31  39
102 0 1 1 pub 1 52  41  51  53  56
 57 1 1 2 pub 1 71  65  72  66  56
160 1 1 2 pub 1 55  65  55  50  61
136 0 1 2 pub 1 65  59  70  63  52
end data.
sort cases by id.
save outfile "D:\temp\person2.sav".

Now we can use the compare datasets command to compare the two data files. We start with the person2 data file open, and we will compare it to the person1 file. To do this, we specify the person1 data file on the compdataset subcommand of the compare datasets command. The variables subcommand is necessary, and in this example, we will use the keyword all so that all variables in the data files are compared. We will use the save subcommand to create a new variable called mismatchflag in the person2 data set. This variable will have a value of 0 for cases that match and a value of 1 for cases that do not match. If we had any unmatched cases, they would get a value of -1. While it is easy to see which cases do not match in this tiny example data set, it might not be so easy in a larger data set. We can use the frequencies command to show us how many cases matched and did not match.

compare datasets
  /compdataset "D:\temp\person1.sav"
  /variables all
  /save flagmismatches = yes varname = mismatchflag.

freq var = mismatchflag.

Sometimes it is helpful to have the cases that match saved to one data file and the cases that do not match saved to a different data file. In the next example, we create two new data files using the mismatchdataset and matchdataset options. The mismatchname keyword is used with the mismatchdataset option to name the new dataset. Likewise, the matchname keyword is used with the matchdataset option to name the data set with the matched cases. We use the delete variables command to remove the variable mismatchflag from the person2 data file.

delete variables mismatchflag.
compare datasets
  /compdataset "D:\temp\person1.sav"
  /caseid id
  /variables all
  /save flagmismatches = yes varname = mismatchflag
        mismatchdataset = yes mismatchname = "d:\temp\mismatch.sav"
        matchdataset = yes matchname = "d:\temp\match.sav".

get file "d:\temp\mismatch.sav".
list.

      id   female     race      ses schtype     prog     read    write     math  science    socst
  136.00      .00     1.00     2.00 pub         1.00    65.00    59.00    70.00    63.00    52.00
  160.00     1.00     1.00     2.00 pub         1.00    55.00    65.00    55.00    50.00    61.00

Number of cases read:  2    Number of cases listed:  2

get file "d:\temp\match.sav".
list.

      id   female     race      ses schtype     prog     read    write     math  science    socst
   18.00      .00     3.00     2.00 pub         3.00    50.00    33.00    49.00    44.00    36.00
   50.00      .00     2.00     2.00 pub         2.00    50.00    59.00    42.00    53.00    61.00
   51.00     1.00     2.00     1.00 pub         2.00    42.00    36.00    42.00    31.00    39.00
   57.00     1.00     1.00     2.00 pub         1.00    71.00    65.00    72.00    66.00    56.00
  102.00      .00     1.00     1.00 pub         1.00    52.00    41.00    51.00    53.00    56.00
  108.00      .00     1.00     2.00 pub         2.00    34.00    33.00    41.00    36.00    36.00
  147.00     1.00     1.00     3.00 pub         1.00    47.00    62.00    53.00    53.00    61.00
  153.00      .00     1.00     2.00 pub         3.00    39.00    31.00    40.00    39.00    51.00

Number of cases read:  8    Number of cases listed:  8
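
The flag values described above (0 for a match, 1 for a mismatch, -1 for a case with no partner) can be sketched in Python rather than SPSS syntax.  This is an illustration under our own assumptions, not a description of how SPSS implements the command: the function name flag_mismatches is hypothetical, and each case is abbreviated to the fields id, female and socst, with None standing in for the missing value.

```python
# Illustration only (Python, not SPSS): case-by-case comparison keyed on id.
person1 = [  # comparison data set
    {"id": 147, "female": 1, "socst": 61},
    {"id": 160, "female": None, "socst": 61},
    {"id": 136, "female": 0, "socst": 51},
]
person2 = [  # active data set
    {"id": 147, "female": 1, "socst": 61},
    {"id": 160, "female": 1, "socst": 61},
    {"id": 136, "female": 0, "socst": 52},
]


def flag_mismatches(active, comparison, key="id"):
    """Map each id in `active` to 0 (match), 1 (mismatch) or -1 (no partner)."""
    by_key = {row[key]: row for row in comparison}
    flags = {}
    for row in active:
        other = by_key.get(row[key])
        if other is None:
            flags[row[key]] = -1
        else:
            differs = any(row[field] != other.get(field) for field in row)
            flags[row[key]] = int(differs)
    return flags


flags = flag_mismatches(person2, person1)
print(flags)
```

With these abbreviated rows, ids 136 and 160 are flagged as mismatches and id 147 as a match, which agrees with the mismatch and match files listed above.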


Q.How can SPSS help me document my data?

The codebook command was introduced in SPSS version 17.  It provides information about the variables in a dataset, such as the type, variable labels, value labels, as well as the number of cases in each level of categorical variables and means and standard deviations of continuous variables.  This information can be as important as the data themselves, because it helps to give meaning to the data.  Also, this information can help you distinguish between two similar datasets.  

The examples below will use the hsb1.sav dataset.  Let’s start by looking at the Variable View.

get file "D:\data\hsb1.sav".

You can access the codebook command via the point-and-click interface by clicking on Analyze -> Reports -> Codebook.

Let’s consider the syntax below.  Although it may look complicated, only the command itself is necessary.  If you issue the codebook command by itself, you will get the variable information for all of the variables in the dataset; counts and percents for all categories of nominal and ordinal variables; and means, standard deviations and quartiles for scale variables.  This may be more output than you want, so you may prefer to select which variables and what information about them you would like to see.  In the example below, we have selected six variables from our dataset.  In square brackets ( [] ) after each variable name, we have indicated the measurement level.  Scale variables (AKA continuous variables) are indicated with an s, ordinal variables (AKA categorical variables) with an o, and nominal variables with an n.  The measurement level specified in the command may or may not match that shown in the Variable View.  For example, as we can see above, the variable socst has a nominal measurement; however, in the codebook command below, we have specified it as a scale variable. The type of measurement determines what will be provided in the output for the variable:  counts and percents for all categories of nominal and ordinal variables; means, standard deviations and quartiles for scale variables.

On the varinfo subcommand, we request some of the information that we see in the Variable View.  On the fileinfo subcommand, we request information on the data file itself, such as the name of the data file, its location, the file label, any documents attached to the data file and a count of the number of cases in the dataset.  On the statistics subcommand, we request the count and percent, which gives the number of cases and percent of cases in each level of nominal and ordinal variables.  We also request the mean and standard deviation of scale variables.

codebook ses [o] prgtype write [s] science [s] socst [s]
  /varinfo position label type format measure valuelabels missing
  /fileinfo name location label documents casecount
  /statistics count percent mean stddev.

In the example below, we show how to get minimal output (by using the keyword none on the statistics subcommand) and how to order the output alphabetically (by specifying varorder = alpha on the options subcommand).

codebook ses prgtype science socst
  /varinfo label type valuelabels
  /options varorder = alpha
  /statistics none.


Q.How can I convert string variables into date variables?

Sometimes date data have been entered as string variables, and these variables need to be converted into numeric variables.  Date variables are numeric variables in SPSS, and as such, they can be added, subtracted, etc.  Specifically, date variables in SPSS are the number of seconds since the beginning of the Gregorian calendar, which was October 14, 1582. 

Let’s look at an example data set below.  We see that we have date data entered as string variables in three different ways.  The examples below will show how to convert each of these into date (numeric) variables that can be used in calculations.

data list list/day1 (a2) month1 (a2) year1 (a4) date2 (a12) date3 (a12).
begin data.
12 06 2005 06/12/2005 06-Dec-2005
14 05 2004 05/14/2004 05-May-2004
01 01 1998 01/01/1998 01-Jan-1998
end data.

list.

day1 month1 year1 date2        date3
12   06     2005  06/12/2005   06-Dec-2005
14   05     2004  05/14/2004   05-May-2004
01   01     1998  01/01/1998   01-Jan-1998

Number of cases read:  3    Number of cases listed:  3

Example 1

In the example below, we work with the variable date3.  If your date variable is entered exactly like this, then you can use the number function to convert it into a numeric variable.  We use the compute command with the number function to create a new variable called your_date that is a numeric version of the string variable date3.  We then create a copy of your_date (called your_date1).  We use the formats command to give your_date and your_date1 different formats, so that you can see what number is associated with each of the dates displayed in your_date1.  To be clear:  your_date and your_date1 hold the same values and are only formatted differently.

compute your_date = number(date3, date11).
compute your_date1 = your_date.
formats your_date (f14.0) your_date1 (date12).
exe.

list.

day1 month1 year1 date2        date3             your_date   your_date1
12   06     2005  06/12/2005   06-Dec-2005     13353206400 06-DEC-2005
14   05     2004  05/14/2004   05-May-2004     13303094400 05-MAY-2004
01   01     1998  01/01/1998   01-Jan-1998     13102992000 01-JAN-1998

Number of cases read:  3    Number of cases listed:  3
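
As a cross-check on the numbers shown for your_date, the short sketch below recomputes them in Python rather than SPSS.  It is an illustration under our own assumptions: the function name spss_seconds is hypothetical, and the calculation simply counts the days from October 14, 1582 to the given date and multiplies by the 86,400 seconds in a day.

```python
# Illustration only (Python, not SPSS): seconds since the start of the Gregorian calendar.
from datetime import date

SPSS_EPOCH = date(1582, 10, 14)  # the starting point named in the text above
SECONDS_PER_DAY = 86400


def spss_seconds(d):
    """Seconds from midnight on October 14, 1582 to midnight on date d."""
    return (d - SPSS_EPOCH).days * SECONDS_PER_DAY


for d in (date(2005, 12, 6), date(2004, 5, 5), date(1998, 1, 1)):
    print(d.isoformat(), spss_seconds(d))
# 2005-12-06 13353206400
# 2004-05-05 13303094400
# 1998-01-01 13102992000
```

The three results agree with the your_date column in the listing, which is a quick way to convince yourself that a date variable really is just a count of seconds.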

Example 2

In this example, we will work with the three variables day1, month1 and year1.  First, we will make numeric versions of these variables using the compute command with the number function.  Next, we use the compute command with the date.dmy function to combine the numeric variables into a single date variable.  The date.dmy function requires numeric variables as arguments, so we must make numeric versions of the string variables for use with this function.  We then use the formats command to format the new date variable (called my_date).  Note that the execute command (shortened to exe.) is needed after the formats command, or the next command will not run.  The delete variables command is used to remove the unneeded variables from the data set.

compute dn = number(day1, f4.0).
compute mn = number(month1, f4.0).
compute yn = number(year1, f4.0).
exe.

compute my_date = date.dmy(dn, mn, yn).
formats my_date (date11).
exe.
delete variables dn mn yn.

list.

day1 month1 year1 date2        date3             your_date   your_date1     my_date
12   06     2005  06/12/2005   06-Dec-2005     13353206400 06-DEC-2005  12-JUN-2005
14   05     2004  05/14/2004   05-May-2004     13303094400 05-MAY-2004  14-MAY-2004
01   01     1998  01/01/1998   01-Jan-1998     13102992000 01-JAN-1998  01-JAN-1998

Number of cases read:  3    Number of cases listed:  3

Example 3

This example is very similar to Example 2, except that the date is contained in a single string variable, called date2.  Because the whole date is contained in the string variable date2, we need to start by breaking date2 into three parts:  month, day and year.  To start this process, we first create the three string variables (called m1, d1 and y1).  Next, we populate the new string variables using the compute command with the substr function.  In the third step, we create numeric versions of the string variables using the compute command with the number function.  After that, the compute command with the date.dmy function is used to combine the numeric versions of the day, month and year variables (dn, mn and yn).  The date.dmy function can only take numeric variables as arguments, so we could not use the string versions of these variables.  In the last step, we format the new date variable, called new_date, with the formats command.  We selected the adate11 format, but you could use any date format that you like.  Note that the execute command (shortened to exe.) is needed after the formats command, or the next command will not run.  We also used the delete variables command to remove the unneeded variables from the data set.

string m1 (a2).
string d1 (a2).
string y1 (a4).
exe.

compute m1 = substr(date2, 1, 2).
compute d1 = substr(date2, 4, 2).
compute y1 = substr(date2, 7, 4).

compute mn = number(m1, f4.0).
compute dn = number(d1, f4.0).
compute yn = number(y1, f4.0).

compute new_date = date.dmy(dn, mn, yn).
formats new_date (adate11).
exe.
delete variables d1 m1 y1 dn mn yn.
exe.

list.

day1 month1 year1 date2        date3             your_date   your_date1    new_date
12   06     2005  06/12/2005   06-Dec-2005     13353206400 06-DEC-2005  06/12/2005
14   05     2004  05/14/2004   05-May-2004     13303094400 05-MAY-2004  05/14/2004
01   01     1998  01/01/1998   01-Jan-1998     13102992000 01-JAN-1998  01/01/1998

Number of cases read:  3    Number of cases listed:  3


Q.How do I create and modify string (character) variables?

There are at least two ways to create a string variable in SPSS.  In our first example, we show how to input string variables into a new data set.  In the next example, we show how to create a string variable in an existing data set.  We then show how to combine string variables, and in the last example, we show how to remove unwanted characters from a string variable.

Example 1:  Inputting string variables into a new data set

In this example, we will enter an id number, the first and last name, age and weight for nine folks.  All of the variables will be numeric, except of course, the names. We will also save the file.

data list list / id * fname (A5) lname (A10) age wt.
begin data.
1 "Beth" "Jones" 20 .
2 "Bob" "Jensen" 23 210
3 "Barb" "Andersen" 25 125
4 "Andy" "Smith" 26 160
5 "Al" "Peterson" 21 190
6 "Ann" "Glenn" 22 115
7 "Pete" "." 29 175
8 "Pam" "Wright" 21 145
9 "Phil" "Brown" 29 200
end data.
save outfile 'c:\names.sav'.

The (A_) after fname and lname tells SPSS that the variable(s) before that option are string variables, and that they have a length of five and ten, respectively.  If one or more numeric variables are listed before a string variable, you need to put an asterisk after the numeric variables to tell SPSS that the variables listed before the asterisk are numeric.  Hence, the asterisk (*) after id is necessary, because SPSS would otherwise assume that all variables listed before the (A5) option are string variables.

You may also notice that SPSS produced an error message, shown below, while reading in the data.  It was caused by the missing data value for wt in case 1.  Despite this error message, the data were read in correctly, as we can see by using the list command.  An error message was not generated for the missing value in lname in case 7 because “.” is a valid value in a string variable.  In other words, SPSS does not consider it a missing value.  We will return to this issue shortly.

>Warning # 1111
>A numeric field contained no digits.  The result has been set to the
>system-missing value.

>Command line: 978  Current case: 1  Current splitfile group: 1
>Field contents: '.'
>Record number: 1  Starting column: 21  Record length: 21

list.

      ID FNAME LNAME           AGE       WT
    1.00 Beth  Jones         20.00      .
    2.00 Bob   Jensen        23.00   210.00
    3.00 Barb  Andersen      25.00   125.00
    4.00 Andy  Smith         26.00   160.00
    5.00 Al    Peterson      21.00   190.00
    6.00 Ann   Glenn         22.00   115.00
    7.00 Pete  .             29.00   175.00
    8.00 Pam   Wright        21.00   145.00
    9.00 Phil  Brown         29.00   200.00

Number of cases read:  9    Number of cases listed:  9

Example 2:  Adding a string variable to an existing data set

Suppose that we would like to add a string variable called gender.  First, we need to create the new variable using the string command.  Then we will assign values to the variable.

string gender (A6).
execute.

Let’s look at the frequency of a few variables to see how gender is different from the variables that we entered with the data list command.

freq var=lname wt gender /format=notable.

Statistics
  LNAME WT GENDER
N Valid 9 8 9
Missing 0 1 0

Notice that although there are no values for gender, there are also no missing values.  (This is why you can not use the nmiss function in aggregate.)  In other words, SPSS considers a blank to be a valid value for a string variable.

Now let’s assign values to gender.  We will use the compute and the if commands to do this.  Remember that while you can modify a string variable with compute and if, you cannot create a string variable with these commands.  (However, you can create a numeric variable with the compute or the if command.)  Note that the value of a string variable must always be enclosed in quote marks.

compute gender = 'female'.
execute.

Of course, not everyone in our data set is female, so we need to change some of the values of gender.  If we want to make the values of gender contingent on the value of another variable, we use the if command.  In this example, we will use the vertical bars to indicate or.

if id = 2 | id = 4 | id = 5 | id = 7 | id = 9 gender = 'male'.
execute.

We can also use numeric values in string variables.  Remember that even if numeric values are used, SPSS still considers those values to be strings.

We can assign variable labels and value labels to string variables in the same way that we can assign them to numeric variables.

variable label gender 'This is the gender of the subject'.
value label gender 'male' 'm' 'female' 'f'.
execute.

Example 3: Combining string variables

In our current data set, the first name (called fname) and the last name (called lname) are two different variables.  Suppose that we wanted to combine them into a single variable.  To do this, we will create a new variable called name1 with a length of 10.  Next, we will use the concat function (short for “concatenate”) to combine the first and last name into a single variable.

string name1 (A10).
execute.

compute name1 = concat(fname, lname).
execute.

list name1.

NAME1
Beth Jones
Bob  Jense
Barb Ander
Andy Smith
Al   Peter
Ann  Glenn
Pete .
Pam  Wrigh
Phil Brown

Number of cases read:  9    Number of cases listed:  9

As you can see, the length of name1 is too short.  Although you can use the alter type command (available in SPSS versions 16 and higher) to make the variable name1 longer, we have already lost the information at the end of some of the cases (in other words, some of the letters at the end have already been cut off).  Hence, simply making name1 longer isn’t helpful.  Rather, we need to create a new string variable (which we will call fn) with a longer length and fill it by concatenating fname and lname again.

string fn (A15).
compute fn = concat(fname, lname).
execute.

list fn.

FN
Beth Jones
Bob  Jensen
Barb Andersen
Andy Smith
Al   Peterson
Ann  Glenn
Pete .
Pam  Wright
Phil Brown

Number of cases read:  9    Number of cases listed:  9

While this worked, it does not look exactly as we would like.  (The unequal number of spaces between the first and last name does not look good.)  Therefore, let’s create another string variable and call it fullname.  We will use the rtrim function, which will trim off any extra blanks on the right of fname, and use the concat function to combine fname, a space, and lname.

string fullname (A15).
compute fullname = concat(rtrim(fname), " ", lname).
execute.
list fullname.

FULLNAME
Beth Jones
Bob Jensen
Barb Andersen
Andy Smith
Al Peterson
Ann Glenn
Pete .
Pam Wright
Phil Brown

Number of cases read:  9    Number of cases listed:  9
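
The three results above (name1, fn and fullname) all follow from the fact that string variables have a fixed width.  The sketch below imitates that behaviour in Python rather than SPSS.  It is an illustration under our own assumptions: the helper name fixed_width_concat is hypothetical, and padding and truncation are modelled with ordinary Python string operations.

```python
# Illustration only (Python, not SPSS): fixed-width strings, truncation and trimming.
def fixed_width_concat(width, *parts):
    """Join the parts, then truncate or blank-pad the result to `width` characters."""
    return "".join(parts)[:width].ljust(width)


fname = "Bob".ljust(5)      # an A5 variable holds 'Bob  '
lname = "Jensen".ljust(10)  # an A10 variable holds 'Jensen    '

name1 = fixed_width_concat(10, fname, lname)                   # too short: last letter lost
fn = fixed_width_concat(15, fname, lname)                      # long enough, but two blanks
fullname = fixed_width_concat(15, fname.rstrip(), " ", lname)  # trim, then one blank

print(repr(name1))     # 'Bob  Jense'
print(repr(fn))        # 'Bob  Jensen    '
print(repr(fullname))  # 'Bob Jensen     '
```

The trailing blanks in fname are what produce the uneven spacing, which is why trimming on the right before concatenating gives the cleaner result.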

Example 4:  Deleting unwanted characters from a string variable

Sometimes you need to remove unwanted characters from a string variable.  For example, social security numbers are often given with hyphens in them.  The code below can be used to remove the hyphens.  First, we input a small data set.  We use the list command to ensure that the data were read in properly.  Next, we create a string variable called strvar, which has a length of nine (a9).  We use the compute command, the concat function (short for “concatenation”) and the substr function (short for “substring”) to assign the values to strvar.  Finally, we use the list command again to see the results.  The substr function is used to break apart each value of ssn.  The first number indicates the position within the string variable where SPSS is to begin, and the second number tells SPSS how many characters to take.  Hence, substr(ssn, 1, 3) tells SPSS to use the variable ssn, start at the first position in the variable and take three characters.  For the first row of data, that would be 123.

data list list / ssn(a11).
begin data.
123-45-6789
987-65-4321
132-54-9687
798-65-4213
end data.

list.

SSN
123-45-6789
987-65-4321
132-54-9687
798-65-4213

Number of cases read:  4    Number of cases listed:  4

string strvar (a9).
compute strvar = concat(substr(ssn, 1, 3), substr(ssn, 5, 2), substr(ssn, 8, 4)).

list.

SSN         STRVAR
123-45-6789 123456789
987-65-4321 987654321
132-54-9687 132549687
798-65-4213 798654213

Number of cases read:  4    Number of cases listed:  4

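
The position arithmetic in the substr calls is easy to get wrong by one, so here is the same extraction sketched in Python rather than SPSS.  It is an illustration under our own assumptions: the helper names substr and strip_hyphens are hypothetical, and substr is written to count positions from 1, as the syntax above does.

```python
# Illustration only (Python, not SPSS): 1-based substring extraction.
def substr(value, start, length):
    """Return `length` characters of `value`, starting at 1-based position `start`."""
    return value[start - 1:start - 1 + length]


def strip_hyphens(ssn):
    """Keep positions 1-3, 5-6 and 8-11, skipping the hyphens at positions 4 and 7."""
    return substr(ssn, 1, 3) + substr(ssn, 5, 2) + substr(ssn, 8, 4)


for ssn in ("123-45-6789", "987-65-4321", "132-54-9687", "798-65-4213"):
    print(ssn, strip_hyphens(ssn))
```

This approach relies on the hyphens always sitting in positions 4 and 7; values entered in any other layout would need a different rule.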

Q.How can I see the number of missing values and patterns of missing values in my data file?

Sometimes, a data set may have “holes” in it, i.e., missing values. Some statistical procedures, such as regression analysis, will not work as well, or at all, on a data set with missing values. The observations with missing values have to be either deleted or the missing values have to be substituted in order for a statistical procedure to produce meaningful results. Thus we may want to know the number of missing values and the distribution of those missing values, so that we have a better idea of what to do with the observations with missing values. Let’s look at the following data set.

 LANDVAL  IMPROVAL    TOTVAL  SALEPRIC  SALTOAPR
   30000     64831     94831    118500      1.25
   30000     50765     80765     93900       .
   46651     18573     65224         .      1.16
   45990     91402         .    184000      1.34
   42394         .     40575    168000      1.43
       .      3351     51102    169000      1.12
   63596      2182     65778         .      1.26
   56658     53806     10464    255000      1.21
   51428     72451         .         .      1.18
   93200         .      4321    422000      1.04
   76125     78172     54297    290000      1.14
       .     61934     16294    237000      1.10
   65376     34458         .    286500      1.43
   42400         .     57446         .       .
   40800     92606     33406    168000      1.26

  1. Number of missing values vs. number of non-missing values

The first thing we are going to look at is which variables have a lot of missing values. We just use the frequencies command with the option /format=notable.

FREQUENCIES VARIABLES=landval improval totval salepric saltoapr
  /FORMAT=NOTABLE
  /ORDER= ANALYSIS .

So we know the number of missing values in each variable. For instance, variable salepric has four and saltoapr has two missing values. This will help us to identify variables that may have a large number of missing values, which we may want to exclude from the analysis.

  2. Number of missing values in each observation and its distribution

We can also look at the distribution of missing values across observations. For example, we use the count command to create a new variable cmiss that counts the number of missing values in each observation. Looking at its frequency table, we know that there are four observations with no missing values, nine observations with one missing value, one observation with two missing values and one observation with three missing values. If we are willing to substitute one missing value per observation, we will be able to reclaim nine observations and get a valid data set that is 13/15=87% of the size of the original one.

COUNT  cmiss = landval improval totval salepric saltoapr  (MISSING).
FREQUENCIES VARIABLES=cmiss
  /ORDER=  ANALYSIS .

  3. Distribution of missing values

We can also look at the patterns of  missing values. We can recode each variable into a dummy variable such that 1 is missing and 0 is nonmissing.  Then we use the aggregate command to compute the frequency for each pattern of missing data. 

RECODE  landval improval totval salepric saltoapr  (MISSING=1)  (ELSE=0)
  INTO  land1  impr1  totv1  sale1  salt1 .
EXECUTE .
AGGREGATE
  /OUTFILE='AGGR.SAV'
  /BREAK=land1 impr1 totv1 sale1 salt1
  /N_BREAK=N.

File AGGR.SAV has the following variables and observations.

   LAND1    IMPR1    TOTV1    SALE1    SALT1  N_BREAK
     .00      .00      .00      .00      .00       4
     .00      .00      .00      .00     1.00       1
     .00      .00      .00     1.00      .00       2
     .00      .00     1.00      .00      .00       2
     .00      .00     1.00     1.00      .00       1
     .00     1.00      .00      .00      .00       2
     .00     1.00      .00     1.00     1.00       1
    1.00      .00      .00      .00      .00       2

Now we see that there are four observations with no missing values, one observation with one missing value in variable saltoapr, two observations with missing value in variable salepric and one observation with  missing value in both variable totval and salepric, etc. If we want to delete some observations from the original data set, we have a better idea now on which observation to delete, e.g. the observation corresponding to the 7th row.
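
All three summaries above can be reproduced from the example data with a few lines of Python rather than SPSS, which is a handy way to check the counts quoted in the text.  This sketch is an illustration under our own assumptions: the names M, ROWS, per_variable, per_case and patterns are hypothetical, and None stands in for a missing value.

```python
# Illustration only (Python, not SPSS): counting missing values three ways.
from collections import Counter

M = None  # stands in for a missing value
# Columns: landval, improval, totval, salepric, saltoapr
ROWS = [
    (30000, 64831, 94831, 118500, 1.25),
    (30000, 50765, 80765, 93900, M),
    (46651, 18573, 65224, M, 1.16),
    (45990, 91402, M, 184000, 1.34),
    (42394, M, 40575, 168000, 1.43),
    (M, 3351, 51102, 169000, 1.12),
    (63596, 2182, 65778, M, 1.26),
    (56658, 53806, 10464, 255000, 1.21),
    (51428, 72451, M, M, 1.18),
    (93200, M, 4321, 422000, 1.04),
    (76125, 78172, 54297, 290000, 1.14),
    (M, 61934, 16294, 237000, 1.10),
    (65376, 34458, M, 286500, 1.43),
    (42400, M, 57446, M, M),
    (40800, 92606, 33406, 168000, 1.26),
]

# 1. Missing values per variable (what the frequencies output reports)
per_variable = [sum(row[i] is M for row in ROWS) for i in range(5)]

# 2. Number of missing values per observation (what cmiss holds)
per_case = Counter(sum(value is M for value in row) for row in ROWS)

# 3. Missing-data patterns (what the aggregate step tabulates)
patterns = Counter(tuple(int(value is M) for value in row) for row in ROWS)

print(per_variable)
print(sorted(per_case.items()))
for pattern, n in sorted(patterns.items()):
    print(pattern, n)
```

The per-variable counts show four missing values for salepric and two for saltoapr, the per-observation counts are 4, 9, 1 and 1, and the pattern table has the same eight rows as AGGR.SAV.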

