If you're looking for SPSS Interview Questions & Answers for Experienced or Freshers, you are at the right place. There are a lot of opportunities from many reputed companies in the world. According to research, SPSS has a market share of about 29.5%. So, you still have the opportunity to move ahead in your career in SPSS Analytics. Mindmajix offers Advanced SPSS Interview Questions 2024 that help you crack your interview and acquire a dream career as an IBM SPSS Analyst.
Want to enhance your skills in working with the world's best IBM tools? Enroll in our SPSS Training.
The SPSS software offers advanced statistical analysis, text analysis, open-source extensibility, a vast library of machine-learning algorithms, and integration with big data.
SPSS REPLACE function replaces a substring in a string with a different (possibly empty) substring.
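For example, assuming a string variable ssn that stores social security numbers with hyphens (a hypothetical variable used purely for illustration), REPLACE can strip the hyphens out:

* A minimal sketch: remove the hyphens from the hypothetical string variable ssn.
string ssn_clean (a9).
compute ssn_clean = replace(ssn, '-', '').
execute.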
Data View: In the Data View, we inspect our actual data values.
Variable View: In the Variable View, each row is a variable and each column is an attribute of that variable, such as its name, type, label, and measurement level.
It can be done in two ways: one is by using the graph command and the other is by using the ggraph command.
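As a minimal sketch, assuming a numeric variable write exists in the active dataset (a hypothetical variable used only for illustration), a simple histogram can be requested with the graph command:

* Histogram of a hypothetical numeric variable.
graph
 /histogram = write.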
Related Article: SPSS Tutorial
If you are using SPSS version 14 or later, you can open it as a data file.
Select File -> Open -> Data… and then for Files of Type select the appropriate SAS data file type; then select the file from the list and click Open. That is all there is to it.
With SPSS syntax, use the get sas command to read in a SAS data file.
get sas data='C:\data\states.sas7bdat'.
Step 1: Turn the data into long format.
Step 2: Merge with the original data, matching the variable friend in the current data with id in the original data. Rename the variables and sort the data by id.
Step 3: Check whether focal and id are a pair of reciprocal friends.
Step 4: Aggregate the long data to a single row per focal person and merge back into the original data set.
While dealing with longitudinal data, there may be cases where participants drop out. To find out when the dropout occurs, we need to create a variable that indicates when a participant drops out.
* Compute whether someone dropped out at any particular time point.
compute v=1.
compute dropout = 0.
do repeat x = t1 to t5.
compute v = v * ~missing(x).
compute dropout = dropout + v.
end repeat.
execute.
On this page, conversions of different data formats are discussed. In general, the strategies should work with SAS 9.*, SPSS 14+, and Stata 11. If you have Stata 11 and you need to convert your data to other formats, you need to use the saveold command within Stata to save the data in Stata version 10 format before you convert the data set.
 | To SAS | To SPSS | To Stata
From SAS | - | How do I use a SAS data file in SPSS? | How do I use a SAS data file in Stata?
From SPSS | How do I use an SPSS data file in SAS? | - | How do I use an SPSS data file in Stata?
From Stata | How do I use a Stata data file in SAS? | How do I use a Stata data file in SPSS? | -
Another way to convert data files between SAS, Stata, and SPSS is to use a program such as Stat/Transfer or DBMS/Copy. Stat/Transfer can transfer SAS version 9.*, Stata 11, and SPSS 19 files, and it can also transfer data files to many other formats, including Statistica, Systat, S-Plus, R, Excel, Access, Minitab, Matlab, LIMDEP and JMP. You may need to update your copy of Stat/Transfer to be able to transfer data sets created by the latest version of the software. To update Stat/Transfer, click on the "About" tab (in the upper right corner), click on the "Check for Updates" pull-down menu, and select "Right Now".
Suppose that your data file has two different kinds of records: family records and person records. How do you read the data so that the family information is included for each person?
Here is an example dataset with two kinds of records: family records and person records. The data are organized such that the family record comes first and all the person records for that family follow it. The family records are the shorter data lines and the person records are the longer ones.
06470 1 1
32161 232 0
19082 230 1
07470 1 0
11231 240 1
08470 1 0
43711 227 0
09470 1 0
40221 213 1
41162 222 0
16173 224 1
10470 1 1
30111 220 0
36222 211 1
11470 1 0
21751 217 0
33962 210 1
32143 226 1
Here are the codebooks for the family and person records.
Family record: columns 1-5 family id; column 7 record type (1 = family); column 9 group.
Person record: columns 1-4 person id; column 5 person number; column 7 record type (2 = person); columns 8-9 age; column 11 male.
The following syntax example reads in and displays the data from the two different types of records. The file type is nested and the number in column seven indicates which record type the data belong to. There is a separate data list for each record type.
file type nested record=7.
record type 1.
data list / famid 1-5 group 9.
record type 2.
data list / personid 1-4 pernum 5 age 8-9 male 11.
end file type.
begin data
06470 1 1
32161 232 0
19082 230 1
07470 1 0
11231 240 1
08470 1 0
43711 227 0
09470 1 0
40221 213 1
41162 222 0
16173 224 1
10470 1 1
30111 220 0
36222 211 1
11470 1 0
21751 217 0
33962 210 1
32143 226 1
end data.
list.
Here is what the final dataset looks like.
FAMID | GROUP | PERSONID | PERNUM | AGE | MALE |
6470 | 1 | 3216 | 1 | 32 | 0 |
6470 | 1 | 1908 | 2 | 30 | 1 |
7470 | 0 | 1123 | 1 | 40 | 1 |
8470 | 0 | 4371 | 1 | 27 | 0 |
9470 | 0 | 4022 | 1 | 13 | 1 |
9470 | 0 | 4116 | 2 | 22 | 0 |
9470 | 0 | 1617 | 3 | 24 | 1 |
10470 | 1 | 3011 | 1 | 20 | 0 |
10470 | 1 | 3622 | 2 | 11 | 1 |
11470 | 0 | 2175 | 1 | 17 | 0 |
11470 | 0 | 3396 | 2 | 10 | 1 |
11470 | 0 | 3214 | 3 | 26 | 1 |
Number of cases read: 12 Number of cases listed: 12
NOTE: The method shown on this page works for all versions of SPSS, but if you have SPSS version 21 or later, you can use the compare datasets command.
There are times when you would like to compare two data sets to see if they are exactly the same. For example, if two people enter the same data (double data entry), you would want to know if any discrepancies exist between the two datasets (the rationale of double data entry), and if so, where those discrepancies are. We start by reading in the two datasets, one entered by person 1 and the second by person 2. The two data sets are identical, except that we created a missing value in the ninth row, the second variable, in the first data set, and we changed the very last entry from 51 to 52 in the second data set.
After entering each data set, we need to sort the data set. In our example, we will sort the data set on all variables, starting with the first variable in the data set. We use the SPSS keyword all to do this. We use this method because it is very general and will work in many situations. (However, if you want to compare the files on only a few variables in the data set, you will need to list the variables in the same order in both sorts and on the by subcommand of the update command.) After sorting the data set, we save it. We do this for both data sets.
data list list /id female race ses * schtype (A3) prog read write math science socst.
begin data.
147 1 1 3 pub 1 47 62 53 53 61
108 0 1 2 pub 2 34 33 41 36 36
18 0 3 2 pub 3 50 33 49 44 36
153 0 1 2 pub 3 39 31 40 39 51
50 0 2 2 pub 2 50 59 42 53 61
51 1 2 1 pub 2 42 36 42 31 39
102 0 1 1 pub 1 52 41 51 53 56
57 1 1 2 pub 1 71 65 72 66 56
160 . 1 2 pub 1 55 65 55 50 61
136 0 1 2 pub 1 65 59 70 63 51
end data.
sort cases by all.
save outfile "D:\person1.sav".

data list list /id female race ses * schtype (A3) prog read write math science socst.
begin data.
147 1 1 3 pub 1 47 62 53 53 61
108 0 1 2 pub 2 34 33 41 36 36
18 0 3 2 pub 3 50 33 49 44 36
153 0 1 2 pub 3 39 31 40 39 51
50 0 2 2 pub 2 50 59 42 53 61
51 1 2 1 pub 2 42 36 42 31 39
102 0 1 1 pub 1 52 41 51 53 56
57 1 1 2 pub 1 71 65 72 66 56
160 1 1 2 pub 1 55 65 55 50 61
136 0 1 2 pub 1 65 59 70 63 52
end data.
sort cases by all.
save outfile "D:\person2.sav".
Now we can use the update command to compare the two data files. We need to use the SPSS keyword all on the by subcommand because that is how we sorted the data sets. Also, we use the in subcommand to create a flag variable, which we called flag1, to indicate which rows match and which rows do not match. We use the value labels command to add value labels to flag1, and finally, we run a frequency on flag1. As we can see, there are two mismatches.
update file = "D:\person1.sav"
 /in = flag1
 /file = "D:\person2.sav"
 /by all.
exe.
save outfile "D:\combo.sav".
value labels flag1 0 'mismatch' 1 'match'.
freq var = flag1.
Finally, if we look at our new data set, combo, we see that we now have 12 rows of data instead of the original 10. A new row is added to the data set for each mismatched row so that you can see where the mismatch is. If there are two mismatches in a row, the row is listed only once, so you will need to compare the values for each variable to find all of the mismatches.
id female race ses schtype prog read write socst flag1
18.00 .00 3.00 2.00 pub 3.00 50.00 33.00 36.00 1
50.00 .00 2.00 2.00 pub 2.00 50.00 59.00 61.00 1
51.00 1.00 2.00 1.00 pub 2.00 42.00 36.00 39.00 1
57.00 1.00 1.00 2.00 pub 1.00 71.00 65.00 56.00 1
102.00 .00 1.00 1.00 pub 1.00 52.00 41.00 56.00 1
108.00 .00 1.00 2.00 pub 2.00 34.00 33.00 36.00 1
136.00 .00 1.00 2.00 pub 1.00 65.00 59.00 51.00 1
136.00 .00 1.00 2.00 pub 1.00 65.00 59.00 52.00 0
147.00 1.00 1.00 3.00 pub 1.00 47.00 62.00 61.00 1
153.00 .00 1.00 2.00 pub 3.00 39.00 31.00 51.00 1
160.00 . 1.00 2.00 pub 1.00 55.00 65.00 61.00 1
160.00 1.00 1.00 2.00 pub 1.00 55.00 65.00 61.00 0
Number of cases read: 12 Number of cases listed: 12
NOTE: The methods shown on this page work with SPSS version 21 and later. If you are using an earlier version of SPSS, use the update command approach described above.
There are times when you would like to compare two data sets to see if they are exactly the same. For example, if two people enter the same data (double data entry), you would want to know if any discrepancies exist between the two datasets (the rationale of double data entry), and if so, where those discrepancies are. We start by reading in the two datasets, one entered by person 1 and the second by person 2. The two data sets are identical, except that we created a missing value in the ninth row, the second variable, in the first data set, and we changed the very last entry from 51 to 52 in the second data set.
data list list /id female race ses * schtype (A3) prog read write math science socst.
begin data.
147 1 1 3 pub 1 47 62 53 53 61
108 0 1 2 pub 2 34 33 41 36 36
18 0 3 2 pub 3 50 33 49 44 36
153 0 1 2 pub 3 39 31 40 39 51
50 0 2 2 pub 2 50 59 42 53 61
51 1 2 1 pub 2 42 36 42 31 39
102 0 1 1 pub 1 52 41 51 53 56
57 1 1 2 pub 1 71 65 72 66 56
160 . 1 2 pub 1 55 65 55 50 61
136 0 1 2 pub 1 65 59 70 63 51
end data.
sort cases by id.
save outfile "D:\person1.sav".

data list list /id female race ses * schtype (A3) prog read write math science socst.
begin data.
147 1 1 3 pub 1 47 62 53 53 61
108 0 1 2 pub 2 34 33 41 36 36
18 0 3 2 pub 3 50 33 49 44 36
153 0 1 2 pub 3 39 31 40 39 51
50 0 2 2 pub 2 50 59 42 53 61
51 1 2 1 pub 2 42 36 42 31 39
102 0 1 1 pub 1 52 41 51 53 56
57 1 1 2 pub 1 71 65 72 66 56
160 1 1 2 pub 1 55 65 55 50 61
136 0 1 2 pub 1 65 59 70 63 52
end data.
sort cases by id.
save outfile "D:\person2.sav".
Now we can use the compare datasets command to compare the two data files. We start with the person2 data file open, and we will compare it to the person1 file. To do this, we specify the person1 data file on the compdataset subcommand of the compare datasets command. The variables subcommand is necessary, and in this example, we will use the keyword all so that all variables in the data files are compared. We will use the save subcommand to create a new variable called mismatchflag in the person2 data set. This variable will have a value of 0 for cases that match and a value of 1 for cases that do not match. If we had any unmatched cases, they would get a value of -1. While it is easy to see which cases do not match in this tiny example data set, it might not be so easy in a larger data set. We can use the frequencies command to show us how many cases matched and did not match.
compare datasets
 /compdataset "D:\temp\person1.sav"
 /variables all
 /save flagmismatches = yes varname = mismatchflag.
freq var = mismatchflag.
Sometimes it is helpful to have the cases that match saved to one data file and the cases that do not match saved to a different data file. In the next example, we create two new data files using the mismatchdataset and matchdataset options. The mismatchname keyword is used with the mismatchdataset option to name the new dataset. Likewise, the matchname keyword is used with the matchdataset option to name the data set with the matched cases. We use the delete variables command to remove the variable mismatchflag from the person2 data file.
delete variables mismatchflag.
compare datasets
 /compdataset "D:\temp\person1.sav"
 /caseid id
 /variables all
 /save flagmismatches = yes varname = mismatchflag
  mismatchdataset = yes mismatchname = "d:\temp\mismatch.sav"
  matchdataset = yes matchname = "d:\temp\match.sav".

get file "d:\temp\mismatch.sav".
list.

id female race ses schtype prog read write math science socst
136.00 .00 1.00 2.00 pub 1.00 65.00 59.00 70.00 63.00 52.00
160.00 1.00 1.00 2.00 pub 1.00 55.00 65.00 55.00 50.00 61.00
Number of cases read: 2 Number of cases listed: 2

get file "d:\temp\match.sav".
list.

id female race ses schtype prog read write math science socst
18.00 .00 3.00 2.00 pub 3.00 50.00 33.00 49.00 44.00 36.00
50.00 .00 2.00 2.00 pub 2.00 50.00 59.00 42.00 53.00 61.00
51.00 1.00 2.00 1.00 pub 2.00 42.00 36.00 42.00 31.00 39.00
57.00 1.00 1.00 2.00 pub 1.00 71.00 65.00 72.00 66.00 56.00
102.00 .00 1.00 1.00 pub 1.00 52.00 41.00 51.00 53.00 56.00
108.00 .00 1.00 2.00 pub 2.00 34.00 33.00 41.00 36.00 36.00
147.00 1.00 1.00 3.00 pub 1.00 47.00 62.00 53.00 53.00 61.00
153.00 .00 1.00 2.00 pub 3.00 39.00 31.00 40.00 39.00 51.00
Number of cases read: 8 Number of cases listed: 8
The codebook command was introduced in SPSS version 17. It provides information about the variables in a dataset, such as the type, variable labels, value labels, as well as the number of cases in each level of categorical variables and means and standard deviations of continuous variables. This information can be as important as the data themselves, because it helps to give meaning to the data. Also, this information can help you distinguish between two similar datasets.
The examples below will use the hsb1.sav dataset. Let's start by looking at the Variable View.
get file "D:\data\hsb1.sav".
You can access the codebook command via the point-and-click interface by clicking on Analyze -> Reports -> Codebook.
Let's consider the syntax below. Although it may look complicated, only the command itself is necessary. If you issue the codebook command by itself, you will get the variable information for all of the variables in the dataset; counts and percents for all categories of nominal and ordinal variables; and means, standard deviations and quartiles for scale variables. This may be more output than you want, so you may prefer to select which variables and what information about them you would like to see. In the example below, we have selected five variables from our dataset. In square brackets ( [] ) after each variable name, we have indicated the measurement level. Scale variables (AKA continuous variables) are indicated with an s, ordinal variables (AKA categorical variables) with an o, and nominal variables with an n. The measurement level specified in the command may or may not match that shown in the Variable View. For example, as we can see above, the variable socst has a nominal measurement; however, in the codebook command below, we have specified it as a scale variable. The type of measurement determines what will be provided in the output for the variable: counts and percents for all categories of nominal and ordinal variables; means, standard deviations and quartiles for scale variables.
On the varinfo subcommand, we request some of the information that we see in the Variable View. On the fileinfo subcommand, we request information on the data file itself, such as the name of the data file, its location, the file label, any documents attached to the data file and a count of the number of cases in the dataset. On the statistics subcommand, we request the count and percent, which gives the number of cases and percent of cases in each level of nominal and ordinal variables. We also request the mean and standard deviation of scale variables.
codebook ses [o] prgtype write [s] science [s] socst [s]
 /varinfo position label type format measure valuelabels missing
 /fileinfo name location label documents casecount
 /statistics percent mean stddev.
In the example below, we show how to get minimal output (by using the keyword none on the statistics subcommand) and how to order the output alphabetically (by specifying varorder = alpha on the options subcommand).
codebook ses prgtype science socst
 /varinfo label type valuelabels
 /options varorder = alpha
 /statistics none.
Sometimes date data have been entered as string variables, and these variables need to be converted into numeric variables. Date variables are numeric variables in SPSS, and as such, they can be added, subtracted, etc. Specifically, date variables in SPSS are the number of seconds since the beginning of the Gregorian calendar, which was October 14, 1582.
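Because date variables are just (very large) numbers of seconds, they can be used directly in calculations. As a minimal sketch, assuming two already-converted date variables start_date and end_date (hypothetical names used only for illustration), the number of days between them can be computed with the datediff function:

* Days elapsed between two hypothetical date variables.
compute days_between = datediff(end_date, start_date, "days").
execute.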
Let’s look at an example data set below. We see that we have date data entered as string variables in three different ways. The examples below will show how to convert each of these into date (numeric) variables that can be used in calculations.
data list list /day1 (a2) month1 (a2) year1 (a4) date2 (a12) date3 (a12).
begin data.
12 06 2005 06/12/2005 06-Dec-2005
14 05 2004 05/14/2004 05-May-2004
01 01 1998 01/01/1998 01-Jan-1998
end data.
list.

day1 month1 year1 date2 date3
12 06 2005 06/12/2005 06-Dec-2005
14 05 2004 05/14/2004 05-May-2004
01 01 1998 01/01/1998 01-Jan-1998
Number of cases read: 3 Number of cases listed: 3
Example 1
In the example below, we work with the variable date3. If your date variable is entered exactly like this, then you can use the numeric function to convert it into a numeric variable. We use the compute command with the numeric function to create a new variable called your_date that is a numeric version of the string variable date3. We then create a copy of your_date (called your_date1). We use the formats command to give your_date and your_date1 different formats, so that you can see what number is associated with each of the dates displayed in your_date1. To be clear: your_date and your_date1 contain the same values; they are just formatted differently.
compute your_date = numeric(date3, date11).
compute your_date1 = your_date.
format your_date (f14.0) your_date1 (date12).
exe.
list.

day1 month1 year1 date2 date3 your_date your_date1
12 06 2005 06/12/2005 06-Dec-2005 13353206400 06-DEC-2005
14 05 2004 05/14/2004 05-May-2004 13303094400 05-MAY-2004
01 01 1998 01/01/1998 01-Jan-1998 13102992000 01-JAN-1998
Number of cases read: 3 Number of cases listed: 3
Example 2
In this example, we will work with the three variables day1, month1 and year1. First, we will make numeric versions of these variables using the compute command with the numeric function. Next, we use the compute command with the date.dmy function to combine the numeric variables into a single date variable. The date.dmy function requires numeric variables as arguments, so we must make numeric versions of the string variables for use with this function. We then use the formats command to format the new date variable (called my_date). Note that the execute command (shortened to exe.) is needed after the formats command, or the next command will not run. The delete variables command is used to remove the unneeded variables from the data set.
compute dn = numeric(day1, f4.0).
compute mn = numeric(month1, f4.0).
compute yn = numeric(year1, f4.0).
exe.
compute my_date = date.dmy(dn, mn, yn).
formats my_date (date11).
exe.
delete variables dn mn yn.
list.

day1 month1 year1 date2 date3 your_date your_date1 my_date
12 06 2005 06/12/2005 06-Dec-2005 13353206400 06-DEC-2005 12-JUN-2005
14 05 2004 05/14/2004 05-May-2004 13303094400 05-MAY-2004 14-MAY-2004
01 01 1998 01/01/1998 01-Jan-1998 13102992000 01-JAN-1998 01-JAN-1998
Number of cases read: 3 Number of cases listed: 3
Example 3
This example is very similar to Example 2, except that the date is contained in a single string variable, called date2. Because the whole date is contained in the string variable date2, we need to start by breaking date2 into three parts: month, day, and year. To start this process, we first create the three string variables (called m1, d1 and y1). Next, we populate the new string variables using the compute command with the substr function. In the third step, we create numeric versions of the string variables using the compute command with the numeric function. After that, the compute command with the date.dmy function is used to combine the numeric versions of the day, month and year variables (dn, mn and yn). The date.dmy function can only take numeric variables as arguments, so we could not use the string versions of these variables. In the last step, we format the new date variable, called new_date, with the formats command. We selected the adate11 format, but you could use any date format that you like. Note that the execute command (shortened to exe.) is needed after the formats command, or the next command will not run. We also used the delete variables command to remove the unneeded variables from the data set.
string m1 (a2).
string d1 (a2).
string y1 (a4).
exe.
compute m1 = substr(date2, 1, 2).
compute d1 = substr(date2, 4, 2).
compute y1 = substr(date2, 7, 4).
compute mn = numeric(m1, f4.0).
compute dn = numeric(d1, f4.0).
compute yn = numeric(y1, f4.0).
compute new_date = date.dmy(dn, mn, yn).
formats new_date (adate11).
exe.
delete variables d1 m1 y1 dn mn yn.
exe.
list.

day1 month1 year1 date2 date3 your_date your_date1 new_date
12 06 2005 06/12/2005 06-Dec-2005 13353206400 06-DEC-2005 06/12/2005
14 05 2004 05/14/2004 05-May-2004 13303094400 05-MAY-2004 05/14/2004
01 01 1998 01/01/1998 01-Jan-1998 13102992000 01-JAN-1998 01/01/1998
Number of cases read: 3 Number of cases listed: 3
There are at least two ways to create a string variable in SPSS. In our first example, we show how to input string variables into a new data set. In the second example, we show how to create a string variable in an existing data set. In the third example, we show how to combine string variables, and in the last example, we show how to remove unwanted characters from a string variable.
Example 1: Inputting string variables into a new data set
In this example, we will enter an id number, the first and last name, age, and weight for nine folks. All of the variables will be numeric, except of course, the names. We will also save the file.
data list list / id * fname (A5) lname (A10) age wt.
begin data
1 "Beth" "Jones" 20 .
2 "Bob" "Jensen" 23 210
3 "Barb" "Andersen" 25 125
4 "Andy" "Smith" 26 160
5 "Al" "Peterson" 21 190
6 "Ann" "Glenn" 22 115
7 "Pete" "." 29 175
8 "Pam" "Wright" 21 145
9 "Phil" "Brown" 29 200
end data.
save outfile 'c:\names.sav'.
The (A5) and (A10) after fname and lname tell SPSS that those variables are string variables with lengths of five and ten, respectively. If you are listing only one string variable and one or more numeric variables are listed before the string variable, you need to put an asterisk before the name of the string variable to tell SPSS that the variables listed before the asterisk are numeric variables. Hence, the asterisk (*) after id is necessary; without it, SPSS would assume that id was also a string variable. The asterisk tells SPSS that all prior variables are numeric.
You may also notice that SPSS produced an error message, shown below, while reading in the data. It was caused by the missing data value for wt in case 1. Despite this error message, the data were read in correctly, as we can see by using the list command. An error message was not generated for the missing value in lname in case 7 because “.” is a valid value in a string variable. In other words, SPSS does not consider it a missing value. We will return to this issue shortly.
>Warning # 1111
>A numeric field contained no digits. The result has been set to the
>system-missing value.
>Command line: 978 Current case: 1 Current splitfile group: 1
>Field contents: '.'
>Record number: 1 Starting column: 21 Record length: 21

list.

ID FNAME LNAME AGE WT
1.00 Beth Jones 20.00 .
2.00 Bob Jensen 23.00 210.00
3.00 Barb Andersen 25.00 125.00
4.00 Andy Smith 26.00 160.00
5.00 Al Peterson 21.00 190.00
6.00 Ann Glenn 22.00 115.00
7.00 Pete . 29.00 175.00
8.00 Pam Wright 21.00 145.00
9.00 Phil Brown 29.00 200.00
Number of cases read: 9 Number of cases listed: 9
Example 2: Adding a string variable to an existing data set
Suppose that we would like to add a string variable called gender. First, we need to create the new variable using the string command. Then we will assign values to the variable.
string gender (A6).
execute.
Let’s look at the frequency of a few variables to see how gender is different from the variables that we entered with the data list command.
freq var=lname wt gender /format=notable.
Statistics (N) | LNAME | WT | GENDER |
Valid | 9 | 8 | 9 |
Missing | 0 | 1 | 0 |
Notice that although there are no values for gender, there are also no missing values. (This is why you cannot use the nmiss function in aggregate.) In other words, SPSS considers a blank to be a valid value for a string variable.
Now let's assign values to gender. We will use the compute and the if commands to do this. Remember that while you can modify a string variable with compute and if, you cannot create a string variable with these commands. (However, you can create a numeric variable with the compute or the if command.) Note that the value of a string variable must always be enclosed in quote marks.
compute gender = 'female'.
execute.
Of course, not everyone in our data set is female, so we need to change some of the values of gender. If we want to make the values of gender contingent on the value of another variable, we use the if command. In this example, we will use the vertical bars to indicate or.
if id = 2 | id = 4 | id = 5 | id = 7 | id = 9 gender = 'male'.
execute.
We can also use numeric values in string variables. Remember that even if numeric values are used, SPSS still considers those values to be strings.
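For instance, here is a small hypothetical sketch: even though the value looks numeric, SPSS treats it as text, so you could not use it in arithmetic.

* A hypothetical numeric-looking code stored as a string.
string group (a1).
compute group = '1'.
execute.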
We can assign variable labels and value labels to string variables in the same way that we can assign them to numeric variables.
variable label gender 'This is the gender of the subject'.
value label gender 'male' 'm' 'female' 'f'.
execute.
Example 3: Combining string variables
In our current data set, the first name (called fname) and the last name (called lname) are two different variables. Suppose that we wanted to combine them into a single variable. To do this, we will create a new variable called name1 with a length of 10. Next, we will use the concat function (short for “concatenate”) to combine the first and last name into a single variable.
string name1 (A10).
execute.
compute name1 = concat(fname, lname).
execute.
list name1.

NAME1
Beth Jones
Bob Jense
Barb Ander
Andy Smith
Al Peter
Ann Glenn
Pete .
Pam Wrigh
Phil Brown
Number of cases read: 9 Number of cases listed: 9
As you can see, the length of name1 is too short. Although you can use the alter type command (available in SPSS versions 16 and higher) to make the variable name1 longer, we have already lost the information at the end of some of the cases (in other words, some of the letters at the end have already been cut off). Hence, simply making name1 longer isn’t helpful. Rather, we will need to create a new string variable (which we will call fn) with a longer length and copy name1 into fn.
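For reference, a minimal sketch of that alter type syntax (it would lengthen name1, but it cannot bring back the characters that were already truncated):

* Lengthen the string variable name1 to 15 characters.
alter type name1 (a15).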
string fn (A15).
compute fn = concat(fname, lname).
execute.
list fn.

FN
Beth Jones
Bob Jensen
Barb Andersen
Andy Smith
Al Peterson
Ann Glenn
Pete .
Pam Wright
Phil Brown
Number of cases read: 9 Number of cases listed: 9
While this worked, it does not look exactly as we would like. (The unequal number of spaces between the first and last name does not look good.) Therefore, let’s create another string variable and call it fullname. We will use the rtrim function, which will trim off any extra blanks on the right of fname, and use the concat function to combine fname, a space, and lname.
string fullname (A15).
compute fullname = concat(rtrim(fname), " ", lname).
execute.
list fullname.

FULLNAME
Beth Jones
Bob Jensen
Barb Andersen
Andy Smith
Al Peterson
Ann Glenn
Pete .
Pam Wright
Phil Brown
Number of cases read: 9 Number of cases listed: 9
Example 4: Deleting unwanted characters from a string variable
Sometimes you need to remove unwanted characters from a string variable. For example, social security numbers are often given with hyphens in them. The code below can be used to remove the hyphens. First, we input a small data set. We use the list command to ensure that the data were read in properly. Next, we create a string variable called strvar, which has a length of nine (a9). We use the compute command, the concat function (short for "concatenate") and the substr function (short for "substring") to assign the values to strvar. Finally, we use the list command again to see the results. The substr function is used to break apart each value of ssn. The first number (a.k.a. argument) indicates the position within the string variable where SPSS is to begin, and the second number tells SPSS how many characters to take. Hence, substr(ssn, 1, 3) tells SPSS to use the variable ssn, start at the first position in the variable and take three characters. For the first row of data, that would be 123.
data list list / ssn (a11).
begin data.
123-45-6789
987-65-4321
132-54-9687
798-65-4213
end data.
list.

SSN
123-45-6789
987-65-4321
132-54-9687
798-65-4213
Number of cases read: 4 Number of cases listed: 4

string strvar (a9).
compute strvar = concat(substr(ssn, 1, 3), substr(ssn, 5, 2), substr(ssn, 8, 4)).
list.

SSN STRVAR
123-45-6789 123456789
987-65-4321 987654321
132-54-9687 132549687
798-65-4213 798654213
Number of cases read: 4 Number of cases listed: 4
Sometimes a data set may have "holes" in it, i.e., missing values. Some statistical procedures, such as regression analysis, will not work as well, or at all, on a data set with missing values. The observations with missing values have to be either deleted or the missing values have to be substituted in order for a statistical procedure to produce meaningful results. Thus we may want to know the number of missing values and the distribution of those missing values, so we have a better idea of what to do with the observations that have missing values. Let's look at the following data set.
LANDVAL IMPROVAL TOTVAL SALEPRIC SALTOAPR
30000 64831 94831 118500 1.25
30000 50765 80765 93900 .
46651 18573 65224 . 1.16
45990 91402 . 184000 1.34
42394 . 40575 168000 1.43
. 3351 51102 169000 1.12
63596 2182 65778 . 1.26
56658 53806 10464 255000 1.21
51428 72451 . . 1.18
93200 . 4321 422000 1.04
76125 78172 54297 290000 1.14
. 61934 16294 237000 1.10
65376 34458 . 286500 1.43
42400 . 57446 . .
40800 92606 33406 168000 1.26
1. Number of missing values vs. the number of non-missing values
The first thing to look at is which variables have a lot of missing values. We just use the frequencies command with the option /format=notable.
FREQUENCIES VARIABLES=landval improval totval salepric saltoapr
 /FORMAT=NOTABLE
 /ORDER=ANALYSIS.
So we know the number of missing values in each variable. For instance, variable salepric has four and saltoapr has two missing values. This will help us to identify variables that may have a large number of missing values and perhaps we may want to exclude those from the analysis.
2. Number of missing values in each observation and its distribution
We can also look at the distribution of missing values across observations. For example, we use the count command to create a new variable cmiss counting the number of missing values in each observation. Looking at its frequency table, we know that there are four observations with no missing values, nine observations with one missing value, one observation with two missing values and one observation with three missing values. If we are willing to substitute one missing value per observation, we will be able to reclaim nine observations, giving a valid data set that is 13/15 = 87% of the size of the original one.
COUNT cmiss = landval improval totval salepric saltoapr (MISSING).
FREQUENCIES VARIABLES=cmiss
 /ORDER=ANALYSIS.
3. Patterns of missing values
We can also look at the patterns of missing values. We can recode each variable into a dummy variable such that 1 is missing and 0 is non-missing. Then we use the aggregate command to compute the frequency for each pattern of missing data.
RECODE landval improval totval salepric saltoapr (MISSING=1) (ELSE=0)
 INTO land1 impr1 totv1 sale1 salt1.
EXECUTE.
AGGREGATE
 /OUTFILE='AGGR.SAV'
 /BREAK=land1 impr1 totv1 sale1 salt1
 /N_BREAK=N.

File AGGR.SAV has the following variables and observations.

LAND1 IMPR1 TOTV1 SALE1 SALT1 N_BREAK
.00 .00 .00 .00 .00 4
.00 .00 .00 .00 1.00 1
.00 .00 .00 1.00 .00 2
.00 .00 1.00 .00 .00 2
.00 .00 1.00 1.00 .00 1
.00 1.00 .00 .00 .00 2
.00 1.00 .00 1.00 1.00 1
1.00 .00 .00 .00 .00 2
Now we see that there are four observations with no missing values, one observation with a missing value in variable saltoapr, two observations with a missing value in variable salepric, one observation with missing values in both totval and salepric, and so on. If we want to delete some observations from the original data set, we now have a better idea of which observations to delete, e.g. the observation corresponding to the 7th row.
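For example, here is a minimal sketch of dropping the observations that have two or more missing values (the cutoff of two is only for illustration):

* Keep observations with fewer than two missing values across these variables.
select if nmiss(landval, improval, totval, salepric, saltoapr) < 2.
execute.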
Each package offers its own unique strengths and weaknesses. As a whole, SAS, Stata, and SPSS form a set of tools that can be used for a wide variety of statistical analyses. With Stat/Transfer it is very easy to convert data files from one package to another in just a matter of seconds or minutes. Therefore, there can be quite an advantage to switching from one analysis package to another depending on the nature of your problem. For example, if you were performing analysis using mixed models you might choose SAS, but if you were doing logistic regression you might choose Stata, and if you were doing analysis of variance you might choose SPSS. If you frequently perform statistical analysis, we would strongly urge you to consider making each one of these packages part of your toolkit for data analysis.
When studying social networks, we might need to create a variable that contains the number of reciprocal friends for each person. We show a step-by-step example on this page using a wide data format. Click here to access the data. Here is how our data set is structured:
id | friend1 | friend2 | friend3 | friend4 | friend5 |
44006 | 45611 | 55610 | 74158 | 55898 | - |
45611 | 44006 | 45623 | 45621 | 74158 | 55898 |
45621 | 71643 | 45611 | - | - | - |
45623 | 59642 | 71643 | 45611 | 73452 | 55610 |
55610 | 45623 | 45611 | 44006 | 55898 | 71643 |
55898 | 74158 | 55610 | 45621 | - | - |
59642 | 55898 | 71643 | 45621 | 45611 | 74158 |
71643 | 55898 | 73452 | 59642 | 45611 | 44006 |
73452 | 71643 | 45623 | - | - | - |
74158 | 55898 | 45611 | 59642 | 45621 | 44006 |
Each person can nominate up to 5 friends. For example, a focal person with id = 44006 has nominated 4 friends and we want to know how many of these 4 friends have also nominated 44006, that is the number of reciprocal friends initiated by 44006.
One way to accomplish this task is to first turn the data into a long format, so each focal person has as many rows of data as the number of friends nominated. Then we merge the original data back in, matching the friend id in the long data with the id in the original data. At this point, we can simply check row by row whether the focal person and the friend form a reciprocal pair. Last, we aggregate the data back to get the total number of reciprocal friends.
Now let’s go through the steps.
Step 1:
Turning the data into long format.
get file = 'D:\work\spss\friends_wide.sav'.
list.
varstocases
/make friend from friend1 to friend5
/index = i.
* list the first 15 observations to see the structure.
list /cases = from 1 to 15.
id | i | friend |
44006 | 1 | 45611 |
44006 | 2 | 55610 |
44006 | 3 | 74158 |
44006 | 4 | 55898 |
45611 | 1 | 44006 |
45611 | 2 | 45623 |
45611 | 3 | 45621 |
45611 | 4 | 74158 |
45611 | 5 | 55898 |
45621 | 1 | 71643 |
45621 | 2 | 45611 |
45623 | 1 | 59642 |
45623 | 2 | 71643 |
45623 | 3 | 45611 |
45623 | 4 | 73452 |
Step 2.
Merging with the original data matching the variable friend in current data with the variable id in the original data. To this end, we need to rename variables and make sure that both data sets are sorted by id.
rename variables id = focal.
rename variables friend = id.
sort cases by id(A).
dataset name long.
get file = 'D:\work\spss\friends_wide.sav'.
sort cases by id(A).
dataset name friend_wide.
dataset activate long.
match files /file=*
/table=’friend_wide’
/by id.
exe.
list /cases = from 1 to 15.
focal | i | id | friend1 | friend2 | friend3 | friend4 | friend5 |
45611 | 1 | 44006 | 45611 | 55610 | 74158 | 55898 | - |
55610 | 3 | 44006 | 45611 | 55610 | 74158 | 55898 | - |
71643 | 5 | 44006 | 45611 | 55610 | 74158 | 55898 | - |
74158 | 5 | 44006 | 45611 | 55610 | 74158 | 55898 | - |
44006 | 1 | 45611 | 44006 | 45623 | 45621 | 74158 | 55898
45621 | 2 | 45611 | 44006 | 45623 | 45621 | 74158 | 55898
45623 | 3 | 45611 | 44006 | 45623 | 45621 | 74158 | 55898
55610 | 2 | 45611 | 44006 | 45623 | 45621 | 74158 | 55898
59642 | 4 | 45611 | 44006 | 45623 | 45621 | 74158 | 55898
71643 | 4 | 45611 | 44006 | 45623 | 45621 | 74158 | 55898
74158 | 2 | 45611 | 44006 | 45623 | 45621 | 74158 | 55898
45611 | 3 | 45621 | 71643 | 45611 | - | - | -
55898 | 3 | 45621 | 71643 | 45611 | - | - | - |
59642 | 3 | 45621 | 71643 | 45611 | - | - | - |
74158 | 4 | 45621 | 71643 | 45611 | - | - | - |
What do we have here? Let's look at the first row. Focal person 45611 has nominated 44006 as a friend, and 44006 has nominated 4 friends: 45611, 55610, 74158 and 55898. Since 45611 nominated 44006 and 44006 nominated 45611 in return, they form a reciprocal pair. So we can simply check, row by row, whether focal and id are reciprocal friends by checking if the focal id appears in the list of friends. This leads to our next step.
Step 3.
Checking if focal and id are a pair of reciprocal friends. To this end, we use the do repeat to loop through the friend list.
compute rtie = 0.
exe.
do repeat f = friend1 to friend5.
if (focal = f) rtie = 1.
end repeat.
exe.
sort cases by focal(A).
list /cases = from 1 to 15.
focal | i | id | friend1 | friend2 | friend3 | friend4 | friend5 | rtie |
44006 | 1 | 45611 | 44006 | 45623 | 45621 | 74158 | 55898 | 1.00 |
44006 | 2 | 55610 | 45623 | 45611 | 44006 | 55898 | 71643 | 1.00 |
44006 | 4 | 55898 | 74158 | 55610 | 45621 | - | - | .00 |
44006 | 3 | 74158 | 55898 | 45611 | 59642 | 45621 | 44006 | 1.00 |
45611 | 1 | 44006 | 45611 | 55610 | 74158 | 55898 | - | 1.00 |
45611 | 3 | 45621 | 71643 | 45611 | - | - | - | 1.00 |
45611 | 2 | 45623 | 59642 | 71643 | 45611 | 73452 | 55610 | 1.00 |
45611 | 5 | 55898 | 74158 | 55610 | 45621 | - | - | .00 |
45611 | 4 | 74158 | 55898 | 45611 | 59642 | 45621 | 44006 | 1.00 |
45621 | 2 | 45611 | 44006 | 45623 | 45621 | 74158 | 55898 | 1.00 |
45621 | 1 | 71643 | 55898 | 73452 | 59642 | 45611 | 44006 | .00 |
45623 | 3 | 45611 | 44006 | 45623 | 45621 | 74158 | 55898 | 1.00 |
45623 | 5 | 55610 | 45623 | 45611 | 44006 | 55898 | 71643 | 1.00 |
45623 | 1 | 59642 | 55898 | 71643 | 45621 | 45611 | 74158 | .00 |
45623 | 2 | 71643 | 55898 | 73452 | 59642 | 45611 | 44006 | .00 |
Step 4.
Aggregating the long data to a single row per focal person (summing rtie to get the number of reciprocal ties, nrties) and merging the result back into the original wide data set.
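Here is a minimal sketch of this step. It assumes the long data from Step 3 is still the active dataset (named long), that the original wide data are still open under the dataset name friend_wide from Step 2, and that the variable name nrties is our own choice.

* Aggregate the long data: one row per focal person, summing rtie.
dataset activate long.
aggregate outfile = *
 /break = focal
 /nrties = sum(rtie).
* Rename focal to id so it matches the wide data, then merge the counts back in.
rename variables focal = id.
sort cases by id(A).
match files /file = *
 /file = 'friend_wide'
 /by id.
exe.
list /cases = from 1 to 5.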
id | nrties | friend1 | friend2 | friend3 | friend4 | friend5 |
44006 | 3.00 | 45611 | 55610 | 74158 | 55898 | - |
45611 | 4.00 | 44006 | 45623 | 45621 | 74158 | 55898 |
45621 | 1.00 | 71643 | 45611 | - | - | - |
45623 | 3.00 | 59642 | 71643 | 45611 | 73452 | 55610 |
55610 | 3.00 | 45623 | 45611 | 44006 | 55898 | 71643 |
Version info: Code for this page was tested in IBM SPSS 21.
When working with longitudinal data, there is often participant dropout. To examine when dropout occurs and to see if any variables predict dropout, we need to create a variable indicating when each person drops out of the study.
To start, here is a small example dataset with five time points.
data list list
/t1 t2 t3 t4 t5.
begin data.
5 . . . .
5 5 . . .
5 5 5 . .
5 5 5 5 .
5 5 5 5 5
. 5 . . .
5 . 5 . .
5 . 5 5 .
5 . . 5 5
end data.
Dropout is defined as having no data at a given wave of the study and at all later waves. This is different from just missing data, because someone could have missing data at one wave, but if they also have non-missing data at a later wave, then they did not drop out.
In SPSS, we can use a series of logical statements and the special missing function to determine at what wave a participant drops out of the study. Below, we do this by creating an indicator variable "v" that is 1 if someone has not yet dropped out at that wave and 0 otherwise. This is separate from whether a person is simply missing data at a given wave, because true dropout will be missing at all later time points too. Then we accumulate "v" over all waves in the "dropout" variable.
* compute whether someone dropped out at any particular time point.
compute v = 1.
compute dropout = 0.
do repeat x=t1 to t5.
compute v = v * ~missing(x).
compute dropout = dropout + v.
end repeat.
execute.
Now we have a variable with the wave each participant dropped out of the study. If we had other nonmissing variables (e.g., demographics or from questionnaires at baseline), we could use these as predictors of when someone drops out to see if dropout appears random or is related to something (e.g., in an intervention, perhaps participants in the treatment or control group are more likely to drop out).
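For instance, assuming a hypothetical baseline grouping variable group (say, treatment vs. control) in the same dataset, a quick first look at whether dropout differs by group could be obtained with the means command:

* Mean dropout wave by a hypothetical baseline group variable.
means tables = dropout by group.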
Just to see what the variable is like, here is a histogram of the dropout.
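A minimal sketch of the syntax to produce that histogram, using the dropout variable computed above:

* Histogram of the dropout variable.
graph
 /histogram = dropout.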
Analyzing Data
Analyzing data when observations are missing or there is dropout can be a complex topic. There are many possibilities and techniques to try to use all available data and minimize bias from non-random dropout or missingness. Applied Missing Data Analysis by Craig Enders is a nice book for beginners to learn more about what options exist.
Related Article: IBM SPSS Statistics
Using SPSS software:
If you are an SPSS user and you are using SPSS version 14 or later, you can simply open it as a data file, since SPSS supports SAS data files of different formats such as .sas7bdat, .sd7, .sd2, .ssd01 and .xpt. These files can be read directly into SPSS either via the pull-down menus or via syntax.
Using the pull-down menus, select File -> Open -> Data… and then for Files of Type select the appropriate SAS data file type; then select the file from the list and click Open. That is all there is to it.
With SPSS syntax we can use the get sas command to read in a SAS data file.
get sas data='C:\data\states.sas7bdat'.
Using SAS software:
Sometimes, there is a need for converting a SAS file to an SPSS file outside of SPSS. For example, your colleague is an SPSS user who uses an older version of SPSS and you are a SAS user working with SAS version 9.x. For your colleague to use the same data in SPSS that you have worked on in SAS, you can simply convert your data in SAS to an SPSS data file for your colleague.
In SAS, we can also save a SAS data file as an SPSS data file using proc export. For example, we have a SAS data set called mydata in the working directory, and we can do the following to convert it to an SPSS data file called newdata3.sav. By specifying the file extension as .sav, SAS understands that we want our data file to be converted to SPSS. In the process of conversion, SAS will automatically convert the variable labels and value labels as well.
proc export data=mydata outfile="C:\data\newdata3.sav";
run;
Stat/Transfer:
There might be situations where neither option above would work. For example, someone may have a SAS data file, work with an older version of SPSS, and not have access to SAS 9.x. Probably the easiest solution in this type of situation is to use Stat/Transfer.
Explore SPSS Sample Resumes! Download & Edit, Get Noticed by Top Employers.
Ravindra Savaram is a Technical Lead at Mindmajix.com. His passion lies in writing articles on the most popular IT platforms including Machine learning, DevOps, Data Science, Artificial Intelligence, RPA, Deep Learning, and so on. You can stay up to date on all these technologies by following him on LinkedIn and Twitter.