Old Lecture 5 from Fall, 2000:

Recording Data for Computer Analysis: Syntax format

 

This page describes how to record data in SPSS on the UNIX system that we used in my statistics course last year.

What I presented in class on Friday, September 28, supersedes this information. I kept the link to this page up to illustrate the differences between the two systems.

Moreover, I wanted you to read about the old "IBM" punchcards that you may have heard about. For historical reasons, I'll speak about them in class on Monday.

Go here for a table of selected data for the 50 states and Washington, D.C.

Here is the first portion of that data set:

   NAME    BLACKPOP  PCTBLACK CLINTON BILL96 VOTES80 VOTES90 
         
01 ALABAMA     1021     25.3      41      43      9      9
02 ALASKA        22      4.0      32      33      3      3
03 ARIZONA      111      3.0      37      47      7      8
04 ARKANSAS     374     15.9      54      54      6      6
05 CALIFORN    2209      7.4      47      51     47     54
06 COLORADO     133      4.0      40      44      8      8
07 CONNECTI     274      8.3      42      52      8      8
08 DELAWARE     112     16.8      44      52      3      3
09 DIST OF      400     65.9      86      85      3      3
10 FLORIDA     1760     13.6      39      48     21     25
         
         

Learn these terms as they apply to these data

CASE

This is the unit of analysis, the basis of observation (e.g., the state)
Individuals, cities, nations, or organizations could be units of analysis.

VARIABLE

properties of the case
(e.g., the size of the black population, in thousands--000)

VALUES

scores of the properties (e.g., the 1,021,000 blacks in Alabama constitute 25.3% of the population)

The rectangular shape of this dataset is the basic format for recording data, also called "flat format":

  • Cases down the side
  • Variables across the top
  • Values in the cells are either raw data or "coded" values
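As a minimal sketch of the flat format, the first three states from the table above can be held as a rectangle in Python: one row per case, one position per variable. The names `VARIABLES`, `CASES`, and `value` are illustrative choices, not part of the course software.

```python
# Variables run across the top; the names are copied from the table above.
VARIABLES = ["NAME", "BLACKPOP", "PCTBLACK", "CLINTON", "BILL96", "VOTES80", "VOTES90"]

# Cases run down the side; each row holds the values for one state.
CASES = [
    ["ALABAMA", 1021, 25.3, 41, 43, 9, 9],
    ["ALASKA", 22, 4.0, 32, 33, 3, 3],
    ["ARIZONA", 111, 3.0, 37, 47, 7, 8],
]


def value(case_index, variable):
    """Return one cell: the value of `variable` for the case in row `case_index`."""
    return CASES[case_index][VARIABLES.index(variable)]


# Alabama is the first case (row 0); look up its PCTBLACK value.
print(value(0, "PCTBLACK"))
```

Every row has the same number of entries as there are variables, which is what makes the dataset rectangular.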

A historical note: recording data on "punchcards"

  • Old "IBM" punchcards provided for 80 columns of information (punches).
  • Two or more adjacent columns formed "fields" for information that would not fit in a single column, e.g., income.
  • Punches in these columns stood for values.
  • The figure below illustrates how data in government reports can be reproduced on punchcards. [From Kenneth Janda, Data Processing: Applications to Political Research (Northwestern University Press, 1965), p. 20.]

Illustrate with data on American states:

  • 10 cases (states)
  • 6 variables plus state ID code
  • Column locations are given at the top:

Note that numbers are "right-justified"--the units positions are aligned on the right side
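Right-justification can be sketched in Python with `str.rjust`, which pads a value with blanks on the left so that the units digit always lands in the last column of its field. The helper name `right_justify` and the 4-column field width are assumptions chosen for illustration.

```python
def right_justify(number, width):
    """Return `number` as text padded on the left to exactly `width` columns."""
    text = str(number)
    if len(text) > width:
        # A value wider than its field would spill into the next variable.
        raise ValueError(f"{text!r} does not fit in {width} columns")
    return text.rjust(width)


# BLACKPOP values for Alabama, Alaska, and Arizona in a 4-column field.
for blackpop in (1021, 22, 111):
    print(right_justify(blackpop, 4))
```

The three printed lines all end in the same column, so the units positions line up even though the numbers have different lengths.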

Selecting the cases for entry into a machine readable file

  • Use table of random numbers in Kirk, pp. 700-701, instructions on pp. 250-251
  • Draw sample of 10 cases from the list of 51 states
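The course procedure uses a printed table of random numbers. As a rough stand-in, the same kind of draw can be sketched with Python's pseudorandom generator, which is a substitute technique and not the method in Kirk. The function name `draw_sample` and its parameters are hypothetical.

```python
import random


def draw_sample(n_cases=51, sample_size=10, seed=None):
    """Draw `sample_size` distinct sequence numbers from 1..n_cases without replacement."""
    rng = random.Random(seed)  # pass a seed to make the draw repeatable
    return sorted(rng.sample(range(1, n_cases + 1), sample_size))


# Each number corresponds to a state's sequence number (01 to 51) in the list.
print(draw_sample())
```

Because the draw is without replacement, no state can be selected twice.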

 

Log in to hardin again to record your data, using an editing program called qedit under a special menuing script called doit.

doit was written by Bruce Foster.

It was named after the famous Nike slogan, "Do It!"
(Nike, you recall, was the Greek Goddess of sportswear.)

This time type doit at the unix prompt:

hardin(kjanda) 43%
doit


You will be asked to provide a name for the "syntax" file, which is a text file that contains data or commands for the SPSS program. Reply mydata

Name of SPSS syntax file?
mydata


The computer will reply by displaying this screen, which is the window for the qedit program; the phrase "new file" will disappear as you begin typing.

Your task is to enter your ten cases (without the column headings) into this editing space. Follow this format for entering data into columns:


Variable      Columns
STATE NAME     1 - 8
BLACKPOP       9 - 12
PCTBLACK      13 - 16
CLINTON       17 - 18
BILL96        19 - 20
VOTES80       21 - 22
VOTES90       23 - 24

Observe the column locations given above and record your ten cases precisely in the indicated columns.

Do not enter the sequence number (01 to 51) for the states.

You may wish to insert a row of repeated numbers, 1-9, to help locate the columns:

123456789 123456789 123456789 123456789
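If you would rather generate such a ruler than type it, a small Python sketch can do so. The function name `ruler` is made up for illustration; it leaves every tenth column blank, as in the line above.

```python
def ruler(columns):
    """Build a column ruler: digits 1-9 repeated, with a blank in every tenth column."""
    return "".join(
        " " if col % 10 == 0 else str(col % 10)
        for col in range(1, columns + 1)
    )


# 39 columns reproduces the four groups of 1-9 shown above.
print(ruler(39))
```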

Note that a decimal point takes up one column.
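To check a line before typing it, the column layout above can be sketched in Python. This is only an illustration under stated assumptions: the names `LAYOUT`, `format_case`, and `parse_case` are hypothetical, BILL96 is assumed to occupy columns 19-20, and the state name is assumed to be left-justified while numbers are right-justified.

```python
# (variable, first column, last column), 1-based and inclusive.
# BILL96 in columns 19-20 is an assumption read from the layout above.
LAYOUT = [
    ("NAME", 1, 8),
    ("BLACKPOP", 9, 12),
    ("PCTBLACK", 13, 16),
    ("CLINTON", 17, 18),
    ("BILL96", 19, 20),
    ("VOTES80", 21, 22),
    ("VOTES90", 23, 24),
]


def format_case(values):
    """Place one case's values into their fixed columns and return the 24-column line."""
    pieces = []
    for (name, first, last), v in zip(LAYOUT, values):
        width = last - first + 1
        text = str(v)  # a decimal point counts as one column, e.g. "25.3" is 4 wide
        if len(text) > width:
            raise ValueError(f"{name}: {text!r} does not fit in {width} columns")
        # Assumption: names are left-justified, numbers right-justified.
        pieces.append(text.ljust(width) if name == "NAME" else text.rjust(width))
    return "".join(pieces)


def parse_case(line):
    """Read the fields back out of a fixed-column line, as text."""
    return {name: line[first - 1:last].strip() for name, first, last in LAYOUT}


line = format_case(["ALABAMA", 1021, 25.3, 41, 43, 9, 9])
print(line)                          # ALABAMA 102125.34143 9 9
print(parse_case(line)["PCTBLACK"])  # 25.3
```

Adjacent fields run together with no separating blanks, so the column positions alone determine where one value ends and the next begins.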

To view the editing commands available in qedit, press control and k together and then type h, and you'll see this screen:

Note that all these editing commands require use of the control key, which is also designated ^. Thus, ^k means pressing the control key and k simultaneously. This notation will be used henceforth.

Exit the Help Screen by typing ^k h again. You can use the ^k h sequence at any time to toggle into and out of the Help Screen.

Below is roughly what your file should look like if you had used the first ten states in the list:

Before you exit, position the cursor on the first line and enter ^y to delete the line of numbers, if you included one.

Exit and save the data using ^k x, which will save a file called mydata.sps in your home directory.

On leaving the editor, you will automatically return to control of doit and see these menu choices:

C10 doit - Monday 09/27/99 15:32:14
         
1. Edit syntax (qedit mydata.sps)
2. Run SPSS (spss -m < mydata.sps > mydata.lst)
3. Display results (more mydata.lst)
4. Print results
5. C10 Bulletin (more $CLASS/bull)
6. Getinfo for C10 Datasets (getinfo).
7. Change syntax file
8. Quit this menu. <----choose option 8

Enter your choice: 8

This takes you out of doit and back to the unix prompt.

hardin(kjanda) 43%
Although doit contains an option to print results from an SPSS run, you have only entered some data and haven't run anything yet. So type the following to print your set of ten cases.

hardin(kjanda) 43% lp cresap115 mydata.sps

      

request id is cresap115 (1 file)

hardin(kjanda) 44% logout

      

Pick up the output at Cresap Room 115, with your netid printed as the banner identifying your output. Warning: Due to deregulation, no one is in charge of the printed output. Allow at least a half hour for printing and disposal.