KWIC indexes and concordances


KWIC Programs and Concordances

In literary usage, a concordance is an alphabetical list of all the important words of a book or author with reference to the passages in which they occur. In the early 1960s, scholars began using computers to automate the process of creating concordances. These computer-generated concordances were soon called KWIC indexes, standing for Key Word In Context. A computer program searched for key words and then printed them in alphabetical order, surrounded by the "context" in which they occur.
I have a special fondness for KWIC programs, for my first book-length publication was a computer-generated keyword index to the American Political Science Review. The American Library Association named it "Outstanding Reference Book" in 1964. Since then, my students and I have used KWIC indexing in several publications.

I've illustrated the nature of a KWIC index by reproducing a page from my 1964 index to 2,614 articles published in the APSR from 1906 through 1963. The "key words" were all the words not contained on my list of 417 "stop" words--e.g., the, no, was, by, for, which, to, and so on. The computer generated 10,089 keyword lines for the 2,614 articles. Each title in the journal therefore appeared in my Cumulative Index an average of about 3.9 times (10,089 / 2,614).


            
               
                  
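[Figure: sample page from the index, annotated with three column labels. Left: text before or after the keyword (text after may be "wrapped around"). Center: keywords aligned in column below. Right: APSR reference, consisting of author initials and year, and page.]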

From Kenneth Janda, ed., Cumulative Index to the American Political Science Review, Volumes 1-57: 1906-1963.
(Evanston, Illinois: Northwestern University Press, 1964), p. 39.
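The layout on that page can be approximated in a few lines of code. What follows is only a minimal sketch, not the program used for the 1964 index: the function names `kwic_entries` and `format_kwic`, the short stop list, and the sample titles and reference codes are all assumptions made for illustration, and the sketch truncates long titles instead of wrapping them around.

```python
import re

# Tiny illustrative stop list; the 1964 index used a list of 417 stop words.
STOP_WORDS = {"the", "no", "was", "by", "for", "which", "to",
              "a", "an", "and", "in", "of", "on"}

def kwic_entries(titles, stop_words=STOP_WORDS):
    """Return (keyword, ref, text_before, text_from_keyword) tuples sorted by keyword."""
    entries = []
    for ref, title in titles:
        for match in re.finditer(r"[A-Za-z]+", title):
            keyword = match.group().lower()
            if keyword in stop_words:
                continue  # stop words never become index entries
            start = match.start()
            entries.append((keyword, ref, title[:start], title[start:]))
    return sorted(entries, key=lambda e: (e[0], e[1]))

def format_kwic(entries, width=30):
    """Right-align the text before each keyword so the keywords line up in one column."""
    lines = []
    for _keyword, ref, before, after in entries:
        left = before[-width:]   # keep only the context nearest the keyword
        right = after[:width]    # keyword plus the text that follows it
        lines.append(f"{left:>{width}} {right:<{width}} {ref}")
    return lines

# Hypothetical titles and reference codes, invented for this example.
sample = [("T1", "The Theory of Parties"), ("T2", "Voting in the Senate")]
for line in format_kwic(kwic_entries(sample), width=20):
    print(line)
```

Each title yields one line per non-stop word, which is why a title shows up several times in the finished index.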

In this example, KWIC indexing is used for information retrieval rather than content analysis. A researcher uses the index to retrieve specific articles by the key words that appear in their titles. The general method, however, can be applied to analyzing the content of the articles published in the American Political Science Review from 1906 to 1963.

During this period, for example, the Review published 30 articles dealing with "communism," "communist," or "communists" in countries across the world, but only two of these appeared prior to World War II. In contrast, the Review published 8 articles on "fascism" or "fascist" movements prior to the war but only one article afterward. Recalling the definition of content analysis as "a research technique for making replicable and valid inferences from texts to a context of their use," one might infer that American political science before World War II devoted more scholarly attention to fascism than to communism.
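A count of this kind is easy to automate once the titles are in machine-readable form. The sketch below is one possible approach under stated assumptions: the function name `count_titles`, the word stems, the cutoff year, and the sample records are all hypothetical and are not taken from the Cumulative Index.

```python
import re

def count_titles(records, stems, split_year):
    """Count titles with a word starting with any stem: (before split_year, from split_year on)."""
    before = after = 0
    for year, title in records:
        words = re.findall(r"[a-z]+", title.lower())
        # str.startswith accepts a tuple, so one stem can cover several word forms
        if any(word.startswith(stems) for word in words):
            if year < split_year:
                before += 1
            else:
                after += 1
    return before, after

# Hypothetical (year, title) records, invented for this example.
records = [
    (1935, "Fascism in Italy"),
    (1938, "Budget Reform"),
    (1950, "Communist Parties in Asia"),
    (1955, "The Communists and the Vote"),
]
print(count_titles(records, ("communis",), 1939))
print(count_titles(records, ("fascis",), 1939))
```

A title is counted once even if it contains several matching words, so the totals refer to articles, not word occurrences.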

Since the early 1960s, many scholars have used KWIC indexing in content analysis, and a few have created their own keyword indexing programs. You can buy content analysis programs that have a keyword indexing component, but you can also download stand-alone KWIC programs from several sites. Here are a few:

Concordances and Corpora: This is a site at Georgetown University that reviews several concordance programs--some free, some not. Look at it to learn about the genre.

KWIC Concordance program: The KWIC Concordance is a corpus analytical tool for making word frequency lists, concordances, and collocation tables from electronic text files. This program can handle markup schemes such as COCOA, SGML, and those of the Helsinki corpus and the Penn-Helsinki Parsed Corpus of Middle English (Phases 1 and 2).

KWIC Concordancer: This is available as a downloadable ZIP file from something called the Online English Network. It seems to be accompanied by good documentation.

Conc: This is a KWIC program for the Macintosh. I have another program available, if you're interested.