KWIC Programs and Concordances
|
In literary usage, a concordance is an
alphabetical list of all the important words of a book or
author with reference to the passages in which they
occur. In the early 1960s, scholars began using computers
to automate the process of creating concordances. These
computer-generated concordances were soon called KWIC
indexes, standing for Key Word
In Context. A computer program searched for
key words and then printed them in alphabetical order,
surrounded by the "context" in which they occur.
I have a special fondness for KWIC programs, for my
first book-length publication was a computer-generated
keyword index to the American Political Science
Review. The American Library Association named it
"Outstanding Reference Book" in 1964. Since then, my
students and I have used KWIC indexing in several
publications.
I've illustrated the nature of a KWIC index by
reproducing a page from my 1964 index to 2,614 articles
published in the APSR from 1906 through 1963. The
"key words" were all the words not contained on my
list of 417 "stop" words--e.g., the, no, was, by, for,
which, to, and so on. The computer generated 10,089
keyword lines for the 2,614 articles. Each title in the
journal therefore appeared in my Cumulative Index an
average of about 3.9 times (10,089 / 2,614).
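The indexing procedure described above can be sketched in a few lines of Python. This is only a minimal illustration, not the program used for the 1964 index: the function name `kwic_lines`, the abbreviated stop list, the sample titles and references, and the 30-character context width are all assumptions made for the example.

```python
import re

# Hypothetical, abbreviated stop list; the 1964 index used 417 stop words.
STOP_WORDS = {"the", "no", "was", "by", "for", "which", "to",
              "a", "an", "and", "in", "of", "on"}


def kwic_lines(titles, stop_words=STOP_WORDS, width=30):
    """Return one line per keyword, sorted alphabetically by keyword.

    `titles` is a list of (reference, title) pairs. Each output line shows
    the text before the keyword, then the keyword and the text after it,
    then the reference, so that all keywords start in the same column.
    """
    entries = []
    for ref, title in titles:
        words = title.split()
        for i, word in enumerate(words):
            # Normalize the word: drop punctuation and ignore case.
            key = re.sub(r"[^\w]", "", word).lower()
            if not key or key in stop_words:
                continue  # stop words never become keywords
            before = " ".join(words[:i])
            after = " ".join(words[i:])  # begins with the keyword itself
            entries.append((key, before, after, ref))

    # Alphabetical order by keyword, then by reference for stable output.
    entries.sort(key=lambda e: (e[0], e[3]))

    return [
        f"{before[-width:]:>{width}}  {after[:width]:<{width}}  {ref}"
        for _, before, after, ref in entries
    ]


# Made-up titles and references, for illustration only.
SAMPLE_TITLES = [
    ("Doe J 1950 p. 12", "The Politics of Communism"),
    ("Roe A 1938 p. 45", "Fascism in Europe"),
]

for line in kwic_lines(SAMPLE_TITLES):
    print(line)
```

Each title yields one line per non-stop word, which is why an index of this kind contains several times as many lines as titles. The sketch simply truncates context that does not fit; it does not attempt the "wrap-around" of trailing text used in the printed index.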
|
[Figure: sample page from the index. Annotations: text before or after the keyword (text after may be "wrapped around"); keywords aligned in a column; APSR reference giving author, initials and year, and page.]
|
From Kenneth Janda, ed., Cumulative
Index to the American Political Science Review, Volumes
1-57: 1906-1963.
(Evanston, Illinois: Northwestern University Press, 1964),
p. 39.
|
In this example, KWIC indexing is used for information
retrieval rather than content analysis. A
researcher uses the index to retrieve specific articles by
the key words that appear in their titles. The general method,
however, can be applied to analyzing the content of
the articles published in the American Political Science
Review from 1906 to 1963.
During this period, for example, the Review
published 30 articles dealing with "communism," "communist,"
or "communists" in countries across the world, but only two
of these appeared prior to World War II. In contrast, the
Review published 8 articles on "fascism" or "fascist"
movements prior to the war but only one article afterward.
Recalling the definition of content analysis as "a
research technique for making replicable and valid
inferences from texts to a context of their use," one might
infer that American political science before World
War II devoted more scholarly attention to fascism than
to communism.
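A count of this kind can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the procedure used to obtain the figures above: the function name `count_by_period`, the sample records, and the choice of 1939 as the cutoff year are all hypothetical, and a real analysis would have to decide how to treat the war years themselves.

```python
import re


def count_by_period(records, stems, cutoff_year=1939):
    """Count titles that contain a word beginning with any of `stems`.

    `records` is a list of (year, title) pairs. Titles dated before
    `cutoff_year` are counted as "before"; all others as "after".
    The cutoff of 1939 is an assumption made for this example.
    """
    # A stem such as "communis" matches communism, communist, communists.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(s) for s in stems) + r")\w*",
        re.IGNORECASE,
    )
    counts = {"before": 0, "after": 0}
    for year, title in records:
        if pattern.search(title):
            counts["before" if year < cutoff_year else "after"] += 1
    return counts


# Made-up records, for illustration only.
SAMPLE_RECORDS = [
    (1930, "Communists and Labor"),
    (1935, "Fascist Movements in Italy"),
    (1937, "The Rise of Fascism"),
    (1950, "Communism in Asia"),
    (1955, "Communist Parties Abroad"),
    (1960, "Local Government Reform"),
]

print("communism:", count_by_period(SAMPLE_RECORDS, ["communis"]))
print("fascism:  ", count_by_period(SAMPLE_RECORDS, ["fascis"]))
```

Matching on word stems rather than exact words is one simple way to group related forms of a keyword; it can also over-match, so the resulting counts should be checked against the titles themselves.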
Since the early 1960s, many scholars have used KWIC
indexing in content analysis, and a few have created their
own keyword indexing programs. You can buy content analysis
programs that have a keyword indexing component, but you can
also download stand-alone KWIC programs from several sites.
Here are a few:
|
- Concordances
and Copra
- This is a site at Georgetown University that reviews
several concordance programs--some free, some not. Look
at it to learn about the genre.
|
- KWIC
Concordance program
- The KWIC Concordance is a corpus analytical tool
for making word frequency lists, concordances, and
collocation tables from electronic text files. This
program offers the capability of handling markup schemes,
such as COCOA, SGML, the Helsinki corpus, the
Penn-Helsinki Parsed Corpus of Middle English (Phases 1
and 2), etc.
|
- KWIC
Concordancer
- This is available as a downloadable ZIP file from
something called the Online English Network. It seems to
be accompanied by good documentation.
|
- Conc
- This is a KWIC program for the Macintosh. I have
another program available, if you're interested.
|