| US 7,478,092 B2 | ||
| Key term extraction | ||
| Kara C. Warburton, Aurora (Canada); Arendse Bernth, Ossining, N.Y. (US); Michael C. McCord, Ossining, N.Y. (US); and David A. Walters, Rochester, Minn. (US) | ||
| Assigned to International Business Machines Corporation, Armonk, N.Y. (US) | ||
| Filed on Jul. 21, 2005, as Appl. No. 11/186,601. | ||
| Prior Publication US 2007/0022115 A1, Jan. 25, 2007 | ||
| Int. Cl. G06F 7/00 (2006.01) | ||
| U.S. Cl. 707—7 [707/6; 715/234; 715/260] | 18 Claims |

| 1. A method of managing a document, the method comprising:
extracting a set of candidate terms from the document using a terminology extraction tool, each candidate term being one of: a noun or a noun group;
filtering the set of candidate terms based on a set of general exclusion conditions, wherein the set of general exclusion conditions includes:
an exclusion condition for excluding all candidate terms that appear in a set of common terms, each common term being one of: a noun or a noun group that is selected from the general lexicon and does not require inclusion in any of: a glossary for the document, a terminology repository for the document, an index of terms for the document, or a set of terms requiring pre-translation, and
an exclusion condition for excluding all near duplicate candidate terms, each near duplicate candidate term differing from another candidate term in the set of candidate terms by at least one of: a space, capitalization, or hyphenation; and
generating at least one of: a glossary for the document, a terminology repository for the document, an index of terms for the document, or a translated set of candidate terms based on the filtered set of candidate terms,
identifying a set of unknown terms in the document, the identifying including each near duplicate candidate term in the set of unknown terms;
receiving a correction to one of the set of unknown terms; and
incorporating the correction into the document.
|
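The filtering step recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation; it shows one possible reading of the two exclusion conditions. The names `normalize` and `filter_candidates` are hypothetical. Three choices in it are assumptions rather than anything the claim specifies: candidates are compared to common terms by the same normalized key, the first-seen form of a term is kept, and later variants differing only by spaces, capitalization, or hyphenation are flagged as near duplicates (and so as unknown terms).

```python
import re


def normalize(term: str) -> str:
    """Collapse spaces, hyphens, and case so near-duplicate variants share one key."""
    return re.sub(r"[\s\-]+", "", term).lower()


def filter_candidates(candidates, common_terms):
    """Apply the two exclusion conditions to a list of candidate terms.

    Returns (kept, near_duplicates). Assumption: the first-seen form of a
    term is kept and later variants are flagged; exact repeats are dropped
    silently because they are not variants.
    """
    common_keys = {normalize(t) for t in common_terms}
    first_seen = {}        # normalized key -> first surface form encountered
    kept = []
    near_duplicates = []
    for term in candidates:
        key = normalize(term)
        if key in common_keys:
            continue       # exclusion condition 1: common (general lexicon) term
        if key in first_seen:
            if term != first_seen[key]:
                near_duplicates.append(term)  # exclusion condition 2
            continue
        first_seen[key] = term
        kept.append(term)
    return kept, near_duplicates


candidates = ["data base", "database", "Database", "file", "key term", "key-term", "file"]
kept, unknown = filter_candidates(candidates, common_terms=["file"])
print(kept)     # ['data base', 'key term']
print(unknown)  # ['database', 'Database', 'key-term']
```

Under these assumptions, the `kept` list would feed the generating step (glossary, terminology repository, index, or translation), while the flagged variants would be presented as unknown terms for correction.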