US 7,478,092 B2
Key term extraction
Kara C. Warburton, Aurora (Canada); Arendse Bernth, Ossining, N.Y. (US); Michael C. McCord, Ossining, N.Y. (US); and David A. Walters, Rochester, Minn. (US)
Assigned to International Business Machines Corporation, Armonk, N.Y. (US)
Filed on Jul. 21, 2005, as Appl. No. 11/186,601.
Prior Publication US 2007/0022115 A1, Jan. 25, 2007
Int. Cl. G06F 7/00 (2006.01)
U.S. Cl. 707—7  [707/6; 715/234; 715/260] 18 Claims
1. A method of managing a document, the method comprising:
extracting a set of candidate terms from the document using a terminology extraction tool, each candidate term being one of: a noun or a noun group;
filtering the set of candidate terms based on a set of general exclusion conditions, wherein the set of general exclusion conditions includes: an exclusion condition for excluding all candidate terms that appear in a set of common terms, each common term being one of: a noun or a noun group that is selected from the general lexicon and does not require inclusion in any of: a glossary for the document, a terminology repository for the document, an index of terms for the document, or a set of terms requiring pre-translation, and an exclusion condition for excluding all near duplicate candidate terms, each near duplicate candidate term differing from another candidate term in the set of candidate terms by at least one of: a space, capitalization, or hyphenation;
generating at least one of: a glossary for the document, a terminology repository for the document, an index of terms for the document, or a translated set of candidate terms based on the filtered set of candidate terms;
identifying a set of unknown terms in the document, the identifying including each near duplicate candidate term in the set of unknown terms;
receiving a correction to one of the set of unknown terms; and
incorporating the correction into the document.
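The filtering, unknown-term, and correction steps of claim 1 can be illustrated with a minimal Python sketch. The claim does not say how near duplicates are detected or how corrections are applied, so everything below is an assumption: the candidate terms are taken as already produced by the terminology extraction tool, common terms are matched case-insensitively, every spelling in a group of variants that differ only by a space, capitalization, or hyphenation is treated as a near duplicate (so all of them are excluded and reported as unknown terms), and a correction is applied as a whole-word text replacement. The function names `near_dup_key`, `filter_candidates`, and `apply_correction` are hypothetical.

```python
import re
from collections import defaultdict


def near_dup_key(term: str) -> str:
    """Hypothetical key that ignores the three differences named in the claim:
    spaces, capitalization, and hyphenation."""
    return re.sub(r"[\s\-]", "", term).lower()


def filter_candidates(candidates, common_terms):
    """Apply the two general exclusion conditions (illustrative reading).

    Returns (kept, near_duplicates). `kept` is the filtered candidate list in
    first-seen order; `near_duplicates` holds every spelling that has another
    spelling with the same near_dup_key. These also serve as unknown terms.
    """
    # Assumption: common-term matching is case-insensitive.
    common = {t.lower() for t in common_terms}

    # Group surface forms that differ only by space, case, or hyphen.
    groups = defaultdict(set)
    for term in candidates:
        groups[near_dup_key(term)].add(term)

    near_duplicates = set()
    for forms in groups.values():
        if len(forms) > 1:  # two or more spellings of the same term
            near_duplicates |= forms

    kept = []
    for term in dict.fromkeys(candidates):  # drop exact repeats, keep order
        if term.lower() in common or term in near_duplicates:
            continue
        kept.append(term)
    return kept, near_duplicates


def apply_correction(text: str, wrong: str, right: str) -> str:
    """Incorporate a reviewer's correction by replacing whole-word
    occurrences of `wrong` (case-sensitive) with `right`."""
    pattern = rf"(?<!\w){re.escape(wrong)}(?!\w)"
    # A function replacement keeps backslashes in `right` literal.
    return re.sub(pattern, lambda _m: right, text)


if __name__ == "__main__":
    doc = "Send an E-mail to the server. The e-mail gateway stores each email as a message."
    candidates = ["E-mail", "e-mail", "email", "server", "gateway", "message"]
    kept, unknown = filter_candidates(candidates, common_terms=["server", "message"])
    print("glossary:", sorted(kept, key=str.lower))
    print("unknown terms:", sorted(unknown))
    # Suppose a reviewer decides every variant should read "email".
    for form in unknown:
        doc = apply_correction(doc, form, "email")
    print(doc)
```

Under these assumptions, "E-mail", "e-mail", and "email" are all withheld from the glossary and surfaced for review, and "server" and "message" are dropped as common terms. Another implementation might instead keep the most frequent spelling and flag only the rest; the claim text alone does not settle that choice.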