Dictionary Assistant Help

E-Mail Comments to: opencyc-doc@cyc.com
Copyright© 2000-2002 Cycorp. All rights reserved.

1. Introduction

Traditionally at Cycorp, the process of creating new terms (i.e., constants or narts) has involved work by two separate groups. First, the concept is created, along with definitional assertions and comments, by someone in the Ontological Engineering (OE) department. Afterwards, lexical mappings are added by someone in the Natural Language (NL) group. These two steps have been separated because adding these lexical mappings, which we call lexifications, can involve linguistic considerations that ontological engineers might not generally be familiar with. The Dictionary Assistant [1] shields the user from these technical considerations by explaining the choices in a nontechnical manner. It is designed to let whoever creates a term also handle the lexical mappings, thus streamlining the overall term-creation process.

This manual is organized as follows: Section 2 explains how to start the tool. Section 3 provides details on the specific questions that the assistant asks. Finally, section 4 presents a sample interaction using the tool.

   
2. Starting the tool

The Dictionary Assistant is available from the Tools menu, but it is usually easier to invoke it from the Cyc Browser. For the current Cyc term, the assistant can be started by first displaying the Lexical Info page and then clicking the [Lexify] link at the bottom of the page. (The [Lexify] link will also be shown on the browser for any unlexified term.) This will bring up the tool start-up page with the term and a default phrase for referring to the term, based on the term's label. For example, the phrase for AdHocUnit-Military will be "ad hoc unit - military".
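The default phrase can be pictured with a small sketch. The manual does not say how the tool derives the phrase from the term's label, so the function below (a hypothetical default_phrase, written in Python purely for illustration) is only an assumption: it splits the name at lowercase-to-uppercase boundaries and lowercases the result.

```python
import re


def default_phrase(term_name: str, keep_hyphen: bool = True) -> str:
    """Guess a lowercase phrase from a CamelCase term name.

    Hypothetical helper: the real tool's algorithm is not documented here.
    """
    parts = []
    for chunk in term_name.split("-"):
        # Insert a space at every lowercase-to-uppercase boundary.
        spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", chunk)
        parts.append(spaced.lower())
    separator = " - " if keep_hyphen else " "
    return separator.join(parts)


print(default_phrase("AdHocUnit-Military"))
print(default_phrase("AdHocUnit-Military", keep_hyphen=False))
```

With the hyphen kept, the sketch reproduces the phrase shown on the start-up page; with it dropped, it reproduces the guessed term string listed under Lexical Info in section 4.1.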

Tips on selecting the phrase are shown in a 'Tip:' section. In addition, the term's documentation and definitional information are shown at the bottom of the page to help with the phrase selection. After changing the phrase as needed (e.g., to "ad hoc military unit"), the user should select OK to proceed with the lexification. This leads to a series of nontechnical questions that the user must answer to complete the lexification. Each question is shown on a separate page, which usually also includes a tips section at the bottom with more information on making the decision.

   
3. Lexification steps

The Dictionary Assistant guides the user in making the decisions necessary for the proper lexification of terms. It presents the user with simple nontechnical questions in order to determine the parameters needed for making the underlying lexical assertions. This section provides details on those questions.

The questions are organized into four groups. One group is common to all lexifications (section 3.1), and there is a separate group for each of the main types of lexical assertions supported: proper-name mappings (section 3.2), denotational mappings (section 3.3), and lexical assertions for paraphrasing relations (section 3.4).

   
3.1 General Lexification

For all terms to be lexified, the following options need to be determined.

Term              The Cyc concept to be lexified
Phrase            A word or phrase serving as a label for the term
Is proper noun?   Whether the phrase is a proper name
Is preferred?     Whether the phrase is the most suitable form of reference

The first two (Term and Phrase) are specified on the initial screen. An error message will be displayed if the term is invalid or if the phrase is empty. Once the term is known, its comments and definitional assertions will be shown at the bottom of the screen, which can be helpful in selecting a phrase for the lexical mapping.

The next option (Is proper noun?) determines whether the lexical mappings should use proper-name assertions (e.g., #$nameString) or one of the other semantic assertions (e.g., #$multiWordString). This can often be determined from the type of the term (e.g., instances of #$Individual generally use proper-name lexical mappings). However, the question is asked regardless, to avoid overgeneralization.

The last general option (Is preferred?) determines whether the system should use the given phrase whenever referring to the term. It is asked at the end of the lexification process so that the phrasing of the question reflects any syntactic information determined from the other questions (see section 3.3 below).
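As a rough illustration, the four general options and the two start-up checks can be modeled as a small record. This is a sketch in Python under stated assumptions, not the tool's implementation: the names GeneralOptions, validation_errors, and mapping_kind are hypothetical, and "invalid term" is approximated by membership in a caller-supplied set of known term names.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class GeneralOptions:
    """The four options from section 3.1 (hypothetical representation)."""
    term: str
    phrase: str
    is_proper_noun: bool = False
    is_preferred: bool = False


def validation_errors(options: GeneralOptions, known_terms: Set[str]) -> List[str]:
    """Return the start-up errors: invalid term and/or empty phrase."""
    errors = []
    if options.term not in known_terms:  # assumption: validity means "is known"
        errors.append("invalid term: %r" % options.term)
    if not options.phrase.strip():
        errors.append("phrase is empty")
    return errors


def mapping_kind(options: GeneralOptions) -> str:
    """Proper names take proper-name assertions; everything else is denotational."""
    return "proper-name" if options.is_proper_noun else "denotational"


options = GeneralOptions("AdHocUnit-Military", "ad hoc military unit")
print(validation_errors(options, {"AdHocUnit-Military"}), mapping_kind(options))
```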

   
3.2 Proper Name Lexification

Proper name lexification involves determining whether one of the specialized proper-name predicates should be used, as appropriate for the type of term being lexified. For example, for types of organisms (i.e., #$OrganismClassificationType), the predicate for scientific names (i.e., #$scientificName) would be appropriate if the phrase is the formal Latin name.

In addition, for certain types of terms, additional proper-name assertions might be desired. For example, if a full name is given for a person, it would be useful for the system to know the components of the name (e.g., first, middle, and last). Similarly, if a long name is entered, it might be useful for the user to enter an abbreviation. In both cases, the system will guess the most likely values and then have the user verify them.
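The kind of guess described above might look like the following Python sketch. The heuristics are assumptions made for illustration (the manual does not describe the actual guessing rules), the function names are hypothetical, and the person's name in the example is invented. In the tool, the user would still verify whatever is guessed.

```python
from typing import Dict


def guess_name_parts(full_name: str) -> Dict[str, str]:
    """Guess first, middle, and last name by position (a naive heuristic)."""
    words = full_name.split()
    if not words:
        return {}
    if len(words) == 1:
        return {"first": words[0]}
    return {
        "first": words[0],
        "middle": " ".join(words[1:-1]),  # empty string when there is no middle name
        "last": words[-1],
    }


def guess_abbreviation(long_name: str) -> str:
    """Guess an abbreviation from the initial letter of each word."""
    return "".join(word[0].upper() for word in long_name.split())


print(guess_name_parts("Mary Ann Smith"))
print(guess_abbreviation("Acquired Immune Deficiency Syndrome"))
```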

   
3.3 Denotational Lexification

"Denotational" lexification involves selecting the options needed for a #$denotation assertion or one of the multi-word lexification predicates (e.g., #$multiWordString). Unlike proper-name mappings, which usually have just a single form (i.e., the phrase itself), denotational mappings can involve several variations of the phrase. For example, since the verb "gamble" maps into #$Gambling, so do the verb forms "gambled" and "gambling". The variations usually occur only for a single word in the phrase, namely the word that is qualified by the other words (i.e., the chief word or headword).

Thus the relevant options for denotational mappings are as follows (where 'headword' refers to the word that is qualified):

Headword position         Position in the phrase of the headword (i.e., the word that determines the grammatical function of the entire phrase)
Headword part of speech   Grammatical part of speech of the headword (see #$SpeechPart)

Often selecting between the simple noun and mass noun part of speech choices (see #$SimpleNoun and #$MassNoun) is not straightforward. For tips on making this decision, see the following sub-section, Simple versus Mass Nouns.

After the choices have been made, there is a verification step in which all the variations of the phrase are generated based on the position of the headword and its part of speech. This serves as an important diagnostic of whether the previous choices make sense for the given mapping.
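The verification step can be sketched as follows. This Python fragment is only an approximation under stated assumptions: it handles nouns alone, uses a deliberately naive English pluralizer, and the names pluralize and phrase_variants, as well as the part-of-speech labels "simple-noun" and "mass-noun", are hypothetical stand-ins for whatever the tool uses internally.

```python
from typing import List


def pluralize(word: str) -> str:
    """Naive English plural; irregular nouns are not handled."""
    if word.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"
    if word.endswith("y") and len(word) > 1 and word[-2] not in "aeiou":
        return word[:-1] + "ies"
    return word + "s"


def phrase_variants(phrase: str, head_index: int, part_of_speech: str) -> List[str]:
    """Vary only the headword; mass nouns yield no variants."""
    if part_of_speech != "simple-noun":
        return []
    words = phrase.split()
    words[head_index] = pluralize(words[head_index])
    return [" ".join(words)]


print(phrase_variants("ad hoc military unit", 3, "simple-noun"))
print(phrase_variants("sand", 0, "mass-noun"))
```

A wrong headword choice shows up immediately in the output: treating "loading" as a countable headword of "loading a weapon" yields "loadings a weapon", which is the kind of odd phrase the verification page asks the user to watch for.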

Simple versus Mass Nouns

In short, simple nouns are countable, whereas mass nouns are uncountable. To be more precise (borrowing definitions from Merriam-Webster's International Dictionary):

A simple noun is a noun (as bean or sheet) that forms a plural and is used with a numeral, with words such as "many" or "few", or with the indefinite article "a" or "an".

A mass noun is a noun (as sand or water) that characteristically denotes a homogeneous substance or a concept without subdivisions and that is preceded in indefinite singular constructions by "some".


Distinguishing between Simple and Mass Nouns

Types of usages for Simple Nouns        Examples
- I have three BLAHs.                   I have three dogs.
- How many BLAHs do you want?           How many cookies do you want?
- I want a BLAH.                        I want a magazine.
- He has a good collection of BLAHs.    He has a big collection of pins.
- BLAHs are required.                   Needles are required.

Types of usages for Mass Nouns          Examples
- I have much BLAH.                     I have much tea.
- How much BLAH do you want?            How much sugar do you want?
- I want some BLAH.                     I want some pork.
- He has a bad case of BLAH.            He has a bad case of diphtheria.
- BLAH is required.                     Information is required.

Typical cases of Simple and Mass nouns

Typical cases of Simple Nouns           Examples
Animals                                 Dog, Cat
Works (of art)                          Symphony, Novel
Solid Objects                           Rock, Frame
Specific events                         Earthquake, Massacre
Objects in general                      Magazine, Cookie

Typical cases of Mass Nouns             Examples
Categories                              Furniture, Pottery, Poetry
Chemicals & Compounds                   Cerium, Nitroglycerine
Diseases                                Leukemia, Influenza
Emotions                                Delight, Horror
Fields of Study                         Pragmatics, Anthropology
General Conditions                      Malnutrition, Stress
Liquids                                 Tequila, Water
Meat Types                              Pork, Poultry
Music Types                             Jazz, Rap
Processes                               Circulation, Homework
Stuff in general                        Acrylic, Porcelain

Notes

1. In general, nouns tend to be simple nouns.

2. Often a mass noun appears to occur in a countable usage, but usually
there is an implicit mass usage intended.

   Simple?: I bought several acrylics.
   Mass: I bought several pieces of acrylic.

3. A few words fall under both categories. For instance "egg" has both
simple and mass noun usages:

   Simple:
   - I have three eggs
   - How many eggs do you want?

   Mass:
   - I have much egg on my face.
   - Do you want some egg on yours?

However, generally one usage is preferred over the other, such as the
simple noun sense of egg.
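The usage tables above can be turned into a quick self-check: fill each frame with the candidate noun and see which set of sentences sounds natural. The Python sketch below is an illustration only and is not part of the Dictionary Assistant. The frames are copied from the tables, while the function name diagnostic_sentences and the crude "a"/"an" rule are assumptions.

```python
from typing import Dict, List

# Frames taken from the usage tables, with BLAH replaced by placeholders.
SIMPLE_FRAMES = [
    "I have three {plural}.",
    "How many {plural} do you want?",
    "I want {article} {singular}.",
    "He has a good collection of {plural}.",
    "{Plural} are required.",
]
MASS_FRAMES = [
    "I have much {singular}.",
    "How much {singular} do you want?",
    "I want some {singular}.",
    "He has a bad case of {singular}.",
    "{Singular} is required.",
]


def diagnostic_sentences(singular: str, plural: str) -> Dict[str, List[str]]:
    """Fill both sets of frames so a reader can judge which set sounds right."""
    values = {
        "singular": singular,
        "plural": plural,
        # Crude article choice based on the first letter only.
        "article": "an" if singular[0].lower() in "aeiou" else "a",
        "Singular": singular.capitalize(),
        "Plural": plural.capitalize(),
    }
    return {
        "simple": [frame.format(**values) for frame in SIMPLE_FRAMES],
        "mass": [frame.format(**values) for frame in MASS_FRAMES],
    }


for kind, sentences in diagnostic_sentences("egg", "eggs").items():
    print(kind, sentences)
```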

   
3.4 Relation Lexification

During lexification of relation terms, the emphasis is not on how the term itself is referred to but on how assertions made using the term can be paraphrased. This is done by asking the user for a sample sentence or phrase using the relation. Then the correspondence between words in the phrase and the relation's arguments is established. Since the paraphrase might also involve variations based on the arguments, their parts of speech are checked as well.

The relevant options for relational mapping thus are as follows:

Sample usage               Example sentence or phrase for the relation
Argument positions         Positions in the phrase for each of the relation's arguments
Argument parts of speech   Grammatical part of speech for each of the arguments (generally nouns)
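To make the role of the sample usage and argument positions concrete, here is a minimal Python sketch. It assumes each argument corresponds to exactly one word of the sample and ignores parts of speech. The functions build_template and paraphrase, the sample sentence, and the two-argument relation it implies are all hypothetical; the tool's own paraphrase assertions are not shown in this manual.

```python
from typing import Dict


def build_template(sample: str, arg_positions: Dict[int, int]) -> str:
    """Replace the word at each given zero-based position with an argument slot.

    arg_positions maps an argument number (1, 2, ...) to a word index.
    """
    words = sample.split()
    for arg_number, word_index in arg_positions.items():
        words[word_index] = "{arg%d}" % arg_number
    return " ".join(words)


def paraphrase(template: str, *args: str) -> str:
    """Fill the slots of a template with the arguments of a new assertion."""
    slots = {"arg%d" % number: value for number, value in enumerate(args, start=1)}
    return template.format(**slots)


template = build_template("Pat likes tea", {1: 0, 2: 2})
print(template)
print(paraphrase(template, "Chris", "jazz"))
```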

   
4. Sample Interaction

This section shows a complete interaction using the Dictionary Assistant. Each subsection represents a separate question to be answered, showing the page after the user has entered any input.

4.1 Dictionary Assistant start-up page

                             Dictionary Assistant

           Specify the term and phrase for the lexical mapping

   Term :   AdHocUnit-Military________________________________
   Phrase : ad hoc unit - military______________________________________

   Reset OK

   Tip:

   Choose a phrase that clearly distinguishes the Cyc term in question,
   yet reflects common ways of referring to the term. However, it is
   important to avoid being overly specific as well as being overly
   general.

   For example, for the term 'MakingCoffee-TheBeverage', the phrase
   "making coffee beverage" is overly precise. Instead, "making coffee"
   is preferred since "coffee" in this case only refers to the beverage
   and not for instance to the beans or to grinds. In contrast, for the
   term 'Paint-Dry', "paint" would be overly general, since this can
   also refer to liquid paint as well. Instead "dry paint" would be
   preferable.

   If you want to use more than one word or phrase to refer to the term,
   complete the wizard for the current term and then begin again with
   the same term but with a different word or phrase.

   Note that for relational terms, the phrase is just used to refer to
   the term itself. It is not the phrasing for paraphrases, which is
   specified in a subsequent form. For example, "liking the experience
   of" would be a good phrase for 'likesSensorially'.
     ________________________________________________________________

   AdHocUnit-Military documentation :
   Mt : USOrderOfBattleMt
   comment : [Def] "The collection of all military units constituted on
   an ad hoc basis."

   Collection : AdHocUnit-Military
   Arg : 1
   Mt : MilitaryForceStructureMt

   isa : [Mon] ConventionalClassificationType [Mon] ExistingObjectType

   genls : [Mon] ModernMilitaryOrganization

   Lexical Info for AdHocUnit-Military:

   Default generated phrase: "ad hoc unit military"
   (#$termStrings-GuessedFromName)

   Mt: TemporaryLexicalAssertions-GuessedMt
   [Def] (termStrings-GuessedFromName AdHocUnit-Military "ad hoc unit
   military")

4.2 Proper-name determination page

   Checking whether the phrase is a proper name

   Is the phrase "ad hoc military unit" a name?
   Yes No

   Tip:

   Normally, capitalized phrases are proper names. Some lowercase
   phrases may be proper names as well; those which refer to specific
   people (e.g., "e. e. cummings") or entities (e.g., "priceline.com"),
   or those which are scientific names for specific organisms,
   compounds, etc. (e.g., "plexus brachialis"). But in most cases,
   lowercase phrases refer to classes of individuals and to concepts
   rather than to specific individuals.

   Example proper names (particular individuals):
               Mother Teresa
               IBM
               Acquired Immune Deficiency Syndrome

   Example common nouns or phrases (categories):
               cat
               thing
               Nissan car

4.3 Headword verification page

   Guessing the headword of the phrase

   Is "ad hoc military unit" a type of "unit"?
   Yes No

   Tip:

   The headword is the word that is being qualified by the other words in
   the phrase. It often indicates the general class of the concept
   described by the phrase. It also determines the grammatical function
   of the phrase.

   For example, in the noun phrase "metal desk", the noun "desk" is the
   headword qualified by "metal". Similarly, the verb "sleeping" is the
   headword of the verb phrases "sleeping on the job" and "quietly sleeping".

4.4 Headword part-of-speech page

   Determining how the word 'unit' is used in "ad hoc military unit"

   Which of the following sounds best?
   (*) a unit; many units: countable noun like 'book'
   ( ) some unit; much unit: uncountable noun like 'sand'
   OK

   Checking variant phrases due to variations of headword 'unit'

   Are the following good ways to refer to 'ad hoc unit military'?
               ad hoc military units
   Yes No

   Tip:

   The main thing to check for is odd looking phrases, such as "loadings
   a weapon" as a variation of "loading a weapon". Note that there might
   not be variations. This may or may not be a problem, depending on the
   part of speech selected. For example, mass nouns like "sand" won't
   have variations.

4.5 Preferred reference check

   Checking status of preferred reference

   Is "ad hoc military unit" the most common way of referring to the
   concept 'ad hoc unit military'?
   Yes No

   Tip:

   Other ways of referring to AdHocUnit-Military are the following:
               ad hoc unit military



Footnotes

[1] The Dictionary Assistant is also known as the Lexification Wizard (lexwiz).