Bruce Schuman
PO Box 23346
Santa Barbara, CA 93121
(805) 966-9515
Synthetic Dimensionality
Synthetic Dimensionality | Forum on Conceptual Structure

INTRODUCTION TO THE THEORY OF CONCEPTS
General principles of conceptual structure

Bruce Schuman
May, 1994

  1. Concepts are Ad Hoc
  2. Concepts are Discrete; Reality is Continuous
  3. Concept, Symbol, and Referent
  4. Primitive Concepts
  5. Fundamental Conceptual Types
  6. Similarities and Differences
  7. The Aristotelian Type Hierarchy
  8. Summary
A "concept" is defined by Webster's Dictionary (New 20th C. Unabridged, p. 376) as "an idea, especially a generalized idea of a class of objects", and is derived from the Latin conceptus, "a collecting, gathering". The American Heritage Dictionary defines a concept as "a general idea or understanding, especially one derived from specific instances or occurrences" (New College Edition, p. 275).

The following excerpts on the fundamentals of concept structure are taken from the 1984 book Conceptual Structures: Information Processing in Mind and Machine, from the Addison-Wesley System Programming Series, by John Sowa, a senior staff member of the IBM Systems Research Institute.


1. Concepts are Ad Hoc

Sowa, p. 344:

Concepts are inventions of the human mind used to construct a model of the world. They package reality into discrete units for further processing, they support powerful mechanisms for doing logic, and they are indispensable for precise, extended chains of reasoning. But concepts and percepts cannot form a perfect model of the world: they are abstractions that select features that are important for one purpose, but they ignore details and complexities that may be just as important for some other purpose. Leech (1974) noted that "bony structured" concepts form an imperfect match to a fuzzy world. People make black and white distinctions when the world consists of a continuum of shadings.

For many aspects of the world, a discrete set of concepts is adequate: plants and animals are grouped into species that usually do not interbreed; most substances can quickly be classified as solid, liquid, or gas; the dividing line between a person's body and the rest of the world is fairly sharp. Yet such distinctions break down when pushed to extremes. Many species do interbreed, and the distinctions between variety, subspecies, and species are often arbitrary. Tar, glass, quicksand, and substances under high heat or pressure violate common distinctions between the states of matter. Even the border between the body and the rest of the world is not clear: Are non-living appendages such as hair and fingernails part of the body? If so, what is the status of fingernail polish, hair dye, and makeup? What about fillings in the teeth or metal reinforcements embedded in a bone? Even the borderline between life and death is vague, to the embarrassment of doctors, lawyers, politicians, and clergymen.

These examples show that concepts are ad hoc: they are defined for specific purposes; they may be generalized beyond their original purposes, but they soon come into conflict with other concepts defined for other purposes. This point is not merely a philosophical puzzle; it is a major problem in designing databases and natural language processors. Section 6.3, for example, cited the case of an oil company that could not merge its geological database with its accounting database because the two systems used different definitions of "oil well". A database system for keeping track of computer production would have a similar problem: the distinctions between minicomputer and mainframe, between microcomputer and minicomputer, between computer and pocket calculator, are all vague. Attempts to draw a firm boundary have become obsolete as big machines become more compact and small machines adopt features from big ones.
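The oil-well problem can be made concrete with a minimal Python sketch. The two criteria below are invented for illustration, since the text does not say how the company's definitions actually differed, and the field names drilled and producing are likewise hypothetical. The point is only that two purpose-specific predicates can classify the same record differently, and that the disagreement surfaces as soon as the two databases are merged.

```python
# Hypothetical well records; field names are illustrative assumptions,
# not details of the oil company example in the text.
wells = [
    {"id": "A", "drilled": True, "producing": True},
    {"id": "B", "drilled": True, "producing": False},   # a dry hole
    {"id": "C", "drilled": False, "producing": False},  # a planned site
]

def is_oil_well_geology(w):
    # Hypothetical geological criterion: any hole that has been drilled.
    return w["drilled"]

def is_oil_well_accounting(w):
    # Hypothetical accounting criterion: only wells that generate revenue.
    return w["producing"]

geology = {w["id"] for w in wells if is_oil_well_geology(w)}
accounting = {w["id"] for w in wells if is_oil_well_accounting(w)}

# Records on which the two ad hoc definitions disagree block a naive merge.
conflicts = sorted(geology ^ accounting)
print(conflicts)  # ['B']
```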

If an oil company can't give a precise definition of an oil well, a computer firm can't define "computer", and doctors can't define death, can anything be defined precisely? The answer is that the only things which can be represented accurately in concepts are man-made structures that once originated as concepts in some person's mind. The rules of chess, for example, are unambiguous and can be programmed on a digital computer. But a chess piece carved out of wood cannot be described completely because it is partly the product of discrete concepts in the mind of the carver and partly the result of continuous processes in growing the wood and applying the chisel to it. The crucial problem is that the world is a continuum and concepts are discrete. For any specific purpose, a discrete model can form a workable approximation to a continuum, but it is always an approximation that must leave out features that may be essential for other purposes.

Since the world is a continuum and concepts are discrete, a network of concepts can never be a perfect model of the world. At best, it can only be a workable approximation.


2. Concepts are Discrete; Reality is Continuous

Sowa, p. 345:

By drawing distinctions and giving names to the things distinguished, language separates figure from ground. Consider a tree. It has no sharp boundaries between parts; yet words divide the tree into trunk, roots, branches, bark, twigs, leaves, buds, knots, flowers, seeds, fruit, and even finer subparts such as veins in the leaves and pistils in the flowers. Even the boundary between the tree and the environment may be indistinct: the tree may have started as a sprout from the root of another tree and may still share a root system with its parents and siblings; insects and animals may be living in the tree; a vine may be climbing up the trunk, moss may be on the bark, fungus may be growing on a dead branch, and bacteria in root nodules may be supplying nutrients. The arbitrary way that words cut up the world was emphasized by the linguist Benjamin Lee Whorf (1956):

"We dissect nature along lines laid down by our native languages. The categories and types that we isolate from the world of phenomena we do not find there because they stare every observer in the face; on the contrary, the world is presented in a kaleidoscopic flux of impressions which has to be organized by our minds, and this means largely by the linguistic systems in our minds. We cut nature up, organize it into concepts, and ascribe significances as we do, largely because we are parties to an agreement to organize it in this way, an agreement that holds throughout our speech community and is codified in the patterns of our language."

The division of the world into distinct things is a result of language. The philosopher Searle (1978) elaborated on that point:

"I am not saying that language creates reality. Far from it. Rather, I am saying that what counts as reality (what counts as a glass of water or a book or a table, what counts as the same glass or a different book or two tables) is a matter of the categories that we impose on the world; and those categories are for the most part linguistic. And furthermore, when we experience the world, we experience it through linguistic categories that help to shape the experiences themselves. The world doesn't come to us already sliced up into objects and experiences; what counts as an object is already a function of our system of representation, and how we perceive the world in our experiences is influenced by that system of representation. The mistake is to suppose that the application of language to the world consists of attaching labels to objects that are, so to speak, self-identifying. On my view, the world divides the way we divide it, and our main way of dividing things up is in language. Our concept of reality is a matter of our linguistic categories."

Sowa, p. 39:

Defining a concept as a unit presupposes that concepts are discrete. This assumption is supported by the fact that discrete relationships are remembered more accurately than continuous quantities. When people are asked to describe or draw a scene from memory, what they remember are discrete properties: "The tree is to the left of the car," "The dot is above the circle," or "There are three red houses and a yellow one." Sizes, times, and temperatures are remembered with discrete comparisons: "The corn is knee-high," "I waited until the parking lot emptied out," or "The water is scalding hot." All human languages name only a discrete set of colors out of the continuous spectrum. Most people can remember the discrete steps of a melody; but perfect pitch, the ability to remember an exact frequency, is rare even among musicians.

Even if people cannot remember continuous quantities, they can still detect them. They cannot, however, encode them in long-term memory. When comparing two objects directly, people readily notice small differences in color, weight, temperature, and size; but they cannot remember those quantities for more than a few seconds. Temperature, emotional state, and distance are continuous; but languages represent them by discrete words like cold, cool, tepid, lukewarm, warm, hot; happy, sad; far, near. Instruments like clocks and thermometers aid the memory by converting a continuous time or temperature into a string of discrete digits that can be remembered indefinitely.
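The conversion of a continuous reading into a small vocabulary of discrete words can be sketched as a lookup against sorted cut points, here using Python's standard bisect module. The words come from the passage; the numeric boundaries are assumptions chosen only for illustration.

```python
import bisect

# Hypothetical cut points in degrees Fahrenheit: a reading below BOUNDS[i]
# (and at or above the previous bound) gets WORDS[i]; anything at or above
# the last bound gets the final word.
BOUNDS = [40, 50, 60, 68, 75]
WORDS = ["cold", "cool", "tepid", "lukewarm", "warm", "hot"]

def describe(temp_f: float) -> str:
    """Map a continuous temperature onto one discrete word."""
    return WORDS[bisect.bisect_right(BOUNDS, temp_f)]

print(describe(70.0), describe(74.9))  # warm warm
print(describe(90.0))                  # hot
```

Two readings almost five degrees apart receive the same word, which is the loss of information the passage describes: the word can be remembered indefinitely, but the original quantity cannot be recovered from it.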

To adapt the discrete words to a continuous world, natural languages have "fuzzy" words like somewhat, very, almost, rather, more or less, approximately, just, about, and not quite. Such words cannot provide a continuous range of variability; "very hot" is just one more discrete state beyond "hot", and "very very hot" is one more beyond that. Zadeh (1974) developed a theory of fuzzy logic to assign precise values to such terms, but his calculus of fuzzy values makes distinctions that no natural language ever represents. People use hedges like "more or less warm" when their standard for "warm" is not quite attained, but the world has a continuous range of temperatures that discrete words can never describe. The reason that language has fuzzy terms is not that human thought is fuzzy, but that the world is fuzzy.
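Zadeh's approach can be sketched in a few lines. The trapezoidal shape and the temperature breakpoints for "warm" are assumptions; the hedge operators follow Zadeh's commonly cited convention of modeling "very" as squaring a membership grade (concentration) and "more or less" as taking its square root (dilation).

```python
import math

def warm(temp_f: float) -> float:
    """Hypothetical trapezoidal membership grade for "warm", from 0.0 to 1.0."""
    if temp_f <= 60 or temp_f >= 90:
        return 0.0
    if temp_f < 70:
        return (temp_f - 60) / 10   # rising edge
    if temp_f <= 80:
        return 1.0                  # fully warm
    return (90 - temp_f) / 10       # falling edge

def very(grade: float) -> float:
    # Concentration: pushes partial memberships toward 0.
    return grade ** 2

def more_or_less(grade: float) -> float:
    # Dilation: pulls partial memberships toward 1.
    return math.sqrt(grade)

g = warm(65)
print(g, very(g), more_or_less(g))  # g = 0.5, very(g) = 0.25, more_or_less(g) is about 0.707
```

The continuous grades are exactly the kind of distinction the passage says no natural language represents; the words themselves remain a small discrete set.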

3. Concept, Symbol, and Referent

Sowa, p. 10:

The "intension" of a word is that part which follows from general principles. The "extension" of a word is the set of all existing things to which the word applies. The intension of "mammal", for example, is a definition, such as "warm-blooded animal, vertebrate, having hair and secreting milk for nourishing its young"; the extension is the set of all mammals in the world. Extensions are usually unwieldy sets that cannot be observed in their entirety and cannot serve as practical definitions. But a zoologist can identify a new type of mammal from the intensional definition, even though the species may not be listed in any catalog of mammals.
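The intension/extension distinction maps naturally onto a predicate and the set it selects. In this hypothetical sketch the predicate encodes the dictionary definition quoted above, and a three-item catalog stands in for "all existing things"; the Animal record type and the catalog entries are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Animal:
    name: str
    warm_blooded: bool
    vertebrate: bool
    has_hair: bool
    secretes_milk: bool

def mammal(a: Animal) -> bool:
    """Intension: a rule that can be applied to any candidate."""
    return a.warm_blooded and a.vertebrate and a.has_hair and a.secretes_milk

# A tiny, hypothetical catalog standing in for "all existing things".
catalog = [
    Animal("dog", True, True, True, True),
    Animal("trout", False, True, False, False),
    Animal("sparrow", True, True, False, False),
]

# Extension: the members of the catalog that the rule picks out.
extension = {a.name for a in catalog if mammal(a)}
print(extension)  # {'dog'}

# The intension still classifies a species that no catalog lists,
# as the zoologist in the text does.
print(mammal(Animal("new species", True, True, True, True)))  # True
```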

Perception maps extensional objects to intensional concepts, and speech maps concepts to words. But the relationship between word and object is an indirect mapping, deriving from the two direct mappings of perception and speech. Aristotle first made that observation:

Spoken words are symbols of experience in the psyche; written words are symbols of the spoken. As writing, so is speech not the same for all peoples. But the experiences themselves, of which these words are primarily signs, are the same for everyone, and so are the objects of which those experiences are likenesses. (On Interpretation 16a4)

                          "The Meaning Triangle"

                                 CONCEPT
                                   /\
                                  /  \
                                 /    \
                                /      \
                    symbolizes /        \ refers to
                              /          \
                             /            \
                            /              \
                           /________________\
                       SYMBOL stands for REFERENT

Aristotle's distinction has been recognized and restated many times throughout the history of philosophy. Ogden and Richards (1923) codified it as the "meaning triangle". The left corner is the symbol or word; the peak is the concept, intension, thought, idea, or sense; and the right corner is the referent, object, or extension. For some concepts, one corner of the triangle may be absent: a person may have a concept of an object for which he knows no word, or he may have a word for a concept that has no extension. The word "unicorn" is mapped to the concept [UNICORN] in the same way that "horse" is mapped to [HORSE], even though there are millions of horses in the world and no unicorns.
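A small data structure can hold the corners of the triangle while letting the referent corner be empty. In this sketch the Concept class, the lexicon, and the two sample horse names are hypothetical; the word-to-referent link is computed through the concept rather than stored directly, mirroring the claim that symbol and referent are connected only indirectly.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Concept:
    label: str                                    # e.g. "[HORSE]"
    referents: set = field(default_factory=set)   # extension; may be empty

# Symbol -> concept (the "symbolizes" edge of the triangle).
lexicon = {
    "horse": Concept("[HORSE]", {"Secretariat", "Trigger"}),  # sample referents
    "unicorn": Concept("[UNICORN]"),                          # no referents
}

def stands_for(word: str) -> Optional[set]:
    """The indirect symbol-to-referent link, derived through the concept."""
    concept = lexicon.get(word)
    return None if concept is None else concept.referents

print(stands_for("horse"))    # the two sample horses
print(stands_for("unicorn"))  # set(): a concept with no extension
```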


4. Primitive Concepts

Sowa, p. 13:

The intension of a complex concept may be defined in terms of more primitive concepts. Aristotle defined the concept type MAN in terms of RATIONAL and ANIMAL. The type ANIMAL is the genus or general type, and RATIONAL is the differentia that distinguishes MAN from other types of ANIMAL. The concept types RATIONAL and ANIMAL could themselves be defined in terms of still more primitive genera with appropriate differentiae until, perhaps, everything would be defined in terms of indivisible primitives. Aristotle's primitives, which he called categories, include Substance, Quantity, Quality, Relation, Place, Time, Position, State, Activity, and Passivity. These are ultimate primitives to which all other concepts are supposed to be reducible.

The AI goal of mechanically reducing concepts to primitives was first proposed by Ramon Lull in the thirteenth century. His Ars Magna was a system of disks inscribed with primitive concepts, which could be combined in various ways by rotating the disks. Under the influence of Lull's system, Leibniz (1679) developed his Universal Characteristic. He represented primitive concepts by prime numbers and compound concepts by products of primes. Then statements of the form "All A is B" are verified by checking whether the number for A is divisible by the number for B. If PLANT is represented by 17, and DECIDUOUS by 29, their product 493 would represent DECIDUOUS-PLANT. If BROAD-LEAFED-PLANT is represented by 20,213 and VINE by 1,192,567, the statement "All vines are broad-leafed plants" is judged to be true because 1,192,567 is divisible by 20,213. Leibniz envisioned a universal dictionary for mapping concepts to numbers and a calculus of reasoning that would automate the syllogism.
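Leibniz's scheme is easy to check with a few lines of arithmetic. This sketch uses the numbers given in the text; the helper names are hypothetical, and primitives() is ordinary trial-division factoring, added to show how a compound number decomposes back into its primes.

```python
PLANT, DECIDUOUS = 17, 29               # primes given in the text
DECIDUOUS_PLANT = PLANT * DECIDUOUS     # a compound concept is a product of primes
BROAD_LEAFED_PLANT = 20_213             # numbers given in the text
VINE = 1_192_567

def all_a_is_b(a: int, b: int) -> bool:
    """Leibniz's test: "All A is B" holds when B's number divides A's."""
    return a % b == 0

def primitives(n: int) -> list[int]:
    """Recover the prime "primitive concepts" of a compound by trial division."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors

print(DECIDUOUS_PLANT)                        # 493
print(all_a_is_b(VINE, BROAD_LEAFED_PLANT))   # True
print(all_a_is_b(PLANT, DECIDUOUS_PLANT))     # False: not every plant is deciduous
print(primitives(VINE))                       # [17, 29, 41, 59]
```

Factoring shows that, with the text's numbers, 20,213 = 17 × 29 × 41 and 1,192,567 = 20,213 × 59, so the divisibility test succeeds because every prime in the broad-leafed-plant number also appears in the vine number. The text does not say which concepts 41 and 59 stand for.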

With the advent of electronic computers, computational linguists set out to implement Leibniz's universal dictionary. Masterman's semantic nets (1961) were based on 100 primitives, such as FOLK, STUFF, CHANGE, GO, TALK. Masterman and her colleagues created a dictionary of 15,000 words defined in terms of the 100 primitives. For conceptual dependency graphs, Schank (1975) reduced the number of primitive acts to 11. The phrase "x bought y", for example, could be represented as "x obtained possession of y in exchange for money".

Transforming high-level concepts into primitives can show that two different phrases are synonymous. But many deductions are shorter and simpler in terms of a single concept like LIAR than a graph for "one who mentally transfers information that is not true". In general, a system should allow high-level concepts to be expanded in terms of lower-level ones, but such expansions should be optional, not obligatory. In recent versions, Schank and his colleagues have relied on high-level conceptual types, like AUTHORIZE and KISS, instead of expanding everything into primitives.
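Optional expansion can be sketched as a definition table that is consulted only on request. The act names, the single primitive TRANSFER-POSSESSION, and the tuple format are assumptions for illustration; they are not Schank's actual conceptual dependency notation.

```python
# Hypothetical definitions: each high-level act expands into lower-level
# statements of the form (primitive, thing, from, to).
DEFINITIONS = {
    "BUY": lambda buyer, item, seller: (
        ("TRANSFER-POSSESSION", item, seller, buyer),
        ("TRANSFER-POSSESSION", "money", buyer, seller),
    ),
    "SELL": lambda seller, item, buyer: (
        ("TRANSFER-POSSESSION", item, seller, buyer),
        ("TRANSFER-POSSESSION", "money", buyer, seller),
    ),
}

def represent(act, *args, expand=False):
    """Keep the compact high-level form unless expansion is requested."""
    if expand and act in DEFINITIONS:
        return frozenset(DEFINITIONS[act](*args))
    return (act, *args)

# The compact forms differ...
print(represent("BUY", "Ann", "book", "Bob") == represent("SELL", "Bob", "book", "Ann"))  # False
# ...but the optional expansion shows the two phrases are synonymous.
print(represent("BUY", "Ann", "book", "Bob", expand=True)
      == represent("SELL", "Bob", "book", "Ann", expand=True))  # True
```

Keeping expand=False as the default leaves compact concepts such as LIAR or KISS available for short deductions, while expansion remains possible whenever synonymy has to be established.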

Definitions in terms of primitives ultimately derive from Aristotle's mode of definition by genus and differentia. Yet Aristotle himself listed different categories in different writings and never gave a final, definitive set of primitives. Modern dictionaries analyze thousands of words into more primitive ones, but they are not limited to a fixed set of categories. They also allow circular definitions: word A is defined in terms of B, which is directly or indirectly defined in terms of A.
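Circular definitions show up as cycles in the graph that links each word to the words used in its definition, and a standard depth-first search finds them. The toy dictionary entries below are hypothetical, chosen to contain one definitional loop.

```python
# Each word maps to the words used in its (hypothetical) definition.
DICTIONARY = {
    "big": ["large"],
    "large": ["great", "size"],
    "great": ["big"],
    "size": ["extent"],
    "extent": [],
}

def find_cycle(graph):
    """Return one definitional cycle as a list of words, or None (DFS coloring)."""
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    color = {w: WHITE for w in graph}
    path = []

    def visit(word):
        color[word] = GRAY
        path.append(word)
        for dep in graph.get(word, []):
            if color.get(dep, WHITE) == GRAY:          # back edge: a cycle
                return path[path.index(dep):] + [dep]
            if color.get(dep, WHITE) == WHITE and dep in graph:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[word] = BLACK
        return None

    for w in graph:
        if color[w] == WHITE:
            found = visit(w)
            if found:
                return found
    return None

print(find_cycle(DICTIONARY))  # ['big', 'large', 'great', 'big']
```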

In his early philosophy, Wittgenstein (1921) presented an extreme statement of the classical Aristotelian view: compound propositions are made up of elementary propositions, which in turn are related to atomic facts about elementary objects in the world. Yet Wittgenstein never found a single example of a truly unanalyzable atomic fact or an elementary object that had no components. A chair, for example, is a single object to somebody who wants to sit down; but for a cabinet maker, it has many parts that must be carefully fit together. For a chemist developing a new paint or glue, even the wood is a complex mixture of chemical compounds, and these compounds are made up of atoms, which are not really atomic after all.


5. Fundamental Conceptual Types

Sowa, p. 16:

For most of the concepts of everyday life, meaning is determined not by definition, but by family resemblance or a characteristic prototype. In a study of concepts, Smith and Medin (1981) summarized three views on definitions:

1. Classical. A concept is defined by a genus or supertype and a set of necessary and sufficient conditions that differentiate it from other species of the same genus. This approach was first stated by Aristotle and is still used in formal treatments of mathematics and logic. It is the approach that Wittgenstein presented most vigorously in his early philosophy, but rejected in his later writings.

2. Probabilistic. A concept is defined by a collection of features and everything that has a preponderance of those features is an instance of that concept. This is the position taken by J. S. Mill. It is also the basis for the modern techniques of cluster analysis.

3. Prototype. A concept is defined by an example or prototype. An object is an instance of a concept c if it resembles the characteristic prototype of c more closely than the prototypes of concepts other than c. This is the position taken by Whewell and is closely related to Wittgenstein's notion of family resemblances.
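The three views can be contrasted as three membership tests applied to the same case. Everything here is hypothetical: the binary features, the conditions chosen for the classical test, a simple-majority threshold for the probabilistic test, and Euclidean distance as the measure of resemblance in the prototype test.

```python
import math

# Hypothetical feature vectors (1 = present, 0 = absent).
BIRD_FEATURES = {"flies", "feathers", "lays_eggs", "sings"}
PROTOTYPES = {
    "BIRD": {"flies": 1, "feathers": 1, "lays_eggs": 1, "sings": 1},
    "FISH": {"swims": 1, "scales": 1, "lays_eggs": 1},
}
penguin = {"feathers": 1, "lays_eggs": 1, "flies": 0, "sings": 0}

def classical(x, conditions):
    """Classical view: a member iff every necessary-and-sufficient condition holds."""
    return all(x.get(f, 0) == 1 for f in conditions)

def probabilistic(x, features, threshold=0.5):
    """Probabilistic view: a member iff x has a preponderance of the features."""
    share = sum(x.get(f, 0) for f in features) / len(features)
    return share > threshold

def prototype(x, prototypes):
    """Prototype view: the concept whose prototype x most closely resembles."""
    keys = sorted(set(x).union(*prototypes.values()))
    def distance(p):
        return math.dist([x.get(k, 0) for k in keys], [p.get(k, 0) for k in keys])
    return min(prototypes, key=lambda c: distance(prototypes[c]))

print(classical(penguin, {"feathers", "lays_eggs"}))  # True
print(probabilistic(penguin, BIRD_FEATURES))         # False: 2 of 4 is not a preponderance
print(prototype(penguin, PROTOTYPES))                 # BIRD
```

The three tests disagree about the penguin, which is a practical reason a system may need more than one kind of definition.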

In fuzzy set theory, Zadeh (1974) tried to formalize the probabilistic point of view. His related theory of fuzzy logic extends uncertainty to every step of reasoning. In prototype theory, however, judgments are made in a state of uncertainty, but once a plant is classified as a member of the rose family, further reasoning is done with discrete logic. Fuzzy set theory has important applications to pattern recognition, but fuzzy logic is problematical.

Although classical definitions are not possible for all concepts, some concepts are more general than others. All games are activities even if one cannot say exactly what differentiates them from other activities. Yet children learn concrete types like DOLL or HOPSCOTCH long before they learn general ones like ENTITY or ACTIVITY. The statement "All dogs are animals" remains true whether or not a person fills in the type hierarchy with mammals and carnivores.

A realistic theory must support a type hierarchy, but it must not require that every concept be reduced to primitives. [Note: Synthetic Dimensionality defines "primitive" at a more fundamental level than Sowa is discussing here.] This book [Sowa's] supports a compromise between Aristotle and Wittgenstein: Section 3.6 introduces definitions by genus and differentiae, and Section 4.1 allows open-ended families of schemata and prototypes that can grow and change with experience.

Some systems are not dogmatic about which concepts are primitive, but they have no mechanisms for dynamically defining new types in terms of more primitive ones. Type definitions provide a way of expanding a concept into primitives or contracting a graph of primitives back into a single concept. Definitions can specify a type in two different ways: by stating necessary and sufficient conditions for the type, or by giving a few examples and saying that everything similar belongs to the type. The first method derives from Aristotle's method of definition by genus and differentiae, and the second is closer to Wittgenstein. AI systems have supported both methods.

Definitions by genus and differentiae are logically easiest to handle.

Definitions by example or prototype are essential for dealing with natural language and its applications to the real world, but their logical status is unclear.

[In Sowa's theory of conceptual graphs] New type labels are defined by an Aristotelian approach. Some type of concept is named as the genus, and a canonical graph, called the differentia, distinguishes the new type from the genus. The differentia is the body of a monadic expression, and the genus is the type label of the formal parameter.

"As an example of type definition, Fig. X defines KISS with genus TOUCH and with a differentia graph that says that the touching is done by a person's lips in a tender manner."
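Following the genus-and-differentia pattern, a type definition can be stored as a genus label plus the relations that distinguish the new type; that supports both expanding the label and contracting a matching graph back into it. This Python sketch is not Sowa's conceptual-graph notation; the TypeDefinition class and the relation names AGNT, INST, and MANR are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TypeDefinition:
    """A new type label defined by a genus and a differentia."""
    label: str
    genus: str
    differentia: frozenset  # (relation, concept) pairs attached to the parameter x

# Relation and concept names are assumptions; the text says only that the
# touching is done by a person's lips in a tender manner.
KISS = TypeDefinition(
    label="KISS",
    genus="TOUCH",
    differentia=frozenset({("AGNT", "PERSON"), ("INST", "LIPS"), ("MANR", "TENDER")}),
)

def expand(defn):
    """Expansion: rewrite the label as its genus plus the differentia links."""
    return sorted((defn.genus, rel, concept) for rel, concept in defn.differentia)

def contract(genus, links, definitions):
    """Contraction: name the subtype whose whole differentia is present."""
    for d in definitions:
        if d.genus == genus and d.differentia <= links:
            return d.label
    return genus

print(expand(KISS))
print(contract("TOUCH", {("AGNT", "PERSON"), ("INST", "LIPS"), ("MANR", "TENDER")}, [KISS]))  # KISS
print(contract("TOUCH", {("AGNT", "PERSON"), ("INST", "HAND")}, [KISS]))                      # TOUCH
```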


6. Similarities and Differences

From Science, Order, and Creativity, by David Bohm and F. David Peat, Bantam Books, 1987, p. 112:

Some reflection will show that our first notions of order depend on our ability to perceive similarities and differences. Indeed, there is much evidence which shows that our vision, as well as other senses, works by selecting similarities and differences. This suggests that perception begins through the gathering of differences as the primary data of vision, which are then used to build up similarities. The order of vision proceeds through the perception of differences and the creation of similarities of these differences.

In thought a similar process takes place, beginning first with the formation of categories. This categorization involves two actions: selection and collection. According to the common Latin root of these two words, select means "to gather apart" and collect means "to gather together". Hence categories are formed as certain things are selected, through the mental perception of their differences from some general background. The second phase of categorization is that some of the things that have been selected (by virtue of their difference from the background) are collected together by regarding their differences as unimportant while, of course, still regarding their common difference from the background as important.

In the process of observing a flock of birds in a tree, the category of birds is formed by putting things together that are simultaneously distinguished from those that do not belong to this category (for example, from squirrels). In this way, sets of categories are formed, and these, in turn, influence the ways in which things are selected and collected. Selection and collection therefore become the two inseparable sides of the one process of categorization.

The determination of similarities and differences can go on indefinitely. As some differences assume greater importance and others are ignored, as some similarities are singled out and others neglected, the set of categories changes. Indeed, the process of categorization is a dynamical activity that is capable of changing in a host of ways as new orders of similarity and difference are selected.
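Selection and collection can be caricatured as two small functions over a list of observations. The observations, the background test, and the feature names are hypothetical; the sketch only illustrates how changing which difference counts as important changes the resulting categories.

```python
# Hypothetical observations from the bird-in-a-tree example.
things = [
    {"name": "sparrow", "kind": "bird", "color": "brown"},
    {"name": "finch", "kind": "bird", "color": "red"},
    {"name": "crow", "kind": "bird", "color": "black"},
    {"name": "squirrel", "kind": "mammal", "color": "brown"},
    {"name": "branch", "kind": None, "color": "brown"},
]

def select(items, is_background):
    """Selection: keep what differs from the background."""
    return [x for x in items if not is_background(x)]

def collect(items, important):
    """Collection: group items, ignoring every difference except the `important` features."""
    groups = {}
    for x in items:
        key = tuple(x[f] for f in important)
        groups.setdefault(key, []).append(x["name"])
    return groups

figures = select(things, is_background=lambda x: x["kind"] is None)
print(collect(figures, ["kind"]))
# {('bird',): ['sparrow', 'finch', 'crow'], ('mammal',): ['squirrel']}
print(collect(figures, ["color"]))
# {('brown',): ['sparrow', 'squirrel'], ('red',): ['finch'], ('black',): ['crow']}
```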


7. The Aristotelian Type Hierarchy

Sowa, p. 81:

Aristotle first introduced type hierarchies with his theory of categories and syllogisms. He had ten primitive types, a method for defining new types by genus and differentia, and the use of syllogisms for analyzing the inheritance of properties. In Artificial Intelligence, the type hierarchy supports the inheritance of properties from supertypes to subtypes of concepts.
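Property inheritance through a type hierarchy can be sketched with a supertype map. The hierarchy and its properties are hypothetical, and for brevity each type has a single supertype, although a real type hierarchy need not be a tree. The is_subtype check would give the same answer for "All dogs are animals" if DOG pointed directly at ANIMAL, which echoes the earlier remark about filling in mammals and carnivores.

```python
# Hypothetical hierarchy: each type names its immediate supertype.
SUPERTYPE = {"DOG": "CARNIVORE", "CARNIVORE": "MAMMAL", "MAMMAL": "ANIMAL", "ANIMAL": None}
# Properties attached at the level where they are introduced.
PROPERTIES = {
    "ANIMAL": {"animate"},
    "MAMMAL": {"has hair", "secretes milk"},
    "CARNIVORE": {"eats meat"},
    "DOG": {"barks"},
}

def ancestors(t):
    """The type itself followed by its chain of supertypes."""
    chain = []
    while t is not None:
        chain.append(t)
        t = SUPERTYPE.get(t)
    return chain

def is_subtype(a, b):
    """"All A are B" holds if B appears on A's supertype chain."""
    return b in ancestors(a)

def inherited(t):
    """Union of the properties of t and all of its supertypes."""
    return set().union(*(PROPERTIES.get(s, set()) for s in ancestors(t)))

print(is_subtype("DOG", "ANIMAL"))  # True
print(sorted(inherited("DOG")))
```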

The best way to study type hierarchies is to analyze the structure of dictionary definitions, preferably by computer. Amsler (1980) found a rich hierarchy in his analysis of the Merriam-Webster Pocket Dictionary. The hierarchy tended to be bushy, with each node having many descendants, but it did not grow very deep. The concept type VEHICLE, for example, had 165 subtypes, but the hierarchy extended for only three levels. At the first level, the immediate subtypes of VEHICLE included AMBULANCE, AUTOMOBILE, BICYCLE, BUCKBOARD, BUS, CARRIAGE, CART, etc. The next level beneath AUTOMOBILE included the subtypes COACH, CONVERTIBLE, COUPE, HOT-ROD, JALOPY, SEDAN, etc. The next level beneath SEDAN included BROUGHAM, LIMOUSINE, and SALOON.

Actions, states, and properties can also be grouped in hierarchies. In analyzing verbs, Chodorow (1981) also found bushy, but shallow hierarchies. The concept type COMPLAIN, for example, has subtypes BELLYACHE, BITCH, CRAB, GRIPE, INVEIGH, SQUAWK, and WHIMPER, none of which have any further subtypes.
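The bushy, shallow shape can be measured directly. This sketch encodes only the subtypes named in the two passages above (the full VEHICLE hierarchy had 165 subtypes, which are not reproduced) and computes depth and subtype count with two small recursive functions; the function names are hypothetical.

```python
# Only the subtypes listed in the passages; "etc." entries are omitted.
VEHICLE = {
    "VEHICLE": ["AMBULANCE", "AUTOMOBILE", "BICYCLE", "BUCKBOARD", "BUS", "CARRIAGE", "CART"],
    "AUTOMOBILE": ["COACH", "CONVERTIBLE", "COUPE", "HOT-ROD", "JALOPY", "SEDAN"],
    "SEDAN": ["BROUGHAM", "LIMOUSINE", "SALOON"],
}
COMPLAIN = {"COMPLAIN": ["BELLYACHE", "BITCH", "CRAB", "GRIPE", "INVEIGH", "SQUAWK", "WHIMPER"]}

def depth(tree, node):
    """Number of levels below `node` (0 for a leaf)."""
    children = tree.get(node, [])
    return 0 if not children else 1 + max(depth(tree, c) for c in children)

def size(tree, node):
    """Number of subtypes beneath `node`."""
    return sum(1 + size(tree, c) for c in tree.get(node, []))

print(depth(VEHICLE, "VEHICLE"), size(VEHICLE, "VEHICLE"))     # 3 16
print(depth(COMPLAIN, "COMPLAIN"), size(COMPLAIN, "COMPLAIN"))  # 1 7
```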

Sowa, p. 128:

Type definitions are based on Aristotle's method of genus and differentia. They support decompositional semantics where a high-level conceptual type is decomposed into a graph of primitive types.

Aristotle distinguished genera and differentiae. Roughly speaking, a genus is a class which has a characteristic which is common to all members of that class, and a differentia is a characteristic which belongs to members of one subgroup a but not to members of other subgroups b, c, etc. (Keith Hope, Methods of Multivariate Analysis, University of London Press, 1968, p.23)
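Hope's characterization translates into set operations: the genus feature is what every member of the class shares, and a subgroup's differentia is what its members share that no member of another subgroup has. The subgroups and features below are hypothetical, loosely echoing Aristotle's RATIONAL ANIMAL.

```python
# Hypothetical feature sets for the members of three subgroups a, b, c.
subgroups = {
    "a": [{"animal", "rational", "biped"}, {"animal", "rational"}],
    "b": [{"animal", "quadruped"}, {"animal", "quadruped", "barks"}],
    "c": [{"animal", "winged"}],
}

all_members = [m for members in subgroups.values() for m in members]
genus = set.intersection(*all_members)  # characteristic common to the whole class

def differentia(name):
    """Features shared by every member of one subgroup and absent from all others."""
    shared = set.intersection(*subgroups[name])
    others = set().union(*(m for g, ms in subgroups.items() if g != name for m in ms))
    return shared - others

print(genus)             # {'animal'}
print(differentia("a"))  # {'rational'}
```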

Note the inverse relationship between the number of features for a concept type and the number of entities to which it applies. The type DOG applies to fewer entities in the real world than its supertype ANIMAL, but more features are required to describe it. The inverse relationship between the number of properties required to define a concept and the number of entities to which it applies was first noted by Aristotle. It is called the "duality of intension and extension". (Sowa, p. 384)
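The duality can be observed by adding features to an intension one at a time and counting the extension each time. The four-entity universe and its features are hypothetical.

```python
# Hypothetical universe: each entity is described by a set of features.
universe = {
    "Rex": {"animal", "mammal", "dog"},
    "Tom": {"animal", "mammal", "cat"},
    "Tweety": {"animal", "bird"},
    "Nemo": {"animal", "fish"},
}

def extension(intension):
    """Entities that possess every feature in the intension."""
    return {name for name, feats in universe.items() if intension <= feats}

for intension in [{"animal"}, {"animal", "mammal"}, {"animal", "mammal", "dog"}]:
    print(len(intension), len(extension(intension)))
# 1 4
# 2 2
# 3 1
```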


8. Summary

  • The exact meaning of a qualitative concept is ad hoc, or context specific, and is specified to some desired degree of accuracy by the person using the concept.

  • Thus, word and concept meaning is placed in service to human intention. Meaning is essentially stipulative: we choose words from a loosely defined vocabulary grounded in social contract, and assign to these approximately defined words any exact meaning we choose, by precisely dimensioning these meanings with our own choice of boundary values. Words and concepts mean what we want them to mean, and they serve our purposes. If in an act of communication, word meaning is not initially clear or adequately exact, we provide additional meaning, until the specification of our intention falls within acceptable error tolerances.

  • The meaning of an abstract qualitative concept is grounded in quantitative dimensions, through an implicit cascade of definitions.

  • Any concept is a discrete digital structure defined in terms of lower and upper boundary values which function as error tolerances. Continuous reality may vary to some detectable degree, and yet still fall within the concept. The word "hot" might have boundary values of 75 to 90 degrees Fahrenheit; "very hot" might be used to describe 85 to 110.
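The proposal in the last point can be sketched as a concept stipulated by a closed interval on one dimension. The boundary values for "hot" and "very hot" come from the example above; the IntervalConcept class and the words_for helper are hypothetical, and treating both bounds as inclusive is an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IntervalConcept:
    """A concept stipulated as a closed range on one quantitative dimension."""
    word: str
    low: float   # lower boundary value (degrees Fahrenheit here)
    high: float  # upper boundary value

    def applies_to(self, value: float) -> bool:
        return self.low <= value <= self.high

HOT = IntervalConcept("hot", 75, 90)
VERY_HOT = IntervalConcept("very hot", 85, 110)

def words_for(value, concepts=(HOT, VERY_HOT)):
    return [c.word for c in concepts if c.applies_to(value)]

print(words_for(80))   # ['hot']
print(words_for(88))   # ['hot', 'very hot'] (the ranges overlap)
print(words_for(120))  # [] (outside both tolerances)

# A speaker can stipulate a tighter meaning by choosing other boundary values.
narrow_hot = IntervalConcept("hot", 80, 85)
print(narrow_hot.applies_to(78))  # False
```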