The Network Phenomenon: Empiricism and the New Connectionism
Stephen Downes, 1990
VII. Associationism: Cognitive Structures
A. Objections to Associationism
Above, I have outlined what I mean by associationism and sketched some objections. At the risk of repetition, I would now like to describe these objections in greater detail. By considering these objections, I will be able to describe a theory of associationist inference in more detail. This description depends to some extent on some of the conclusions already established regarding representations and perceptions, and will be employed below in a discussion of language and logical inference.
The general form of objections to associationism is as follows: people have the ability to know or do X, associationism is not sufficiently powerful to explain how people know or do X, therefore, people employ some means of knowing or doing X other than associationism. For example, "We know that the external world exists. However, empiricism (which depends on associationism) cannot prove that the external world exists. Hence, we must have some non-empirical means of knowing that the external world exists."
As an example of this form of argument, consider the following from Leibniz's New Essays. "The senses, although sufficient for all our actual knowledge, are not sufficient to give it all to us, since the senses never give us anything but examples, that is, individual or particular truths. Now all the examples which confirm a general truth, whatever their number, do not suffice to establish the universal necessity of that same truth.... necessary truths... must have principles whose proof does not depend on examples, nor consequently on the testimony of the senses."
As another example of the same sort of argument, consider Chomsky. He argues, correctly, that certain features of language use, for example, transformation, depend on knowledge of the structure of a given sentence in the language. Step-by-step inductive operations (that is, those which employ finite state devices) are inadequate to produce this knowledge. Therefore, we must have this knowledge independently of experience. It is innate, perhaps, or the product of evolution, and is not learned from experience. 
Bever, Fodor and Garrett also describe what they call a formal limit to associationism. According to these authors, we are able to recognize that a certain string of characters is a well-formed formula (wff) in a language L only with respect to a set of rules which contain abstract characters. Since association is subject to what they call the "terminal meta-postulate", which asserts that associationist rules may be described only in those terms which describe behaviour, no associationist principle may contain an abstract character. Therefore it follows that on the basis of associationist principles alone we cannot determine whether or not a given string of letters is a wff in L.
These arguments are all valid arguments. Thus, in order to refute them, it is necessary to show that either the first premise is false or the second premise is false. Which of these two options we employ will vary according to circumstances. In general I take the following route. Those arguments which assert that we have this or that knowledge are refuted by a denial of the first premise; I argue that we have no such knowledge. Those arguments which assert that we have a demonstrated capacity I refute by a denial of the second premise; I argue that associationism can produce such a capacity.
B. Scepticism and Knowledge Claims
Let me consider only briefly instances of the first sort of refutation. Consider Leibniz's argument, stated above, that the "universal necessity" of some general truths must be known by some means other than the senses. One part of Leibniz's argument is certainly correct: we do not arrive at such knowledge from the senses. Further, it could be taken as arguable that we do not even know general principles, such as laws of nature, from the senses, nor can we even establish that one or another such principle is probably true. In my opinion, Popper's arguments on this point are conclusive.
Contra Leibniz, I argue that we do not have any cognitive access to any such universal necessity, and therefore, do not in fact know that this or that principle is universal or necessary. Here is my argument.
Leibniz's own theory of necessity and possibility is very similar to that which we employ today: a proposition is necessarily true if and only if it is true in all possible worlds. Now either possible worlds are something which we create in our own minds or they are not. If they are, then while we may be certain that a given proposition is true or not true in all (conceived) possible worlds, there may be possible worlds which we have not thought of yet (alternatively: there are worlds which we cannot imagine), and so our knowledge that a proposition is true in all (conceived) possible worlds is insufficient for us to know that it is universally or necessarily true. If they are not, then whatever we know about possible worlds in our own mind is distinct from the possible worlds in question. Hence, our knowledge about possible worlds might be incorrect. So even if a proposition is true in all (conceived) possible worlds, we cannot know it is true in all possible worlds. Therefore, we cannot know that any proposition is universally or necessarily true.
It is of course true that there are some things which we can know, for example, I know that I exist. What I am arguing here is that experience, for example, my experience of myself, is sufficient to establish those things which I do know. Scepticism serves as a good rough-and-ready means of distinguishing what I know from what I don't. In general, those things which it is claimed that we know and which associationism cannot prove (that is, for which we cannot construct associative processes for knowing) are those things that can be undermined by a sceptical argument.
There is an alternative approach for those people who don't like scepticism. Suppose it is claimed that we know some proposition, say, that the ground will not disappear under my next step. Instead of asking how we know (for which there is probably no answer, but this is the sceptical move to be avoided) we ask how we know that we know. In such cases, typically, it is necessary to argue that we behave as though we know (direct introspection tends to be unconvincing in such cases and is the only alternative answer). But now it is not necessary to explain the knowledge; it is only necessary to explain the behaviour. Connectionism allows that a person can behave in this or that way without ever knowing the principle which underlies the behaviour. Thus, we can respond to an apparent knowledge claim by saying not only that we can't know, but further, that we don't need to know. (Human beings managed to stay attached to the Earth without difficulty for centuries prior to the discovery of gravity.)
C. Association and Cognitive Capacities
In general (exceptions noted), scepticism can refute any knowledge claim. Thus, the only means of establishing that associationism is inadequate to explain human cognition is to establish that we have some demonstrated capacity which, in principle, could not have been produced employing associative mechanisms.
The "in principle" part of the argument is the tough part to establish. Above, I have sketched a new theory, connectionism, which employs associationist principles. Although the exact limits of this new theory are difficult to define, nonetheless, first, we know that it is a very powerful theory, and second, we know exactly how it works. Hence, we are now in a position to describe in detail associationist mechanisms for producing previously unexplainable behaviour (unexplainable, that is, except with reference to some innate knowledge or capacity).
At the core of my objection to those such as Fodor and Chomsky is a related theory which I have sketched above, specifically the theory which asserts that cognition does not necessarily proceed according to rules and clear and distinct categories. Therefore, it will not do to argue that associationism must produce a principled mechanism for performing this or that cognitive feat. All that is necessary is that some mechanism be described, even if we allow that particular instantiations may vary, perhaps considerably. (This latter should be expected, for human capacities vary considerably.)
The theory I wish to propose in response to the Fodor-Chomsky argument has two parts. In the first part, during the course of experience, human beings detect repeated experiences of similar phenomena. From these, characteristic or prototype representations of those phenomena are constructed. Then, in the second part, these prototypes are employed to produce the cognitive behaviours various philosophers have argued cannot be created by association.
D. Essences and Accidents
It is to me a mystery why people argue that an abstract is something different from an experience. Let us examine how we developed a theory of abstractions in the first place. Its origin is Aristotelian, though it receives its clearest formulation in Medieval philosophy. In order to examine essences, let us consider, for example, the essence of something concrete, say, Socrates.
Medieval philosophers such as Ockham and Scotus agreed that Socrates was composed of two parts: his essence, and his accident. His essence is that attribute which Socrates must possess in order to be Socrates. His accident is that set of features which are not necessarily particular to Socrates. We might say that the essence is that which continues, unchanging, to be Socrates, and his accident is that which may change from time to time without changing the fact that Socrates is Socrates. For example, Socrates is essentially human, but only accidentally snub-nosed.
So, for example, Ockham characterizes Scotus's view as follows: "a nature is this by something added that is formally distinct (from the nature)". The 'something added' is called a "contracting difference", which "contracts it (the nature) to a determinate individual". To 'contract', or in Latin, 'contrahere', is, for example, to apply the genus to some species, or some species to some individual. For example, 'Socrates contracts the species of humanity'.
The point I wish to emphasize here is that Socrates, the single individual, is composed of two parts: the essence and the accident. If we take away the accident, then we have the essence. For any given experience, it is no difficult matter to take away a part of the experience, particularly if that experience consists of, as I have suggested above, a set of activations of neural cells. If only some of those cells activate a further set of cells, then we have succeeded in taking away some of the experience. So we can, via a connectionist process, construct something which could be the essence of Socrates. We do so by deleting from the representation one or another feature of Socrates, for example, his snub nose.
A key point: this essence just is what we mean by an abstract. The debate between Ockham and Scotus illustrates the contemporary debate concerning abstracts. According to Scotus, the essence of Socrates exists.  Socrates just happens to be a "contraction", or a particular instantiation, of that essence. Other human beings, for example, Aristotle, are different instantiations of that same essence. For after all, both Aristotle and Socrates are essentially human. Ockham's response to Scotus is well known in its outline. If Scotus is right, then we have two distinct types of entities: particular things, for example, Socrates, and essences, for example, humanness. However, as a methodological principle, it is better not to multiply entities beyond necessity. Since we do not have to postulate some independently existing essence, it follows that we should not.
Some philosophers, for example, Kripke, apparently still believe that there are independently existing essences.  Most philosophers do not. From my point of view, it does not matter whether essences have independent existence. The question is whether or not, by virtue of experience alone, we can detect them. I argue that we can, and I argue that the process just is as described above: we strip the accidental features from a given experience, and are left with a representation of the essence.
E. Evaluation of Essences
Where the real dispute lies, in my opinion, is whether there is one and only one set of permissible essences. For example, it is arguable that Socrates is essentially human. But it is also arguable that Socrates is essentially snub-nosed. There are several ways to pose this question. Must we identify one, rather than another, set of essences of things? Is some or another set of essences better? Or is the determination of essences ad hoc and random? In my opinion, some types of essences are better than others, but there is no [one] way that we must define the essences of things.
I believe that the essence of Socrates is the way that Socrates is similar to other things, and that the accident of Socrates is the way in which he is different. For example, Socrates is similar to Aristotle in that they are both human, yet they are different in that only Socrates is snub-nosed. The reason why humanness is a better essence than snub-nosedness is that snub-nosed and non-snub nosed people are otherwise very similar, while humans and non-humans tend to be quite different.
Another way of saying the same thing is as follows. Recall that a given representation, say, of Socrates, consists of a set of connections between a given unit and some set of units, and that this set of connections may be represented as a vector. See figure 14. These vectors may be more or less similar, for example, "1011" is more similar to "10010" and less similar to "00001".
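The notion of one vector being "more or less similar" to another can be made concrete. As a minimal sketch, the Python below assumes that similarity is simply the fraction of positions at which two equal-length vectors agree; that measure, the function name and the three vectors are illustrative choices, not taken from the text.

```python
def similarity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length bit strings agree."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    agreements = sum(1 for x, y in zip(a, b) if x == y)
    return agreements / len(a)


# Hypothetical five-unit vectors, invented for illustration.
reference = "10010"
near = "10011"  # differs from the reference in one position
far = "01101"   # differs from the reference in every position

print(similarity(reference, near))
print(similarity(reference, far))
```

Any measure that counts shared activations would serve the same purpose; position-by-position agreement is just the simplest to state.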
Now suppose that we have the following set of vectors:
These vectors can be clustered according to similarity
It is by virtue of and only because of these clusterings that this or that identification of an essence is to be preferred.  In the former case, we may have the essence:
and in the latter:
The "x"s in this example indicate that there is no connection between a given unit in the vector and the unit which represents the essence. These are partial vectors; see figure 15.
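One way to picture how a partial vector might be read off a cluster is sketched below in Python. It assumes that the essence keeps exactly those units on which every member of the cluster agrees and marks the rest with "x"; this rule, the function name and the cluster itself are invented for illustration and are not the author's own vectors.

```python
def essence(cluster: list[str]) -> str:
    """Keep the positions on which all members agree; mark the others 'x'."""
    if not cluster:
        raise ValueError("cluster must not be empty")
    length = len(cluster[0])
    if any(len(vector) != length for vector in cluster):
        raise ValueError("all vectors must have the same length")
    # zip(*cluster) walks the cluster column by column (unit by unit).
    return "".join(
        column[0] if len(set(column)) == 1 else "x"
        for column in zip(*cluster)
    )


# A hypothetical cluster: the members share their first three units only.
cluster = ["111010", "111001", "111100"]
print(essence(cluster))
```

The units on which members differ play the role of the accident; dropping them is the "taking away" described in section D.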
We can produce a measure of the 'betterness' of a given essence by considering, first, the number of "x"s in a given vector, and second, the number of instances of the given essence. Suppose there are n instances of "111xxx" and there are m "x"s (in this case, m=3). Then, to use a simple example, the betterness b of "111xxx" is b=f(n,m) where f is a betterness function.
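The text leaves the betterness function f unspecified. As a minimal sketch, the Python below assumes one simple form, f(n, m) = n / (1 + m), chosen only because it rises with the number of instances n and falls with the number of "x"s m. The formula, the helper names and the observed vectors are all hypothetical.

```python
def matches(pattern: str, vector: str) -> bool:
    """True if the vector agrees with the pattern on every connected unit."""
    return len(pattern) == len(vector) and all(
        p == "x" or p == v for p, v in zip(pattern, vector)
    )


def betterness(pattern: str, observations: list[str]) -> float:
    """Assumed form b = n / (1 + m); not a formula given in the text."""
    n = sum(1 for vector in observations if matches(pattern, vector))
    m = pattern.count("x")
    return n / (1 + m)


# Hypothetical experiences: three similar vectors and one outlier.
observations = ["111010", "111001", "111100", "000111"]

for pattern in ("111xxx", "1xxxxx", "000xxx"):
    print(pattern, betterness(pattern, observations))
```

Under this assumed f, "111xxx" scores higher than "1xxxxx" (same instances, more "x"s) and higher than "000xxx" (same number of "x"s, fewer instances), which is the ordering the two criteria above call for.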
It is worth noting that this system of betterness is exactly what we would expect from a connectionist system. Take any unit "i" which is connected to a set of other units. First, the fewer "x"s there are, the greater the number of input units; hence, since input is summed, (other things being equal) at any given time t an essence with fewer "x"s will have greater activation than one with more "x"s. Second, if a given vector is activated frequently, then (other things being equal) a unit the activation of which depends on the activation of that vector will be activated more frequently. Since in connectionist systems unit activation values tend to decay, the more frequently a unit is activated, the higher its activation value will be. The function f takes into account the decay rate and the rest position toward which the unit tends to decay.
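The effect of decay can be illustrated with a small simulation. The update rule below is an assumption rather than one stated in the text: at each time step the unit's activation moves a fraction `decay` of the way back toward its rest position, and any input received at that step is then added. The parameter values and the two input schedules are hypothetical.

```python
def simulate(inputs: list[float], decay: float = 0.2, rest: float = 0.0) -> float:
    """Return the final activation of a unit that decays toward `rest`."""
    activation = rest
    for received in inputs:
        # Decay: close a fraction `decay` of the gap to the rest position.
        activation = rest + (1.0 - decay) * (activation - rest)
        # Summed input received at this time step.
        activation += received
    return activation


# Two hypothetical schedules over twenty time steps.
frequent = [1.0 if t % 2 == 0 else 0.0 for t in range(20)]  # every 2nd step
rare = [1.0 if t % 10 == 0 else 0.0 for t in range(20)]     # every 10th step

print(simulate(frequent))
print(simulate(rare))
```

With these settings the frequently activated unit ends at a higher value than the rarely activated one, because it is topped up before its activation has decayed far toward rest.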
F. Abstractions, Categories, and Prototypes
What I wish to point out immediately is that an essence, defined above as a vector with some "x"s, just is an abstraction. The more "x"s a given essence has, the more abstract it will be. Abstractions, by virtue of the fact that they have many "x"s, tend at first glance to not have very much betterness; they hardly correspond to any input activation (i.e., experience) at all. However, since they are so frequently activated, this initial weakness is overcome.
The definition of a category can proceed with reference to the essence or the abstract feature of the members of that category. A category just is the set of those instantiations which result in the activation of, say, vector "111xxx". This is a normal and standard type of definition of categorization: the necessary and sufficient conditions for membership in any given category will be the set of activations which correspond to "111xxx". But the story does not end there.
Suppose we have a given category, the essence of which is activated by "111xxx". However, since partial vectors can result in the activation of a given unit, the unit will be activated by "110xxx". In this case, the activation will be only two thirds as strong as in a normal case. But since this is possible, no one of the units will be a necessary condition for the activation of a given essence-unit. If the clustering is such that there is no other place to put an instance of "110xxx", then we will typically assign whatever corresponds to "110xxx" to the category defined by "111xxx". Note that we have not defined a new category "11xxxx", since the third spot on the vector remains connected. Rather, we have extended what counts as an instance of "111xxx". See figure 17.
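The two-thirds figure can be checked with a short sketch. The Python below assumes that an essence-unit's activation is the proportion of its connected units (the positions not marked "x") that the input matches, and that an instance is assigned to whichever category it activates most strongly. The second category, "xxx111", is invented for contrast.

```python
def activation(essence: str, vector: str) -> float:
    """Share of the essence's connected units that the input vector matches."""
    connected = [(e, v) for e, v in zip(essence, vector) if e != "x"]
    if not connected:
        return 0.0
    hits = sum(1 for e, v in connected if e == v)
    return hits / len(connected)


def assign(vector: str, categories: list[str]) -> str:
    """Place the vector in the category whose essence-unit it activates most."""
    return max(categories, key=lambda essence: activation(essence, vector))


categories = ["111xxx", "xxx111"]  # "xxx111" is a hypothetical rival category

print(activation("111xxx", "110xxx"))  # two of three connected units match
print(assign("110xxx", categories))
```

Because the partial match still activates "111xxx" more than any rival, "110xxx" is placed there; no single unit acts as a necessary condition for membership.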
To change the example slightly in order to make the next point, suppose we have a category defined by "11111x". Any and all of the following will stimulate activation of that essence:
and so on. It is clear from this example that some sets of activation are better than others, that is, they result in a greater activation of the essence-unit. In this case, the activation of
will create the strongest activation. Whatever it is which corresponds with this vector constitutes a "prototype" of the category defined by "11111x". 
Human beings actually do this. Consider, for example, the category "bird". Birds are grouped into a given category because they have some features in common, for example, they are warm-blooded, lay eggs, have wings, beaks and claws, fly, and the like. Some birds, such as robins, have all of those features. A robin is therefore a prototypical bird. Others, for example, penguins, have most but not all of these features (they don't fly). While they are still birds, we do not consider penguins to be prototypical birds.
Think about this. Imagine a "dog". Now - did you imagine a collie or German shepherd, or did you imagine a Mexican hairless?
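The bird example above can be run through the same kind of calculation. As a minimal sketch, the Python below scores each animal by the share of category features it possesses and takes the highest scorer as the prototype. The feature names are a subset of those listed for "bird", encoded in a way chosen here for illustration; the scoring rule is an assumption.

```python
# A subset of the features named in the bird example (encoding is illustrative).
FEATURES = ("lays_eggs", "has_wings", "has_beak", "has_claws", "flies")


def typicality(animal_features: set[str]) -> float:
    """Share of the category's features that the animal possesses."""
    present = sum(1 for feature in FEATURES if feature in animal_features)
    return present / len(FEATURES)


animals = {
    "robin": set(FEATURES),                # has every listed feature
    "penguin": set(FEATURES) - {"flies"},  # lacks flight only
}

# The prototype is the member that activates the category most strongly.
prototype = max(animals, key=lambda name: typicality(animals[name]))

for name, features in animals.items():
    print(name, typicality(features))
print("prototype:", prototype)
```

The penguin still scores well above zero, so it remains in the category, but the robin activates it more strongly and so serves as the prototype.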
G. Are There Real Essences?
The one objection I can think of to this sort of story is that there are "real" essences which, first, do not correspond to any given experience, and which, second, we must employ in order to construct our system of categorizations. This objection is first raised by Descartes and has its modern instantiation in Kripke.
In my opinion, whether or not there are real essences does not matter. Suppose they exist. Either we detect them or we do not. If we do not, then we have no means of employing them in order to construct categories. Therefore, if they are of any importance at all, then we must detect them. Suppose we detect them. Then we either detect them as they are, or we do not. If we detect them as they are, then whatever they are (according to connectionist theory) will be reflected in our actual system of categorizations. If we do not detect them as they are, then the way they are does not affect our categorization. Therefore, the only case in which real essences can affect our system of categorization is a case in which, first, they exist, and second, we detect them as they are.
Suppose they exist and we detect them as they are. Either we detect them through the senses or we do not. Suppose we believe, like Descartes, that we do not detect them through the senses. Then they must be, as Descartes suggests, innate. If they are innate, however, then there could be no disagreement regarding the best system of categorization (recall that we are detecting them as they are). However, there is such a disagreement, for I disagree. Therefore, they cannot be innate. Thus, we must detect them by experience.
If they are detected by experience, however, since what we experience is distinct from that which is experienced, then even if we detect them as they are, we cannot ever know that we detect them as they are. Therefore, whether or not we detect them as they are is irrelevant, for all we can work with is the experience. This is exactly what I am proposing.
Finally, let me propose the following challenge to those people who propose that there are real essences and that we detect those essences via some non-empirical mechanism. Since the theory I have proposed provides an exact and clearly detailed mechanism for identifying and evaluating different schemes of categorization, let me challenge those who propose an alternative mechanism to detail exactly how these categorizations are detected and how disputes concerning the relative merits of different systems of categorization are to be evaluated. There is only one condition to this challenge: the system cannot refer to experience in order to detect and evaluate systems of categorization. I propose that it cannot be done.
G.W. Leibniz, New Essays Concerning Human Understanding, pp. 42-44.
Noam Chomsky, Syntactic Structures. Cited in P. Johnson-Laird, The Computer and the Mind, pp. 306-314.
T.G. Bever, J.A. Fodor, M. Garrett, "A Formal Limit of Associationism", from Verbal Behaviour and General Behaviour Theory, T.R. Dixon and D.L. Horton, editors. Prentice-Hall, 1968.
J.R. Anderson and G.H. Bower, Human Associative Memory: A Brief Edition, p. 15.
Karl Popper, The Logic of Scientific Discovery and Postscripts, pp. 363-366.
William of Ockham, Ordinatio, from Martin Tweedale (ed. trans.) "Selections from William of Ockham's Ordinatio Concerning Universals." Mss.
Richard McKeon, Selections from Medieval Philosophers, Vol. 2, p. 441.
though not independently, that is, not in the absence of a contraction. John Duns Scotus, Opera Omnia, Vol. XVI, sec. 275. See also Martin Tweedale, "Does Scotus' Doctrine on Universals Make any Sense?", p. 104.
Saul Kripke, Naming and Necessity, p. 127. After asserting that gold is essentially element 79, he writes that "According to the view I advocate, then, terms for natural kinds are much closer to proper names than is ordinarily supposed."
See Jeff Elman, "Representation in Connectionist Models", Connectionism conference, for an account of how clustering occurs according to word functionality. See also George Lakoff, Women, Fire and Dangerous Things, ch. 2.
This is a simplified version of the theory proposed by Anderson and Mozer in "Categorization and Selective Neurons", in Anderson and Hinton, Parallel Models of Associative Memory, pp. 213-236.