The Network Phenomenon: Empiricism and the New Connectionism
Stephen Downes, 1990
IX. Connectionism and Justification
A. When Some Connections Are Better Than Others
An objection exactly analogous to the objection to operationalism may be brought against connectionism in general. In connectionist systems, anything may be connected with anything else. However, it is clear that there must be some subset of the set of all possible connections such that the connections in this subset are better than the others. For example, among the types of connection which are possible, there is a subset which corresponds to logical inference. We want to distinguish these logical connections from those which are (for lack of a better term) merely accidental. However, there is no means, from within a strictly connectionist framework, of establishing this distinction. Therefore, connections must be evaluated according to constraints over and above any given connectionist system.
One weakness of the objection just stated is that there is no clear agreement regarding what constitutes the proper constraints for such an evaluation. Suppose, for example, we are attempting to parse a sentence in order to determine its meaning. According to some philosophers, for example Fodor, this task may be accomplished with reference to grammar, that is, rules and structure. Others, for example Winograd, argue that semantic considerations sometimes need to be taken into account. It is also reasonable to argue that the meaning of a sentence can only be determined with respect to pragmatic, or context-dependent, constraints.
Similarly, in the philosophy of science, there is no clear agreement regarding what constitutes a good scientific theory. Some philosophers, for example van Fraassen, argue that theories ought to be evaluated according to their empirical adequacy. Others, such as Hooker, argue that "epistemic virtues" such as simplicity and coherence are what guide the evaluation of a theory. According to many philosophers, most prominent among them Popper, a scientific theory ought to be testable, but this does not stop some theorists, for example von Däniken, from proposing untestable theories. And finally, some philosophers follow Feyerabend and assert that there are no standards of goodness for scientific theories.
These examples may appear to be out of place on the ground that, in the formal disciplines, there are clear standards for the evaluation of operations. In logic, we have the constraint of truth-preservation: an inference is valid if and only if it preserves truth, and is invalid otherwise. In mathematical equations, similarly, an operation is correct if and only if it preserves equivalence, and incorrect otherwise. Therefore, if a connectionist system cannot distinguish between, say, truth-preserving and non-truth-preserving operations, then the system must be guided by some set of constraints over and above itself; that is to say, specifically, it must be guided by innate constraints. There are several examples in the literature of this sort of consideration. Fodor criticizes the "picture" theory of representation on similar grounds, and Holland et al. build such constraints into their system of inductive inference.
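The formal standard appealed to here can be made concrete. The following is a minimal sketch (my own illustration, not part of the original text) of the constraint that an inference is valid if and only if it preserves truth, checked by exhaustive truth assignment:

```python
from itertools import product

def valid(premises, conclusion, variables):
    """An inference is valid iff every truth assignment that makes
    all the premises true also makes the conclusion true."""
    for values in product([True, False], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(p(env) for p in premises) and not conclusion(env):
            return False  # counterexample found: truth is not preserved
    return True

# Modus ponens: P, P -> Q, therefore Q (truth-preserving)
mp = valid(
    premises=[lambda e: e["P"], lambda e: (not e["P"]) or e["Q"]],
    conclusion=lambda e: e["Q"],
    variables=["P", "Q"],
)

# Affirming the consequent: Q, P -> Q, therefore P (not truth-preserving)
ac = valid(
    premises=[lambda e: e["Q"], lambda e: (not e["P"]) or e["Q"]],
    conclusion=lambda e: e["P"],
    variables=["P", "Q"],
)

print(mp, ac)  # True False
```

Note that the check is purely formal: nothing about the world enters into it, which is exactly the point pressed below.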
The idea here is that in any representation, there will be representational content. Representational content may be more or less representative of what it represents. For example, if the representation is propositional in form, then the proposition will be either true or false according to whether whatever is asserted by the proposition is in fact the case. The criticism of connectionist systems, therefore, is that there is no means of evaluating connections such that it can be determined whether their representational content corresponds, or does not correspond, to whatever happens to be the case.
B. Truth
If I were to use the general response to objections outlined above, then if this were an item of knowledge, I would deny it, and if it were a skill or capacity, I would explain how it can be accomplished using an associationist (connectionist) mechanism. However, truth does not appear to fall under either category, and hence needs a special discussion of its own.
Let me examine the concept of truth more closely. The standard, naive definition of truth is correspondence with reality; for example, a proposition P is true if and only if P. This definition of truth is inadequate because there are many propositions which are true, for example, predictions and other subjunctive conditionals, or statements about possibility, to which by definition nothing in the world corresponds. A better definition of truth is provided by Tarski: P is true if and only if it corresponds with a model of the world.
But this is a different definition of truth from the one considered to apply in formal inferences, for in that case we are talking about truth-preservation and not truth per se. A logical inference is valid strictly according to its form; the world is not a factor to be taken into consideration. Thus, the claim that logical inferences are truth-preserving by itself has nothing to do with the nature of the world or models of the world. An additional link - between truth-preservation and correspondence - must be established independently. For, without such a link, truth-preservation by itself is no virtue. It must be shown that truth-preservation is a good means of constructing inferences about the world or about models of the world.
For a certain set of inferences, we can concede that this is the case. Take, for example, an inference about points on a journey. If x arrived at A before B, and x arrived at B before C, then the rules of truth-preservation tell us that x arrived at A before C. This inference is confirmed by observation. It is, however, by no means clear that the rules of truth-preservation always apply when we are talking about the world. First, there is no reason to believe that these laws actually apply to the real world, or even to models of the real world (unless the models are governed by an a priori stipulation that they must adhere to such rules, in which case holding up the model as an example is a fancy way of begging the question). And second, it is clear that we want to make many other inferences about the world or models of the world, for example, inductive inferences, for which the rule of truth-preservation is of little or no use. Therefore, in at least some cases, something other than the rule of truth-preservation must be employed in order to evaluate our inferences.
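The journey example can be stated as a toy check (again, my illustration): the transitive conclusion is here confirmed directly against the observed itinerary rather than derived from the form of the inference.

```python
def before(arrivals, x, y):
    """True if stop x was reached before stop y on the observed journey."""
    return arrivals.index(x) < arrivals.index(y)

journey = ["A", "B", "C"]  # the order in which x arrived at each point

# The premises, confirmed by observation:
premises_hold = before(journey, "A", "B") and before(journey, "B", "C")
# The truth-preserving conclusion, likewise confirmed by observation:
conclusion_holds = before(journey, "A", "C")
print(premises_hold, conclusion_holds)  # True True
```

The agreement between the formal rule and the observation is exactly what, on the argument above, cannot simply be assumed for every domain.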
This is an important criticism of the objection to connectionism and associationism. In response to the objection that connectionist systems cannot provide an evaluation of this or that representation, the reply is that traditional systems fare no better, or at best are only a very slight improvement.
C. Relevant Similarity
Opposed to the concept of truth as our standard of evaluation, I wish to propose the standard of "relevant similarity". This standard has a number of advantages. First, it works, in the sense that successful inferences can be distinguished from unsuccessful inferences using relevant similarity. Second, in order to employ relevant similarity, no innate or a priori constraints are required. We know this because systems which naturally employ relevant similarity - connectionist and associationist systems - require no innate or a priori constraints. And third, the standard of relevant similarity is extremely powerful. Inferences may be evaluated directly according to relevant similarity: for example, the sample of a generalization must be relevantly similar to the whole. Or, at another level, an inference may be evaluated according to whether or not its form (that is, some abstraction of the inference) is relevantly similar to previously successful inferences. Let me sketch these in a bit more detail.
Consider the typical inductive inference. The premises consist of a set of instances of some phenomenon or state of affairs, for example, "A1 is a B", "A2 is a B", etc. The conclusion is either a generalization of these observations, for example, "All A are B", or a prediction about the next instance, for example, "An+1 is a B". Standard textbooks list two major fallacies which can occur in such inferences: hasty generalization, in which too few instances are observed, and unrepresentative sample, in which observations are biased in some way. Both of these fallacies can be explained with reference to relevant similarity. An inductive argument works because the premises and the conclusion all describe similar phenomena; so, if the phenomena described are not sufficiently similar, the inference fails. An unrepresentative sample is significantly different from the sample described in the conclusion; thus, the inference fails. In a hasty generalization, we have not seen enough samples to be sure we have established similarity; hence, the inference fails.
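Both fallacies can be exhibited in a toy evaluator. The feature-overlap measure and the two thresholds below are my own illustrative assumptions; nothing in the text fixes particular numbers.

```python
def evaluate_generalization(sample, kind_features, min_n=5, min_overlap=0.8):
    """Toy evaluation of "All A are B" by relevant similarity.

    `sample` is a list of feature sets, one per observed instance;
    `kind_features` sketches the features of As in general. The
    inference fails as hasty when too few instances are observed,
    and as unrepresentative when the sampled features overlap too
    little with the kind described by the conclusion."""
    if len(sample) < min_n:
        return "fails: hasty generalization"
    seen = set().union(*sample)
    if len(seen & kind_features) / len(kind_features) < min_overlap:
        return "fails: unrepresentative sample"
    return "succeeds"

# Ten ravens, all observed in cities: the sample misses rural ravens.
urban_only = [{"black", "bird", "urban"}] * 10
print(evaluate_generalization(urban_only, {"black", "bird", "urban", "rural"}))
# fails: unrepresentative sample
```

The point of the sketch is only that both fallacies reduce to a single standard, similarity between sample and kind, rather than to two unrelated rules.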
Connectionist systems using relevant similarity for the evaluation of inductive inferences avoid many of the problems which plague standard work in induction. For example, one may ask why we use one particular set of premises, and not some other set of premises. The answer is naturally provided by the clustering mechanism described above. Another problem is the question of how many instances are required before we are able to say we have sufficient grounds to draw a conclusion. This answer is given by the activation value of the abstraction from a given cluster. If that abstraction has a sufficiently high activation value compared to other abstractions, then the inference works. Otherwise, it does not. There is no clear-cut numerical answer to these questions: it will always be relative to the structure of the net as a whole. What connectionism provides, and what traditional theories do not provide, is a mechanism for determining the answer in particular cases, rather than a mechanism which determines one answer for all cases.
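The comparative character of this decision can be caricatured as follows. The fixed margin below is an illustrative stand-in of my own; in an actual network the answer is implicit in the dynamics and relative to the net as a whole, not a set number.

```python
def accept_abstraction(activations, candidate, margin=0.2):
    """Toy decision rule: draw the conclusion only when the candidate
    abstraction's activation sufficiently exceeds all of its rivals'.
    There is no absolute threshold, only comparison among rivals."""
    rivals = [v for name, v in activations.items() if name != candidate]
    return activations[candidate] >= max(rivals) + margin

activations = {"all ravens are black": 0.9,
               "some ravens are black": 0.6,
               "no ravens are black": 0.1}
print(accept_abstraction(activations, "all ravens are black"))  # True
```

Raise the margin, or strengthen a rival, and the same evidence no longer licenses the conclusion: the evaluation is relative, as claimed.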
I have already mentioned a few cases of the second sort of evaluation, that is, an evaluation which asserts that an inference is successful if its form is sufficiently similar to some previously correct or successful inference. So, for example, a person learns modus ponens by being shown examples similar to "If I am in Edmonton then I am in Alberta..." and learns not to deny the antecedent in the same way.
As I mentioned above, a connectionist system will attempt to employ relevant similarity on its own. It does this because such a system tends to adjust connection weights and unit activations until it reaches a stable or "rest" position. The exact nature of this rest position depends to some degree on how the system is constructed: change the learning rule and you change the rest position. However, in all cases, the settled state will be one in which all and only those units whose vectors are similar to the input activation will themselves be activated (or as nearly so as possible). I have illustrated how we might develop a rule of transitivity which is useful on journeys from place to place. Similarly, Lakoff suggests that we develop the concept of cause by analogy with human actions.
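A one-step caricature of that settled state (my sketch, with cosine similarity and a fixed threshold standing in for the effect of the learning rule; a real network reaches such a state by iterated weight and activation updates):

```python
import math

def settle(stored, pattern, threshold=0.9):
    """Caricature of settling: the "rest" state activates all and only
    those units whose stored vectors are similar (here, by cosine
    similarity) to the input pattern."""
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.hypot(*u) * math.hypot(*v))
    return [cosine(w, pattern) >= threshold for w in stored]

# Three units with stored vectors; the input resembles the first two.
units = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0)]
print(settle(units, (1.0, 0.0)))  # [True, True, False]
```

Changing the threshold here plays the role that changing the learning rule plays in the text: it changes which rest position the system counts as settled.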
 These are described in Rumelhart and McClelland, Parallel Distributed Processing.
 In Ned Block, Imagery.
 In Holland, Holyoak, Nisbett and Thagard, Induction: Processes of Inference, Learning and Discovery.
 Here I am assuming a correspondence definition of truth. Other definitions are available; see, for example, Rescher.
 This is very similar to the point made about scientific theories, above, for if a scientific theory is a model of the world, then, as noted, there are innumerable possible ways of building such models.
 For example, Jerry Cedarblom and David Paulsen.
 See Henry Kyburg, "Recent Work in the Problem of Induction."
 See Rumelhart and McClelland on satisfying multiple simultaneous constraints in Explorations in Parallel Distributed Processing, ch. 3.
 Philip Kitcher, The Nature of Mathematical Knowledge.