States, Parts & Information:

Parts are partially ordered -- one part (component, sub-system) is contained in another if the states of the latter determine the states of the former. I is the biggest part containing all others; O is the "empty part", contained in all others.

One part contains information about another part, if it contains that part in the sense just defined. Parts (partitions) are information in this sense; each part of the world contains or stores a certain aspect of the world defined by its (the part's) states. If A contains B (as a part), then the information which A stores is sufficient to determine what B stores, so A contains information about B.
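
The containment relation can be made concrete with a small sketch. It assumes (for illustration only) that a part is modelled as a function from world states to part states, and that the world is a hypothetical toy universe of two bits; the names contains, WORLD, X and Y are not from the theory itself.

```python
from itertools import product

def contains(states, a, b):
    """Return True if part `a` contains part `b` on `states`.

    A part is modelled as a function from world states to part states;
    `a` contains `b` when equal a-states always go with equal b-states.
    """
    seen = {}  # a-state -> the b-state observed alongside it
    for s in states:
        if seen.setdefault(a(s), b(s)) != b(s):
            return False  # same a-state, two different b-states
    return True

# Hypothetical toy world: each state is a pair of bits (x, y).
WORLD = list(product([0, 1], repeat=2))

def I(s): return s        # biggest part: distinguishes every world state
def O(s): return None     # empty part: has only one state
def X(s): return s[0]     # the first bit
def Y(s): return s[1]     # the second bit

print(contains(WORLD, I, X))  # True: the whole determines the first bit
print(contains(WORLD, X, O))  # True: O is contained in every part
print(contains(WORLD, X, Y))  # False: the bits vary independently
```

On the whole toy world X and Y are incomparable, which is the absolute sense of containment used so far.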

These definitions have been framed in an absolute sense for simplicity, but we are usually interested in information in a relative or conditional sense. Imagine two computer memory locations and suppose one (A) contains information about the other (B). B is not part of A in an absolute sense; hence the states of B are not completely determined by A -- how then can A store information about the state of B? Only relative to a subset of world states (you might think of the subset as a kind of generalized 'time interval' or 'epoch'). For when we consider particular subsets of world states, new orderings between parts arise which hold only on those subsets. In particular, A contains information about B on those subsets in which A's states and B's states are linked by a function (say, h) so that for all world states s in the subset, if A's state is given by A(s), then B's state is given by h(A(s)). For that epoch, in the proposed ontology, B is literally contained in A. If you think "that couldn't possibly be -- after all, the two memory locations are physically disjoint!" -- that just shows that you are conceptualizing the locations relative to a larger epoch. And indeed the theory agrees with you; it says that the occurrences or events or states at one physical location cannot (logically) determine completely the occurrences, events, or states at a different, disjoint location for all time, and that is why B is only a part of A in some epoch.
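
A rough sketch of the relative case follows. It assumes two hypothetical memory locations holding values 0..3, and an epoch chosen (arbitrarily, for illustration) as those world states in which B holds A's value plus one, modulo 4. The helper linking_function is an invented name; it simply tries to build the function h on a given subset of states.

```python
def linking_function(epoch, a, b):
    """Return h as a dict with b(s) == h[a(s)] for every s in `epoch`,
    or None if no such function exists on that epoch."""
    h = {}
    for s in epoch:
        if h.setdefault(a(s), b(s)) != b(s):
            return None  # one a-state paired with two b-states
    return h

# Hypothetical world: a state is (contents of A, contents of B).
WORLD = [(x, y) for x in range(4) for y in range(4)]

def A(s): return s[0]
def B(s): return s[1]

# Assumed epoch: the states in which B holds A's value plus one, mod 4.
EPOCH = [s for s in WORLD if B(s) == (A(s) + 1) % 4]

print(linking_function(WORLD, A, B))  # None: no containment in the absolute sense
print(linking_function(EPOCH, A, B))  # {0: 1, 1: 2, 2: 3, 3: 0}
```

On the whole world no h exists, so B is not part of A absolutely; on the epoch h exists and B is contained in A there.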

"But relative to a single world state, all parts reduce to O, so there exist innumerable epochs in which everything is contained in everything else!" Quite true -- theories have limiting cases (empty universes, point masses, division by 0, etc.) which are trivial or uninteresting or just peculiar. In this case we can say that our natural, pre-theoretic concepts of information, "laws", causality, etc. are always intended for a context in which there is implicitly more than a single state. A function defined only at a single point is only tolerated as a function in a formal sense; in our usual conceptualization of correspondence or dependency, the domain is vaguely presupposed to contain several elements.

Since information storage is related to "containment" (and hence to a partial ordering) there is a kind of size relation. More information is stored in large parts (fine partitions, up near I in the lattice of World partitions) than in small ones (coarse partitions, down near O in the lattice), but of course in most cases the partitions are incomparable, so we can't say which is bigger.

For example, in a toy universe with 4 states {a, b, c, d}, the information in {{a,b}, {c,d}} cannot be related in amount to the information in {{a}, {b,c}, {d}}, since neither partition is contained in the other. However, if we apply a Shannon information measure to the partitions by taking the relative sizes of the sets in a partition as "probabilities", then the Shannon measure preserves the ordering defined above. That is, if A contains B, then the Shannon measure H(B) <= H(A). (The only proof I know is surprisingly complicated and the result only holds for finite universes.)
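
The toy universe can be checked numerically. The sketch below assumes partitions are given as lists of sets and uses base-2 logarithms (the choice of base is an assumption; it only rescales the measure). The names refines and entropy are illustrative.

```python
from math import log2

def refines(fine, coarse):
    """True if every block of `fine` lies inside some block of `coarse`,
    i.e. the part `fine` contains the part `coarse`."""
    return all(any(f <= c for c in coarse) for f in fine)

def entropy(partition):
    """Shannon measure, taking relative block sizes as 'probabilities'."""
    total = sum(len(block) for block in partition)
    return -sum((len(b) / total) * log2(len(b) / total) for b in partition)

P = [{"a", "b"}, {"c", "d"}]
Q = [{"a"}, {"b", "c"}, {"d"}]
TOP = [{"a"}, {"b"}, {"c"}, {"d"}]      # the part I
BOTTOM = [{"a", "b", "c", "d"}]         # the part O

print(refines(P, Q), refines(Q, P))     # False False: incomparable
print(entropy(P), entropy(Q))           # 1.0 1.5
print(refines(TOP, P), entropy(TOP) >= entropy(P))  # True True
```

Note that the measure assigns numbers to P and Q even though neither contains the other; only the implication from containment to H(B) <= H(A) is claimed, not the converse.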

Information Transmission

For a deterministic system, one can define a formal notion of information transmission from one part to another. Information flows from A to B (relative to a given epoch) if B's current state always (in that epoch) determines what A's state was before the current state. If a part fails to transmit information to any other part, then the information is destroyed (and one can prove formally that if destroyed in this sense, the information can never be received from this part at a later time). I can only make these definitions work in systems for which the concept of "next state" is meaningful but I'm reluctant to assume that therefore time must be discrete; an inventive mathematician might be able to come up with a workable construction for systems with infinitesimal changes, in which time "flows"; I just don't see how to do it, and I do find the concept of discrete time very plausible, at least at that level of analysis at which it is meaningful to talk about information and causality.
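
As an illustration of the definition of flow, here is a minimal sketch for a discrete, deterministic toy system. The system is hypothetical: two registers X and Y, with a step function that copies X into Y and clears X. The function name transmits is invented for the example, and the epoch is taken to be the whole state set.

```python
from itertools import product

def transmits(epoch, step, a, b):
    """True if b's state after one step determines a's state before it,
    for every state in `epoch` (information flows from a to b)."""
    seen = {}  # b-state after the step -> a-state before the step
    for s in epoch:
        if seen.setdefault(b(step(s)), a(s)) != a(s):
            return False
    return True

# Hypothetical system: a state is (x, y), each register holding 0..2.
WORLD = list(product(range(3), repeat=2))

def step(s):
    x, _ = s
    return (0, x)          # copy X into Y, then clear X

def X(s): return s[0]
def Y(s): return s[1]
def WHOLE(s): return s     # the part I

print(transmits(WORLD, step, X, Y))      # True: Y now holds what X held
print(transmits(WORLD, step, Y, X))      # False
print(transmits(WORLD, step, Y, WHOLE))  # False: Y's old contents are destroyed
```

Checking against the whole system is enough to detect destruction in this sketch: if any part received Y's information, the whole, which contains every part, would receive it too.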

Mathematical holism contradicts traditional realism

We usually think of copying as a typical case of transmission, but copying is actually more complex than just transmission because it presupposes some way of identifying the states of different parts. Since two disjoint parts can actually never have identical sets of states, the identification of part states ("These two spots are both red", "These two registers both contain 0") only occurs at an epistemological level, i. e., relative to perceptions, beliefs, etc. The traditional realism in which universals are exemplified by particulars is incompatible with the present theory.

Notice that in actual computers, whether or not the possible states of the memory are tokens of the same types cannot be determined independently of the causal connections -- whatever types are manifested in computation arise from functional equivalences across computations which in turn stem from the underlying global state-transition function. In other words, register A and register B can both have states which we can take as tokens of the number 0 if their states function in some context (say, addition modulo N) as 0. Whether or not A's 0 and B's 0 have any other identity (same voltage, or what have you) is quite irrelevant; in fact, what counts as the 0 state might itself change with time in different ways for A and B!

The localization of control

The dual of transmission is prediction (or control): A's current state determines B's next state. One of the interesting things about this 'reverse transmission' is its localization. (For the moment, set aside considering the state system as the entire world, and consider just a computer or any bounded system.) Imagine dividing up the transitions of a system into behavioral types -- classifying the changes. In a deterministic system, this induces a classification of the states themselves into types ("When the system is in a state of type X, it does an X-action").
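
Prediction can be sketched in the same style as transmission, with the direction of determination reversed. The system below is a hypothetical two-register machine whose step copies X into Y and clears X; predicts is an illustrative name, not a term of the theory.

```python
from itertools import product

def predicts(epoch, step, a, b):
    """True if a's current state determines b's state after one step,
    for every state in `epoch`."""
    seen = {}  # a-state now -> b-state after the step
    for s in epoch:
        nxt = b(step(s))
        if seen.setdefault(a(s), nxt) != nxt:
            return False
    return True

# Hypothetical system: a state is (x, y), each register holding 0..2.
WORLD = list(product(range(3), repeat=2))

def step(s):
    x, _ = s
    return (0, x)          # copy X into Y, then clear X

def X(s): return s[0]
def Y(s): return s[1]

print(predicts(WORLD, step, X, Y))  # True: X now fixes Y next
print(predicts(WORLD, step, Y, Y))  # False: Y now says nothing about Y next
```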

Suppose further that we have decided to describe the system in terms of a decomposition into parts (not necessarily all its parts in the sense defined above, but some collection, convenient for explanatory purposes, usually a Boolean algebra, so that the whole system is ascribed a structure derived from the set of the smallest non-trivial parts in the decomposition -- the atoms). Then one can show that there will be a unique smallest part (= partition) in this decomposition which contains the state classification partition. This part then contains the information about what kind of action the system is about to perform -- it corresponds functionally to the "control" or "program" register in a classical von Neumann machine and its states are functionally the instructions of the system (relative to that behavioral classification). The story can be continued to provide a functional reconstruction of many of the key concepts of von Neumann architecture; and although I haven't done it, it seems clear that similar reconstructions can be given for other architectures, as long as the system as a whole has states. (The mathematical details are in [Roosen-Runge, 1967].)
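
A brute-force sketch of locating the control part follows. It rests on several assumptions made only for illustration: states are tuples, the atoms of the decomposition are the tuple coordinates, a part of the decomposition is a set of coordinates, and the behavioral classification is given directly as a function on states. The machine (opcode, accumulator, data) and the names determines and smallest_controlling_parts are hypothetical; the construction in the cited paper may differ.

```python
from itertools import combinations, product

def determines(states, coords, classify):
    """True if the coordinates in `coords` fix the state's behavioral type."""
    seen = {}  # projected state -> behavioral type
    for s in states:
        key = tuple(s[i] for i in coords)
        if seen.setdefault(key, classify(s)) != classify(s):
            return False
    return True

def smallest_controlling_parts(states, n_atoms, classify):
    """Return the smallest coordinate sets that contain the classification.

    Searches by increasing size; the full coordinate set always qualifies,
    so the search always returns something.
    """
    for size in range(n_atoms + 1):
        found = [c for c in combinations(range(n_atoms), size)
                 if determines(states, c, classify)]
        if found:
            return found

# Hypothetical machine: state = (opcode, accumulator, data).
STATES = list(product(range(2), range(3), range(3)))

def classify(s):
    # Assumed behavioral types: opcode 1 means "add", otherwise "hold".
    return "add" if s[0] == 1 else "hold"

print(smallest_controlling_parts(STATES, 3, classify))  # [(0,)]
```

In this example the search finds a single smallest part, the opcode coordinate, which plays the role of the control register; the sketch does not itself establish uniqueness in general.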

Information processing

The world has a change or transition relation -- in the case in which this relation is a function or is a function when restricted to a subset (i. e. the state transitions are deterministic), we can define absolute and relative information flows. These flows can be taken to constitute the information processing in the system. (I haven't worked out how to extend this to non-deterministic systems.)

Information processing is thus ubiquitous -- you can't distinguish between parts of the world on the basis of whether they are or aren't doing information processing any more than you could distinguish between them on the basis of whether they exhibit causal relations.

We can make a distinction between computation and information processing by defining computation relative to a specific decomposition of the whole system into parts (choosing a particular set of parts corresponds, loosely, to choosing an 'architecture'). As an example, for any classification of transitions into 'instructions', we can define a rudimentary architecture with those instructions as states of a control unit. (Of course, for an arbitrary system, there won't be any commonly recognizable instructions; the addressing will be 'weird' and a single instruction will read and write to the rest of the system in strange ways.) Whether it's worthwhile theoretically to see any given system in computational terms depends both on the system and on one's theoretical objectives, but logically it can always be done.

Information storage

How does the concept of symbolic computation relate to computation in the sense of the preceding paragraph? Here mathematical holism has probably not much to contribute -- it fades into the underbrush, so to speak, since so much theoretical superstructure is now required to provide a context in which the concept of symbol could be coherently explicated. I tend to think about the problem in a sort of stripped-down form expressed by the following question: "under what circumstances does it make sense to say that a specific computer (or more generally, system) has stored information about an external feature of the world -- say, my phone number, or your name?"

It's obviously very hard to work out a specific, constructive answer for any particular case, but the general form of my answer is meaning-holism, what else! I'll omit the standard stuff about conceptual networks, etc. and just focus on a couple of small points: if my phone number is to function as a paradigm case of a symbol, then symbols must be non-unique instantiations of general relations; i. e., there is nothing which necessarily symbolizes my phone number. But unfortunately the lack of necessity is infectious -- it extends to any (?) condition on the general relation. My computer 'knows' York University's phone number and evidences it by being able to dial it, but even if the dialer were to be removed, the computer still 'knows' the number, and even if the prefix has been omitted, the remaining digits are still a symbol of an incomplete number -- the computer just 'knows' part of York's phone number. Even if the remaining digits are wrong, the computer still has stored a phone-number, just the wrong one. Where does this stop? To which one poses a counter-question: why does it have to? I do not yet know how to show the fly the way out of this bottle.