September 29, 2002

Data, Information and Knowledge

This, in short, is the brief history of the universe. I am not writing this blog after being overly fascinated with the book whose title sounds obscenely like the quip I just quipped. In fact I think I read that book a long, long time ago. I am writing this because I have an alternate view of the universe. A view that breaks everything down into a point on a line, with data at one end, knowledge at the other, and information lying in the middle. I know I am getting a little too abstract here, but I hope things will become clearer as we go along.

When I used to study science, something struck me as very odd. Physics, especially of the variety that is normally taught in high schools, breaks the universe down into two sections: the physical universe and the laws that govern this universe. What struck me as odd is the fact that god actually defined such a cute little dichotomy in his world. Just like we have data and instructions, male and female, good and bad, we have matter and laws. Okay, matter, energy and all that dark crap too, but basically the tangibles and the intangibles. Okay, this is also not quite true, but... Wow, this is tough, getting the definition right. But basically the problem with god and his universe boils down to this: how did he come up with something that exists, and then throw away a hell of a lot for us to discover? Why all this segregation? Why this duality? Why was matter there for all of us to see, and the rest (the relationships, laws, etcetera) left for us to discover?

But then think about it. What was matter? Ask someone in the dark ages (dark ages NOT defined as the time before the computer) and their *ologists will tell you that it is nothing but a combination of air, water, earth, fire and something else. Somewhere down the timeline, people will tell you that matter is made of unbreakable balls called atoms. Then people went berserk. Matter was made of all sorts of strange, mystical and mythical substances, which incidentally no one can see, but which ought to be there for matter to make sense.

So what was different between matter in the dark ages and matter now? Nothing. It is the same old matter, burnt and forged into different shapes, but still the same old matter. What changed is information. The information known to man has changed, and with it the way people look at matter. If this information had not been available, a lot of people in Nagasaki would have been nth-generation residents, instead of what they are now. Matter has changed because of what matter is to different people. To the ordinary man, matter is nothing more than just earth and air. Hence what is important about matter is not matter itself, but information about matter. What we see as matter is nothing but the extract of the information conveyed to us by our various input devices.

We will now look at a totally different way of seeing the universe. A way in which there is no difference between the various units of matter, energy, ideas, minds and every other thing in the universe. This unified way of looking at the universe is going to help us place the entire universe on a one-dimensional scale that rates information content. This will then give us a powerful way of dealing with many problems through a vastly simplified, unified methodology.

But before this we need to get some basic framework in place. We postulate the existence of three different types of entities in this universe. The first is the Data Source or the DS. The Data Source is characterised by the fact that it contains data. It owes its existence to the data it contains. There is no restriction on the data it contains. Of course we haven't defined what data itself is. But we are deliberately not defining it, since its definition depends on the circumstance under view. Moreover, we cannot define it in isolation from the other units under consideration. The second entity we postulate is the Data Acquirer or the DA. The third is the Data Transfer Medium or the DTM, through which the DA can query the DS for data.

Given these basic units, we define some terms. The first term we will define is the Data Completeness (DC) of an entity. DC is defined as the relative content of data of a particular kind in a particular Data Source. Hence DC is defined for a DS and a Data Type. For example, a DS has 100% DC about itself. Any DS can answer any question about itself, so its DC is complete. Note that DC is independent of the query for data, of the way the query is designed, of the DA itself and of the DTM used for delivery. The actual response of the DS to a query is a function of the ability and capacity of the DTM and the DA.
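To make DC a little less abstract, here is a minimal sketch in Python. Everything in it is my own assumption for illustration: data is represented as a dictionary keyed by a hypothetical (data type, name) pair, and DC is taken to be the fraction of the source's data of one type that an entity also holds. It is one possible reading of the definition, not the definition itself.

```python
from typing import Dict, Tuple

# Assumption: each datum is keyed by a (data_type, name) pair.
Facts = Dict[Tuple[str, str], object]


def data_completeness(entity: Facts, source: Facts, data_type: str) -> float:
    """Fraction of the source's data of one type that the entity also holds."""
    relevant = {k: v for k, v in source.items() if k[0] == data_type}
    if not relevant:
        raise ValueError(f"source holds no data of type {data_type!r}")
    matched = sum(
        1 for k, v in relevant.items() if k in entity and entity[k] == v
    )
    return matched / len(relevant)


# Hypothetical Data Source and a poorer entity that knows only part of it.
apple: Facts = {
    ("colour", "skin"): "red",
    ("colour", "flesh"): "white",
    ("weight", "grams"): 150,
}
sketch_of_apple: Facts = {("colour", "skin"): "red"}

dc_self = data_completeness(apple, apple, "colour")
dc_sketch = data_completeness(sketch_of_apple, apple, "colour")
print(dc_self, dc_sketch)
```

Note that neither a DA nor a DTM appears anywhere in the function, which matches the point that DC depends only on the DS and the Data Type.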

This leads to an interesting and obvious statement. Any DS is 100% Data Complete with its own data. In fact, any entity which is 100% Data Complete with the data of a DS is virtually indistinguishable from the DS itself. This is because the said entity can answer any question about the DS. This means that no DA can distinguish between the impersonating entity and the Data Source itself.
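The impersonation argument can be sketched as a small test, again under my own assumptions: an entity is modelled as a hypothetical `Entity` class that answers questions from a dictionary, and a DA is reduced to a function that compares answers. This is only an illustration of the reasoning, not a formal proof.

```python
from typing import Dict, Iterable, Optional


class Entity:
    """Assumed model: anything that answers queries from the data it holds."""

    def __init__(self, data: Dict[str, object]) -> None:
        self._data = dict(data)

    def answer(self, question: str) -> Optional[object]:
        # An entity that lacks the datum simply has no answer.
        return self._data.get(question)


def can_distinguish(a: Entity, b: Entity, questions: Iterable[str]) -> bool:
    """True if at least one question gets different answers from a and b."""
    return any(a.answer(q) != b.answer(q) for q in questions)


source_data = {"colour": "red", "weight_g": 150, "taste": "sweet"}
source = Entity(source_data)
perfect_copy = Entity(source_data)        # 100% Data Complete with the DS
partial_copy = Entity({"colour": "red"})  # holds only part of the data

questions = list(source_data)
copy_detected = can_distinguish(source, perfect_copy, questions)
partial_detected = can_distinguish(source, partial_copy, questions)
print(copy_detected, partial_detected)
```

No matter which questions the DA asks, the perfect copy gives the same answers as the source, while the partial copy is caught as soon as a question touches data it does not hold.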

Now the DC by itself does not give us a powerful medium for expressing data relationships. Since the DC is fully defined by the data type and the DS, we define another term called the Relative Data Completeness or the RDC. RDC is defined as the relative completeness of data given the DS, the DA and the DTM. For example, a still photograph has an RDC of close to 100% for the original static setup, given that the DA is looking at it with sight as its only source of data input. The moment the DTM expands to include, say, touch, the photograph no longer has 100% RDC.
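The photograph example can be worked through as a rough sketch. The assumptions here are mine: the DTM is modelled as a set of sensory channels, each datum is keyed by a hypothetical (channel, name) pair, and the capacity of the DA is folded into the DTM for simplicity. The sketch also gives the photograph exactly 100% for sight, where "close to 100%" would be more honest.

```python
from typing import Dict, Set, Tuple

# Assumption: each datum is keyed by a (channel, name) pair.
Facts = Dict[Tuple[str, str], object]


def relative_data_completeness(
    entity: Facts, source: Facts, dtm: Set[str]
) -> float:
    """Fraction of the source's data reachable through the DTM
    that the entity reproduces."""
    reachable = {k: v for k, v in source.items() if k[0] in dtm}
    if not reachable:
        raise ValueError("the DTM reaches none of the source's data")
    matched = sum(
        1 for k, v in reachable.items() if k in entity and entity[k] == v
    )
    return matched / len(reachable)


# Hypothetical static scene and a photograph of it.
scene: Facts = {
    ("sight", "colour"): "green",
    ("sight", "shape"): "round",
    ("touch", "texture"): "rough",
    ("touch", "temperature"): "cool",
}
photograph: Facts = {
    ("sight", "colour"): "green",
    ("sight", "shape"): "round",
    ("touch", "texture"): "glossy paper",
    ("touch", "temperature"): "room temperature",
}

rdc_sight = relative_data_completeness(photograph, scene, {"sight"})
rdc_sight_touch = relative_data_completeness(
    photograph, scene, {"sight", "touch"}
)
print(rdc_sight, rdc_sight_touch)
```

The same photograph and the same scene give different numbers depending only on the DTM, which is the difference between RDC and DC.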

The RDC therefore gives a powerful medium to express the quality of data relationships between the DS, the DTM and the DA. We will dwell more on various examples for these terms in later posts.

Data is always handled in packets called observations. This is not the observation that is defined for an experiment. An observation is a taggable block of data. Observations differ from one another in their quality. Observations are generally substantiated by data. The amount of data represented by an observation is its relative richness. The richness of an observation is defined on a scale called the Linear Scale of Information or LSI. Data is at one end of the LSI, while knowledge is at the other extreme. Information lies in between. Data are the small individual pieces of information that border on indivisibility. Knowledge is completeness of data. An entity which has 100% Data Completeness (DC) is perfectly knowledgeable, and can in fact replace the DS itself. An entity having an RDC (Relative Data Completeness) of 100% appears, for a particular DTM and DA, to be the DS itself.

We will stop this round of definitions here. Check back for more data and information on these terms soon.

tada for now

