Blog @ Anarchius.Org: knowledge


June 01, 2009

But It's Not Google - more Search News

BING is the new name for Microsoft's search, formerly part of their LIVE suite of services. This seems to be a good few months for search. Just a few weeks ago we had the inauguration of Wolfram Alpha, and now Bing.

Microsoft purportedly was very focused on verb-ifying their new offering and therefore had to go with the 'ing' ending. But Bing? As Chandler would say, Bing is Gaelic for 'Thy turkey is done'.

Bing has been live for a while, and it is not really all that bad. My bone of contention with its predecessor, Live search, was the super-heavy pages used to display results, and not so much the quality of the searches themselves. That seems to have changed with Bing. The pages seem quick. Not much of a fan of the changing main page background, but maybe that is just a matter of getting used to it. Of course, if you want to see the previous photos, you have to install Silverlight.

I guess for me, that is what makes Microsoft so annoying. It is like a car salesman who just won't give up. It is always a matter of, 'I will give you this if you want that'. Microsoft's online properties seem to make you acutely aware that you are a guest and therefore need to mind your manners and clicks. The constant struggle for one-upmanship reminds one of a petulant child, unhappy about the attention being showered on younger sibling Google.

One more matter of annoyance before I move on. This one is specific to Bing. I don't think I am much of a fan of location assumptions being made by the software and then filtering my results without really letting me know about it. Seeing local search results surreptitiously is almost like those sleazy ads one sees online, from ladies starved of physical affection who magically know where you live and want to make a tryst with you. I may be using a proxy - did Bing think of that? Filtering my results for Indianapolis, Indiana when I am not even in the same state isn't so much smart as it is annoying.

Bing seeks to bring travel into the search engine. As an example, they trot out the ability to book flights to Hawaii. Apparently, that is all there is to it. Try booking a flight to Milwaukee, and you are back to Expedia or Travelocity. So maybe Hawaii wasn't so much a feature as it was a demo. Maybe we need Silverlight to be able to book tickets everywhere else.

Overall the search is in there somewhere. The interface is definitely better, and worth checking out. It has a few new demos of potential new features to come. Otherwise, it is pretty much same old, same old.

May 18, 2009

Wolfram Alpha live

It has been a while since there has been something new on the search scene. There was one highly publicized, incomplete piece of junk that came out a few months ago called Cuil (pronounced cool). As the name portends, it was not. At all.

Anyway - a new site called Wolfram|Alpha (name rather unfortunate) - is the new kid on the block bringing forth a really different approach to search. Instead of searching through pages to suggest potentially relevant pages, WA tries to answer quantitative questions with just the answer. If you need to know the population of the world, you just ask 'world population' and you get the answer of 6.53 billion people.

Search is tough. Understanding and algorithmically analyzing fuzzy human interaction is inherently difficult. And WA is trying to do two things at once - understand the fuzzy world-wide-web and obtain facts from it. This has long been the goal of the semantic web, which is nowhere near reality today. At the same time WA is trying to understand user searches to generate quantitative queries that can then be applied to the data that it has collected earlier.

First impressions - WA seems to be doing an ok job on the two entity intersections. If you are looking for 'world + population', you are good. If you are looking for 'India + mobile phones', you are good too. But trying to do an intersection of 'world + population + mobile phones' seems to trouble the search engine.

Another disappointing aspect is its inability to interpret date and time as a dimension to queries that seem to work well on 'today'. For example, searching for gold + price works well. But trying to search for gold + price + any date doesn't compute. This seems to work for dow jones + any date, but not for gold + price. See similarities to the third level intersection problem observed above?

What it does seem to do a good job on is the roll-ups. Try searching for Asia + cellular phones. You not only get the total estimate for Asia, but you also get a list of all the countries with their estimated cellular phone populations. Pretty interesting eh?

All in all, WA looks good for an alpha. It seems to be able to do simple queries and roll-ups. It is not really good with anything complex, not to mention the numerous glitches in the UI of the site. Also interesting will be the response from rights holders to their data being used by the engine. Granted, 'facts' do not fall under the purview of copyrights. But what would happen if, say, results from surveys started to be incorporated into search results? That is, if WA can one day show you the percentage of all people in the United States having cell phones with AT&T service who expressed satisfaction with their service in a survey. What will the survey owners think of that? What will AT&T think of that?

All in all, I am really excited to see where this goes. Here's to the semantic web, without having to work hard at it.

October 01, 2002

Rammstein

And more specifically Stripped. Incredible song. Or for that matter Kokain. Man, those riffs just drive you out of your mind, don't they.

Filled up a survey today, about some perception thing, of companies recruiting on campus. Was so totally painful. I don't really understand why I spent so much time filling it up. There was this HUGE matrix, which had to be filled with my opinions. Someone did not tell them things properly. I don't have opinions. At least not as many as it takes to fill up that monstrous matrix of theirs. Well, I did try, for a while, as I tried to form opinions on the spot and then put them on paper. Do you know how hard it is to form opinions on the spot? It is. And if you are finding it easy, you don't form opinions, you just think you do. Trust me on this. :)

One of the most incredible things is the fact that most people around you don't bother to form opinions. They have a few of their own opinions. You can figure out that an opinion is their own when people can be completely irrational about it and its consequences. But most other opinions you see around are only the sum total of the opinions formed from the positive part of your sphere of perception, that is all.

Okay, enough of rambling. Let's continue with the discussion we were having last. In the last post, we talked about a number of definitions that led to the definition of the LSI or the Linear Scale of Information. Given any observation, it can be located on this scale. What is an observation? An observation is any representation of a Data Source or DS. A photo is an observation of some reality. A word is an observation of some idea. A poem is an observation of some emotion/idea. A simple sentence also is an observation. So is a complex mathematical model of the universe.

One peculiarity about the LSI should be kept in mind. The LSI stretches from 0 to infinity. It is unbounded on the upper side. This means that a DS lies at infinity, and a completely useless bit of information lies at 0. We define data to lie in the lower reaches, closer to 0 on the LSI. Information is relatively higher on the scale. It represents a higher richness of data about a particular DS. Knowledge tends towards the object itself. A picture, worth a 1000 words, is therefore higher on the LSI with respect to the words it replaces.

This can be extended to any object, idea, thought or any other information content without any modifications. We can therefore use this structure to compare and develop better and higher forms of information management systems. That is what is envisaged as the end objective of this study. This structure can be used to describe any informational content with ease. We will go into details about the implementation of this structure soon, but before that we shall look into the way this method can be used to model interactions.

We define an interaction to be a process that allows for transfer of data between a DS and a Data Acquirer (DA) using a Data Transfer Medium (DTM). This is the simplest definition of an interaction. An interaction can give rise to the following results. Information will be transferred from the DS to the DA. In addition, the DS can change its state due to the interaction of the DS with the DTM (also known as the medium). Further, the interaction between the medium and the DA will cause changes in the DA. Note that these changes are in addition to the simple transfer of information that can be attributed to the interaction.

This in fact follows from the definitions we had seen yesterday. We have already talked about a query that is used by the DA to get information from the DS. Now when the query travels from the DA to the medium, the medium has obtained information. This causes a change in the medium itself. When the query is transported to the DS, the DS undergoes changes because of the informational content in the query. The exact same process occurs in reverse when the DS replies with the answer to the query. The reader may note that no change occurs in the DA during the asking phase of the query, and no change happens in the DS during the reply phase. The DTM undergoes change twice, with both the query and the answer.
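The round trip described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `Entity` class, its `received` list and the `interact` function are hypothetical names, and "change of state" is modelled very crudely as "a message got appended to a list".

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A participant in an interaction (hypothetical helper, not part of the framework)."""
    name: str
    received: list = field(default_factory=list)

    def absorb(self, message: str) -> None:
        # Assumption: receiving information is what changes an entity's state.
        self.received.append(message)


def interact(da: Entity, dtm: Entity, ds: Entity, query: str, answer: str) -> None:
    """One query/reply round trip between a DA and a DS through a DTM."""
    # Asking phase: the query leaves the DA, so the DA itself is unchanged.
    dtm.absorb(query)   # the medium now carries the query
    ds.absorb(query)    # the DS is changed by the information in the query
    # Reply phase: the answer leaves the DS, so the DS is unchanged here.
    dtm.absorb(answer)  # the medium now carries the answer
    da.absorb(answer)   # the DA is changed by the answer


da, dtm, ds = Entity("DA"), Entity("DTM"), Entity("DS")
interact(da, dtm, ds, query="what colour are you?", answer="red")
for entity in (da, dtm, ds):
    print(entity.name, len(entity.received), entity.received)
```

The DA and the DS each end up with one recorded change and the DTM with two, which is the bookkeeping the paragraph above describes.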

Let's see some practical explanations of the entire structure. Any systemic structure can be abstracted using this. In fact, with the addition of the term interaction, we can now model dynamic changes in systems too.

Mail me, if you think there is some structure that cannot be abstracted using this framework. We will go into more practical considerations using this framework in later posts.

This is the first time that I actually continued an idea beyond just one post. That must mean I don't really think this idea to be crap.

Regards,

~!nrk

September 29, 2002

Data, Information and Knowledge

This in short is the brief history of the universe. I am not writing this blog after being overly fascinated with the book with the title that sounds obscenely like the quip I just quipped. In fact I think I read that book a long long time ago. I am writing this because I have an alternate view of the universe. A view that breaks down everything into a point on a line, with data and knowledge at its ends and information lying in the middle. I know I am getting a little too abstract here, but then I hope things will be clearer as we go along.

When I used to study science, something struck me as very odd. Physics, especially of the variety that is normally taught in high schools, breaks down the universe into two sections. The physical universe and the laws that govern this universe. What struck me as odd is the fact that god actually defined such a cute little dichotomy in his world. Just like we have data and instructions, male and female, good and bad, we have matter and laws. Okay matter, energy and all that dark crap too, but basically the tangibles and the intangibles. Okay, this is also not very true, but... Wow, this is tough, getting the definition right. But basically the problem with god and his universe boils down to this - how did he come up with something that exists, and then throw away a hell of a lot for us to discover? Why all this segregation? Why this duality? Why was matter there, for all of us to see, and the rest of the relationships, laws etcetera for us to discover?

But then think about it. What was matter? Ask someone in the dark ages (dark ages NOT defined as the time before the computer) and their *ologists will tell you that it is nothing but a combination of air, water, earth, fire and something else. Somewhere down the timeline, people will tell you that matter was made of unbreakable balls, called atomz. Then people went berserk. Matter was made of all sorts of strange, mystical and mythical substances, which incidentally no one can see, but ought to be there for matter to make sense.

So what was different in matter in the dark ages and now? Nothing. It is the same old matter burnt and forged into different shapes, but still the same old matter. What changed is information. The information known to man has changed the way people look at matter. If this information was not available, a lot of people in Nagasaki would have been nth-generation residents, instead of what they are now. Matter has changed because of what matter is to different people. To the ordinary man, matter is nothing more than just earth and air. Hence what is important about matter is not matter itself, but information about matter. What we see as matter is nothing but the extract of the information conveyed to us by the various input devices.

We will now look at a totally different way of seeing the universe. A way in which there is no difference between the various units of matter, energy, ideas, minds and every other thing in the universe. This unified way of looking at the universe is going to help us define the entire universe on a one dimensional scale rating information content. This will then give us a powerful way of dealing with many problems using a vastly simplified, unified methodology.

But before this we need to get a basic framework in place. We postulate the existence of three different types of entities in this universe. The first is the Data Source or the DS. The Data Source is characterised by the fact that it contains data. It owes its existence to the data it contains. There is no restriction on the data it contains. Of course we haven't defined what data itself is. But we are deliberately not defining it, since it will be defined by the circumstance under view. And moreover, we cannot define it in isolation from the other units under consideration. The second entity we postulate is the Data Acquirer or the DA. The DA can query the DS for data through the use of the third entity, known as a Data Transfer Medium or DTM.

Given these basic units, we define some terms. The first term we will be defining is the Data Completeness (DC) of an entity. DC is defined as the relative content of data of a particular kind in a particular Data Source. Hence DC is defined for a DS and a Data Type. For example a DS has 100% DC about itself. Any DS can answer any question about itself. So its DC is complete. Note that DC is independent of the query for data, or the way the query is designed, or the DA itself or the DTM for delivery. The actual response of the DS to a query is a function of the ability and capacity of the DTM and the DA.

This leads to an interesting and obvious statement. Any DS is 100% Data Complete with its own data. In fact, any entity which is 100% Data Complete with the data of a DS is virtually indistinguishable from the DS itself. This is because the said entity can answer any question about the DS. This means that no DA can distinguish between the impersonating entity and the Data Source itself.
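A toy sketch in Python may make the DC idea concrete. Everything here is an assumption made for illustration: a DS is modelled as a plain dictionary of facts, all facts are treated as one data type, and `data_completeness` and `answer` are hypothetical helper names rather than part of the framework.

```python
def data_completeness(entity: dict, source: dict) -> float:
    """Fraction of the source's facts that the entity holds with the same value."""
    if not source:
        return 1.0  # assumption: an empty source is trivially fully covered
    matches = sum(1 for key, value in source.items()
                  if key in entity and entity[key] == value)
    return matches / len(source)


def answer(entity: dict, question: str):
    """Answer a question by looking the fact up; None means 'cannot answer'."""
    return entity.get(question)


# A made-up Data Source: an apple described by four facts.
apple = {"colour": "red", "weight_g": 150, "taste": "sweet", "texture": "crisp"}
replica = dict(apple)        # holds every fact of the DS
sketch = {"colour": "red"}   # holds only one of the four facts

print(data_completeness(apple, apple))    # 1.0, a DS is complete about itself
print(data_completeness(replica, apple))  # 1.0
print(data_completeness(sketch, apple))   # 0.25

# An entity with 100% DC gives the same answer as the DS to every question.
indistinguishable = all(answer(replica, q) == answer(apple, q) for q in apple)
print(indistinguishable)  # True
```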

Now the DC itself does not give any powerful medium for expressing data relationships. Since the DC is fully defined by the data type and the DS, we define another term called the Relative Data Completeness or the RDC. RDC is defined as the relative completeness of data given the DS, the DA and the DTM. For example, a still photograph has an RDC of close to 100% for the original static setup, given that the DA is using sight as its only source of data input. The moment the DTM expands to include, say, touch, the photograph no longer has 100% RDC.
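The photograph example can be played out in a small Python sketch. The modelling choices are mine and not part of the definitions: each fact is tagged with the sense channel it travels over, the DTM is reduced to a set of channel names, and `relative_data_completeness` is a hypothetical helper. The toy gives the photograph exactly 100% on sight, where the text more carefully says close to 100%.

```python
def relative_data_completeness(entity: dict, source: dict, channels: set) -> float:
    """Completeness of the entity over only those facts the DTM/DA can carry."""
    # Keep only the source facts whose channel is available to this DTM and DA.
    reachable = {key: value for key, value in source.items() if key[0] in channels}
    if not reachable:
        return 1.0  # assumption: nothing can be asked, so nothing is missing
    matches = sum(1 for key, value in reachable.items()
                  if key in entity and entity[key] == value)
    return matches / len(reachable)


# A made-up static scene; keys are (channel, attribute) pairs.
scene = {
    ("sight", "colour"): "red",
    ("sight", "shape"): "round",
    ("touch", "texture"): "smooth",
    ("touch", "temperature"): "cool",
}
# The photograph captures only what sight can carry.
photo = {key: value for key, value in scene.items() if key[0] == "sight"}

print(relative_data_completeness(photo, scene, {"sight"}))           # 1.0
print(relative_data_completeness(photo, scene, {"sight", "touch"}))  # 0.5
```

With sight alone the photograph passes for the scene; once touch is added to the medium, half of the reachable facts are missing and the impersonation breaks down.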

The RDC therefore gives a powerful medium to express the quality of data relationships between the DS, the DTM and the DA. We will dwell more on various examples for these terms in later posts.

Data is always handled in packets called observations. This is not the observation that is defined for an experiment. An observation is a taggable block of data. Observations differ from one another in their quality. Observations are generally substantiated by data. The amount of data represented by an observation is its relative richness. Richness of an observation is defined on a scale called the Linear Scale of Information or LSI. Data is one end of the LSI scale, while knowledge is the other extreme. Information lies in between. Data are the small individual pieces of information that border on indivisibility. Knowledge is completeness of data. An entity which has 100% Data Completeness (DC) is perfectly knowledgeable, and can in fact replace the DS itself. An entity having an RDC (Relative Data Completeness) of 100% appears, for a particular DTM and a DA, to be the DS itself.

We will stop this round of definitions here. Check back for more data and information on these terms soon.

tada for now

~!nrk
