
March 09, 2013

All my Flights

So far, I have flown a total of 481,333 miles. That is more than 19 times around the world, and twice the distance to the moon.
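For the curious, the arithmetic behind those comparisons, using commonly quoted approximate figures for Earth's circumference and the mean distance to the moon:

```python
# Sanity-checking the comparisons above with approximate reference figures
# (Earth's equatorial circumference and the mean Earth-Moon distance, in miles).
EARTH_CIRCUMFERENCE_MI = 24_901
EARTH_MOON_DISTANCE_MI = 238_855
MILES_FLOWN = 481_333

trips_around_world = MILES_FLOWN / EARTH_CIRCUMFERENCE_MI  # just over 19
times_to_moon = MILES_FLOWN / EARTH_MOON_DISTANCE_MI       # about 2

print(f"{trips_around_world:.1f} times around the world")
print(f"{times_to_moon:.1f} times the distance to the moon")
```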

I have done this via a total of 187 flight segments, passing through 37 airports, in 11 countries and utilizing 24 carriers.

I have flown as far north as Edinburgh (EDI), as far south and east as Kuala Lumpur (KUL), and as far west as San Francisco (SFO).

The longest flight I have taken is the 14-hour leg from Dubai (DXB) to New York (JFK). The shortest is the 37-minute hop between Milwaukee (MKE) and Chicago (ORD).

All these stats are thanks to two things: one, my idiosyncrasy of keeping a log of every flight I have ever taken; and two, a site for people like me called OpenFlights.org.

All I needed was a bit of Excel transformation to convert my existing flight log into a format the site accepts, then upload, and done.
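The same transformation is easy to sketch in Python. The input column names and the output header below are assumptions for illustration only; the real OpenFlights import format is documented on the site and should be checked before uploading.

```python
import csv

# Hypothetical input: a personal log with columns "date", "origin", "dest",
# and "airline". The output header is an assumed OpenFlights-style layout,
# not the site's actual import spec -- verify against their CSV help page.
def convert_log(in_path: str, out_path: str) -> int:
    """Rewrite a personal flight log as an upload-ready CSV; returns row count."""
    count = 0
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.writer(dst)
        writer.writerow(["Date", "From", "To", "Airline"])  # assumed header
        for row in reader:
            writer.writerow([row["date"], row["origin"], row["dest"], row["airline"]])
            count += 1
    return count
```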

That is how I know that if I had taken all the flights one after another, I would have spent 43 days, 22 hours and 39 minutes in the air. Counting all the check-in and waiting time as well, call it about two months spent in airports and planes around the world.

June 03, 2011

Liberating data using Scraper Wiki

Of all the wiki sites that sprang up after the original, one of the most useful and genuinely cool is ScraperWiki. ScraperWiki is an attempt to liberate data from websites and PDFs and move it into spreadsheets instead.

There is a lot of data available on the net, but its value is severely limited by the fact that you cannot do much more than browse it. When you move data from an HTML page or a PDF file into a spreadsheet, its value suddenly goes up manyfold: you can analyze it, sort it, look for trends, and coax information out of it. ScraperWiki helps with that first step by scraping web pages and turning them into usable data sets.

ScraperWiki is two things. First, it is a web-based editor and runtime with reusable libraries (in Python, Ruby or PHP) that lets you write and run a scraper. Second, it is a wiki-style store of scrapers written by others that you can update, reuse or simply run to get the data.
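The core of any such scraper is turning HTML markup into rows of data. Here is a minimal self-contained sketch of that step using only the Python standard library; the HTML snippet is invented, and ScraperWiki's own libraries layer page fetching and an SQLite store on top of this kind of parsing.

```python
from html.parser import HTMLParser

# A minimal table scraper: collects the text of each <td>/<th> cell,
# grouped by <tr> row, into a list of lists.
class TableScraper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())

# Invented example markup, loosely in the spirit of the weather scrapers above.
html = """<table>
<tr><th>Station</th><th>Temp</th></tr>
<tr><td>Berlin</td><td>21</td></tr>
<tr><td>Munich</td><td>18</td></tr>
</table>"""

scraper = TableScraper()
scraper.feed(html)
print(scraper.rows)  # [['Station', 'Temp'], ['Berlin', '21'], ['Munich', '18']]
```

From here, writing the rows out with the `csv` module gets you the spreadsheet the post is talking about.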

There are quite a few interesting scrapers. This scraper collects data from weather stations across Germany, while this one collects the location IDs from Weather.com URLs. Weather is not all scrapers do: this one, for example, collects basic info about all MLB players, while this one is a massive database of all soccer World Cup matches.

Of all the untold millions spent by governments and corporations on digitizing their data and building web pages, a decent portion went towards turning data sets into HTML tables. ScraperWiki is an attempt to reverse that. Cheers to liberating data from the shackles of the web.