I was excited for our assignment on government data because my Techcrumbs final looks to use large datasets to really mature. I first looked through the cavernous datasets of data.gov, primarily in the world of commerce and exports. It was an ordeal, not because the government has done a bad job of serving this data up (it has a unified UI with loads of features), but because of the lack of urgency with which government agencies have organized their data. That puts a developer at a disadvantage: at any point, when the government wakes up, realizes this is the future of everything, and mandates the upkeep of better datasets, scores of web apps and applications will break. If you have time, though, you can see an unreal amount of amazingly relevant data. This evolution in data access is a shockingly important point in our democracy.

So, after that dizzying tour, I left data.gov behind and dove into the wonderfully organized world of Socrata and NYC big data. If you don’t know it, you should. It is amazing. From knowing all the trees in your neighborhood, to the health scores of every restaurant, to what people call 311 about, NYC data is a microscope on how this massive metro works, thinks, and feels.

I have avoided learning how to parse data for too long, and parsing JSON and XML is a skill I will need going forward. I forced myself to make a data mashup using one of the XML exports: the top male baby names of 2009. I used jQuery both to parse the XML and to output the data into the DOM in the form of a word cloud. To make it work fast, I chose the top 25 names.

I found this to be much harder than I thought. First, I do not understand why they did not use unique names for the tags in the XML. They use “row” twice: once for the child tag of “results”, and again for each child of that first row. So you must ignore the first result from a jQuery selection on “row”. I also learned that you need to convert jQuery results into strings in order to use them in CSS reformatting. In this case, I used the “count” of each baby name to define the font size of the name. I also made each <a href> a Google search for the name. Here is the result:
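For anyone following along, the skipping, sizing, and linking steps described above can be sketched roughly like this. It is plain JavaScript rather than my actual jQuery, and the function names, the sample counts, and the 12 to 48 pixel range are all made up for illustration.

```javascript
// Hypothetical helpers: names, sample data, and the 12-48px range are
// illustrative assumptions, not the code behind the real mashup.
// In the browser the rows would come from something like
// $(xml).find("row"); here plain objects stand in for them.

// Drop the first match, which is the outer wrapper <row>, not a baby name.
function dropWrapperRow(rows) {
  return rows.slice(1);
}

// Scale a count linearly into a pixel size and return it as a CSS string.
function fontSizeFor(count, minCount, maxCount, minPx = 12, maxPx = 48) {
  if (maxCount === minCount) return maxPx + "px"; // avoid dividing by zero
  const ratio = (count - minCount) / (maxCount - minCount);
  return Math.round(minPx + ratio * (maxPx - minPx)) + "px";
}

// Build a Google search link for a name.
function searchUrl(name) {
  return "https://www.google.com/search?q=" + encodeURIComponent(name);
}

// Turn the selected rows into word-cloud entries.
function buildCloud(rows) {
  const names = dropWrapperRow(rows);
  const counts = names.map((r) => r.count);
  const minCount = Math.min(...counts);
  const maxCount = Math.max(...counts);
  return names.map((r) => ({
    name: r.name,
    href: searchUrl(r.name),
    fontSize: fontSizeFor(r.count, minCount, maxCount),
  }));
}

// Made-up counts, not the real 2009 figures.
const sampleRows = [
  { name: "", count: 0 }, // stands in for the wrapper row
  { name: "NameA", count: 300 },
  { name: "NameB", count: 200 },
  { name: "NameC", count: 100 },
];
const cloud = buildCloud(sampleRows);
```

With these made-up counts, the most common name comes out at 48px and the least common at 12px. In jQuery, a narrower selector such as "row row" would probably pick up only the inner rows and make the skip unnecessary, though I have not tried that against the real file.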

I could format it better to pick out the number one name, and the actual count for each name is not displayed (I wanted to show it when you hover over the link, but I could not get it to work). Still, the mashup does tell one story about the top 25 names. I ran into an issue, though, when I put this mashup live on my server: Origin http://sciencelifeny.com is not allowed by Access-Control-Allow-Origin. There is something wrong with how I am getting the XML file. I need some input on why that is happening.
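My working guess, until someone confirms it: browsers print that message when a script on one origin requests a file from a different origin and the response does not carry an Access-Control-Allow-Origin header that permits the page. The sketch below only checks whether two URLs share an origin. The function name, the page path, and the data.example.org host are placeholders, not the real addresses involved.

```javascript
// Hypothetical helper: returns true when resourceUrl resolves to the same
// scheme, host, and port as the page that requests it.
function isSameOrigin(pageUrl, resourceUrl) {
  const page = new URL(pageUrl);
  const resource = new URL(resourceUrl, pageUrl); // resolves relative paths
  return page.origin === resource.origin;
}

// Placeholder URLs for illustration only.
const page = "http://sciencelifeny.com/mashup.html";
const localCopy = isSameOrigin(page, "names.xml");
const remoteFile = isSameOrigin(page, "https://data.example.org/names.xml");
```

Here localCopy is true and remoteFile is false. If this guess is right, copying the XML export onto my own server, so the request becomes same-origin, would be one way around the error.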

On to more datasets.