Recently I took a short course on understanding data. It revolves around forming primary and secondary questions and how project specs help you drill down into the data. The exercises are done with the experimental Google Fusion Tables tool, which I personally found pretty trivial to work with.

I liked that the course used real objects/data first and discussed the theory afterwards. This was a big plus, as it made the material easier to relate to and memorize.

The weak parts were that it was too platform-specific and that the final project lacked curation: it was rather vague which datasets to pick and how difficult the primary and secondary questions should be.

For my final project, I chose US east coast air quality data from after the BP oil spill in 2010. The date range was 2010-04-28 to 2010-09-18, whereas the spill itself started on 2010-04-20 and was officially contained on 2010-09-20. The primary question was the oil spill's effect on air quality and how it persisted over time.
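To give an idea of the kind of filtering involved, here is a minimal sketch of narrowing a measurements export to that study window. The file name and the "date", "state", and "aqi" columns are assumptions for illustration, not the actual layout of the dataset I used.

```python
import pandas as pd

# Hypothetical file and column names; adjust to the real export's layout.
df = pd.read_csv("air_quality_2010.csv", parse_dates=["date"])

# Keep only the study window: shortly after the spill began (2010-04-20)
# until just before it was officially contained (2010-09-20).
window = df[(df["date"] >= "2010-04-28") & (df["date"] <= "2010-09-18")]

# Quick look at average air quality per state over the window.
print(window.groupby("state")["aqi"].mean().sort_values(ascending=False))
```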

I noticed that pollution was high in Louisiana and spread into other states in September. Since the date range fell during the leak, the overall pollution level rose relatively little after the initial spill. Even after spreading into other states, the pollution in Louisiana decreased only slightly.

One of the main problems I had during the project was that I was unable to split the data into time ranges, so I had to create additional maps instead. Other problems were related to data scarcity (e.g. altitude was somehow missing) and parsing.
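The workaround amounted to pre-splitting the data outside the tool and building one map per slice. A rough sketch of how that could be done, again assuming a hypothetical file and a "date" column:

```python
import pandas as pd

# Fusion Tables offered no easy way to facet a single map by time range,
# so one workaround is to split the data into monthly files and import
# each file as its own table/map. Column and file names are assumptions.
df = pd.read_csv("air_quality_2010.csv", parse_dates=["date"])

for period, chunk in df.groupby(df["date"].dt.to_period("M")):
    # Produces e.g. air_quality_2010-05.csv, air_quality_2010-06.csv, ...
    chunk.to_csv(f"air_quality_{period}.csv", index=False)
```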

In conclusion, it was a useful course for flexing data to get insights. It would've been better if the final project had been given stronger guidance and provisioning.