Today I completed Module 5 of the self-paced edX class Microsoft: DAT208x Introduction to Python for Data Science – Matplotlib.
As you might guess from the title of this module, it was all about plotting data. I learned how to do line plots, scatter plots and how to graph three variables at once. The material looked at population trends and the correlation between life expectancy and per capita GDP.
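For reference, here is a minimal sketch of the two plot types I mean, with a third variable shown as bubble size on the scatter plot. The numbers are made-up placeholders rather than the course data, and the variable names and output file name are my own assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so this runs without a display
import matplotlib.pyplot as plt

# Made-up placeholder values, NOT real statistics or the course data set
years = [1950, 1970, 1990, 2010]
population = [1.0, 2.0, 3.5, 5.0]           # hypothetical, in billions

gdp_per_capita = [1000, 5000, 20000, 40000]  # hypothetical, in dollars
life_expectancy = [50, 62, 74, 80]           # hypothetical, in years
pop_millions = [30, 120, 60, 300]            # third variable, drawn as bubble size

fig, (ax_line, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Line plot: a trend over time
ax_line.plot(years, population)
ax_line.set_xlabel("Year")
ax_line.set_ylabel("Population (billions)")

# Scatter plot: x and y positions plus marker size give three variables at once
ax_scatter.scatter(gdp_per_capita, life_expectancy, s=pop_millions)
ax_scatter.set_xlabel("GDP per capita")
ax_scatter.set_ylabel("Life expectancy (years)")

fig.tight_layout()
fig.savefig("module5_plots.png")  # hypothetical output file name
```

The s argument to scatter sets the marker area, which is how the third variable gets onto a two-dimensional chart.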
The lessons covered different types of plots, how to customize the axes, and some material on histograms, which were presented as a good early step in exploring your data. The commands were basic but powerful. Again, Excel has similar functionality, but I can see where it might be easier to customize a plot in Python. To get a sense of the full range of plots that matplotlib can produce, check out its gallery.
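A rough sketch of a histogram with customized axes might look like the following. The values are invented for illustration, and the bin count and labels are my own choices, not the ones from the lessons.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so this runs without a display
import matplotlib.pyplot as plt

# Invented values for illustration only
life_expectancy = [48, 52, 55, 61, 63, 66, 68, 70, 71, 73, 74, 75, 77, 79, 81]

fig, ax = plt.subplots()

# hist returns the count in each bin, the bin edges, and the drawn bars
counts, bin_edges, _ = ax.hist(life_expectancy, bins=5)

# Axis customization: labels, title, and explicit tick positions
ax.set_xlabel("Life expectancy (years)")
ax.set_ylabel("Number of countries")
ax.set_title("Distribution of life expectancy (made-up data)")
ax.set_xticks([50, 60, 70, 80])

fig.savefig("module5_histogram.png")  # hypothetical output file name
```

Changing bins is the quickest way to see how the shape of the distribution depends on how finely the data is grouped.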
One key piece I don’t yet have is how to access external files. I assume I can figure this out from the documentation if it isn’t presented as a lesson, but until I have that piece, it will be hard to do analysis on the public library stats and other data sets I might be interested in. But I’m intrigued to try.
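From a first look at the documentation, one way this could work is with the csv module in Python's standard library. This is only a sketch under my own assumptions: that the data comes as a CSV file, and that it has columns named library and visits. The file name and contents below are hypothetical, written to a temporary folder so the example is self-contained.

```python
import csv
import os
import tempfile

# Write a tiny made-up CSV file so the example can run on its own.
# The file name, column names, and numbers are all hypothetical.
sample = "library,visits\nCentral,1200\nEastside,450\n"
path = os.path.join(tempfile.mkdtemp(), "library_stats.csv")
with open(path, "w", newline="") as f:
    f.write(sample)

# Read the file back: DictReader uses the header row as the keys of each row
with open(path, newline="") as f:
    rows = list(csv.DictReader(f))

# Pull out columns as plain lists, ready to hand to a plotting function
libraries = [row["library"] for row in rows]
visits = [int(row["visits"]) for row in rows]  # csv reads everything as text

print(libraries, visits)
```

Pandas also has a read_csv function for the same job, which may turn out to be the more convenient route once the final module introduces it.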
I found this module easier than the previous two, probably because I’m decent at producing charts in Excel and am familiar with the basics of plotting.
I also had another reminder that I need more education in data analysis, in addition to programming tools. There was one exercise where we were instructed to plot one variable on a logarithmic scale to make the trend show up better. It did, but I’m not sure why or under what conditions you’d want to use a logarithmic scale.
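Here is a small side-by-side sketch of the same scatter plot on a linear and a logarithmic x-axis. What I can say for certain is that a log axis puts equal ratios at equal distances (1,000 to 10,000 takes the same space as 10,000 to 100,000), so values spanning several orders of magnitude get spread out instead of bunching up at one end. The numbers are invented, not the course data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so this runs without a display
import matplotlib.pyplot as plt

# Invented values spanning two orders of magnitude on the x-axis
gdp_per_capita = [500, 1500, 5000, 15000, 50000]
life_expectancy = [48, 58, 66, 74, 81]

fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(10, 4))

# Linear x-axis: the small GDP values crowd together on the left
ax_lin.scatter(gdp_per_capita, life_expectancy)
ax_lin.set_title("Linear x-axis")

# Logarithmic x-axis: each tenfold step takes the same horizontal distance
ax_log.scatter(gdp_per_capita, life_expectancy)
ax_log.set_xscale("log")
ax_log.set_title("Logarithmic x-axis")

fig.savefig("module5_log_scale.png")  # hypothetical output file name
```

With these made-up numbers, the points on the log version fall close to a straight line, while the linear version bends sharply near the left edge.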
The final module has an intriguing title: Control Flow and Pandas. I’m guessing it’s not about pandas munching sticks of data.