Microsoft: DAT208x Introduction to Python for Data Science – Module 4: Numpy

Today I completed Module 4 of the self-paced EdX class Microsoft: DAT208x Introduction to Python for Data Science – Numpy.

Just a quick moment of celebration: Yay! I’m two-thirds of the way through a programming class in a topic that might have made me run away screaming in college.

The material is getting harder and denser, as it probably should in a class that is teaching a lot of new material.

Numpy is short for Numerical Python. It seems to be pronounced “numb pie” instead of “numb P,” which makes me think of paraphrasing the Gumby theme song to “He can analyze any data set ... Num-py!”

But I digress. This is the module where I really started to see the power of Python and realize I may need to study some aspects of statistics more.

Numpy’s main strengths in my view are 1) the ability to work on entire tables of data at once with no need for loop code, 2) its built-in package of statistical functions and 3) relatively easy subsetting of arrays.
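Here is a minimal sketch of those three strengths, assuming Numpy is installed and imported under the usual alias np. The height and weight values are illustrative numbers, not data from the course.

```python
import numpy as np

# Illustrative sample data: heights in meters, weights in kilograms
height = np.array([1.73, 1.68, 1.71, 1.89, 1.79])
weight = np.array([65.4, 59.2, 63.6, 88.4, 68.7])

# 1) Work on whole arrays at once, no loop needed
bmi = weight / height ** 2

# 2) Built-in statistical functions
print(np.mean(height))
print(np.median(height))

# 3) Subsetting with a condition: keep only the BMI values above 23
print(bmi[bmi > 23])
```

The same calculation with plain Python lists would need a loop over each height and weight pair.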

The MS course also started to go into a few data analysis techniques apart from programming. Two examples:

  1. When you first get your data, it is very helpful to print the mean and median of each of the variables in your data. If the mean and median are far apart, and especially if the mean is an unrealistic value (say 2000 inches for human height) it may represent a flaw in data gathering and/or retrieval.
  2. It offered some tips on testing a guess/hypothesis, working through an example of whether soccer goalkeepers were generally taller than other players. It also offered an example of seeing whether there was a correlation between height and weight.
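Both techniques can be sketched in a few lines. This is only an illustration: the numbers are made up, the 2000 stands in for a data-entry error, and the cutoff of 10 inches for "far apart" is an arbitrary choice of mine, not something from the course.

```python
import numpy as np

# Made-up heights in inches; the 2000 stands in for a data-entry error
heights = np.array([66, 70, 68, 72, 2000, 69, 71])

mean_height = np.mean(heights)
median_height = np.median(heights)
print(mean_height, median_height)

# A large gap between mean and median is a hint to look for bad values
# (the cutoff of 10 is arbitrary, chosen only for this example)
if abs(mean_height - median_height) > 10:
    print("Mean and median are far apart; check the data")

# Correlation between two variables, using made-up clean data
height = np.array([66, 70, 68, 72, 69, 71])
weight = np.array([140, 165, 150, 180, 160, 170])
r = np.corrcoef(height, weight)[0, 1]
print(r)
```

np.corrcoef returns a small table of correlations, so the [0, 1] picks out the single value for height against weight. A value near 1 means the two rise together.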

I also learned how to generate simulated data by passing parameters to a randomizing function.
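A minimal sketch of generating simulated data, assuming a normal (bell curve) distribution. The mean of 1.75, spread of 0.20 and count of 5000 are illustrative parameters, and the seeded generator from np.random.default_rng is my choice for repeatability rather than necessarily what the course used.

```python
import numpy as np

# Seeded generator so the "random" numbers are repeatable
rng = np.random.default_rng(42)

# Parameters: mean (loc), standard deviation (scale), number of values (size)
simulated_heights = rng.normal(loc=1.75, scale=0.20, size=5000)

print(simulated_heights.shape)
print(np.mean(simulated_heights))
```

With this many values, the mean of the simulated data should land close to the loc parameter that was passed in.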

At this point, I think a number of things you can do with data in Python are similar to what can be done in Excel. But I get the sense that Python will handle much larger datasets than Excel can. It may also be easier to compactly report the results. Also, an examination of the documentation at www.numpy.org may yield functionality not available in Excel.

I haven’t yet established a home Python environment, but this lesson gave me an incentive to do so. I have a few datasets I’d like to play with, though at this point we haven’t covered importing data files into the Python environment.

Next module, likely done Sunday or Monday, will be on plotting data. Something I’m very much looking forward to.

Posted in lifelong learning | Tagged | Leave a comment

Social Media Decisions: Bye Bye Personal Blog

In my post 2016 Personal Social Media Inventory, I shared the social media I was currently using. I also shared the outlets I was definitely keeping, the ones I planned to get rid of and several I was still mulling over.

My “personal” blog of Eclectic Alaskan was on my list of maybes. I’ve had it for a long time, but a lot of the things I used to share on my personal blog, I’ve been sharing through Facebook. So I needed to either recommit to the blog or share my personal stuff (photos, politics, etc) through Facebook. After some soul searching, I reluctantly froze my blog, leaving up posts as historical information. I explained my decision in my last blog post at Eclectic Alaskan.

To clarify, THIS blog WILL continue. It is a convenient place to blog my learning activities and opinions about developments in the library and information science field.

Depending on how Facebook develops (degrades/enrages), I reserve the right to revive Eclectic Alaskan.

Posted in social media | Leave a comment

Microsoft: DAT208x Introduction to Python for Data Science – Module 3

Today I completed Functions and Packages, Module 3 of the self-paced EdX class Microsoft: DAT208x Introduction to Python for Data Science.

This was almost as much work as module 2 (lists), which is good because it means I’m learning new things.

The short summary of this module is:

Functions – Reusable bits of Python code used for a particular task. If you can think of a particular task, there is likely a function for it.

Methods – A kind of function tied to a specific type of Python object. Called with a “.” after a variable name. Here’s an example of the difference using the variable “room” with the value “poolhouse”:

Function – print(room) – This prints “poolhouse”.

Method – room.count("o") – This counts the number of times the letter “o” appears in the variable “room” whose value is currently set to “poolhouse”. If we used the command:

print(room.count("o")), we would get “3”, the number of times that the letter o appears in poolhouse.
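Put together as a runnable sketch, the difference looks like this. The len() and upper() calls are extra examples I added to show one more function and one more method.

```python
room = "poolhouse"

# Functions: called on their own, with the variable passed in
print(room)
print(len(room))

# Methods: called with a "." after the variable name
o_count = room.count("o")
print(o_count)
print(room.upper())
```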

Packages – These are directories of Python functions and methods. Because there are a large number of discipline-specific packages for Python, the basic distribution of Python doesn’t have them all. There is a tool called “pip” you can use to download and install packages you need for your work. According to the instructor for this course, three common packages needed for data science are Numpy, Matplotlib and Scikit-learn. Numpy and Matplotlib have their own separate modules in this class.

Another interesting thing about packages is that installing them into your programming environment isn’t enough. There are one and usually two more things you need to do in your code itself:

  1. Have a line that imports the package (or subpackage, or function)
  2. If you’ve imported the entire package, you’ll need to preface the function with the package name.

So if I want to use the radians function from the math package to determine the number of radians in 12 degrees, I’d need these two lines in my program:

import math

print(math.radians(12))

If you don’t want to put “math.” in front of radians, Python lets you import single functions. So I could execute the radians command as:

from math import radians

print(radians(12))

But doing it this way can be confusing to others looking at your code, particularly with longer programs, because it is no longer obvious which package a function came from. So I’ll probably import full packages. Not sure what this does to my actual program length. I’ll look into that later.

With this module, I’ve become convinced of the value of downloading Python to my home computer and working on it further. I’ve got other things I need to do today, but will try to start setting up my coding environment this week.

Aside from the constant ads from Datacamp during the lab portion of this course, I’m really liking the course organization of short lectures followed by hands-on exercises. It makes me feel like I’m getting stuff done. There should be some way I could start doing training videos in a similar way for database or tech training, though I’m not sure how I’d get the hands-on piece.

Posted in lifelong learning | Tagged | Leave a comment

Social Media Decisions: Bye Instagram (for now)

In my post 2016 Personal Social Media Inventory, I shared the social media I was currently using. I also shared the outlets I was definitely keeping, the ones I planned to get rid of and several I was still mulling over.

Instagram was one of my maybes. I had decided to close this account, because I wasn’t getting much value from it. When I went to close out my account, I saw they had an option to temporarily disable the account. It stays disabled (hides everything) until you activate it again.

I decided to hedge my bets by simply disabling my Instagram account. If I decide I really want it after all, I don’t need to recreate everything. If I don’t go back for another six months or a year, I’ll fully delete it.

Posted in social media | Leave a comment

Microsoft: DAT208x Introduction to Python for Data Science – Module 2

Today I completed Lists: A Data Structure, Module 2 of the self-paced EdX class Microsoft: DAT208x Introduction to Python for Data Science.

This was more challenging than Module 1.  I was really grateful for the command line Python shell to experiment with how lists work.

Python lists are a compound data type, meaning a single list can mix different kinds of values. So data about families or houses could all be stored in a single list. You can also create lists of lists. I believe that a Python list could also be referred to as an array.
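As a small illustration of mixing types and nesting lists, here is a sketch using made-up room names and areas for a house.

```python
# Made-up data about a house: room names (strings) mixed with areas (floats)
house = ["hallway", 11.25, "kitchen", 18.0, "bedroom", 10.75]

# A list of lists keeps each room's name and area together
house_nested = [["hallway", 11.25], ["kitchen", 18.0], ["bedroom", 10.75]]

print(house[2])            # third element of the flat list
print(house_nested[1])     # second sub-list
print(house_nested[1][1])  # second element of the second sub-list
```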

The module did a clear job of walking me through how to create lists, how to view and change elements within a list and how to add and delete list elements. Once again, I was troubled by the upselling by the Datacamp site. I do think this is likely to confuse some people into buying subscriptions to Datacamp that they don’t have to. On the other hand, I think it is an environment to be somewhat proud of.
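The list operations the module walks through (viewing, changing, adding and deleting elements) can be sketched as follows. The flavor names are just placeholders.

```python
flavors = ["chocolate", "strawberry", "mint"]

# View an element (counting starts at 0)
print(flavors[0])

# Change an element
flavors[2] = "vanilla"

# Add an element by pasting another list onto the end
flavors = flavors + ["pistachio"]

# Delete an element by position
del flavors[1]

print(flavors)
```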

The lists module was also very clear about a potential trap. If you copy a list like a regular variable, say:

list2 = list1

You’re not really making a copy of the list values, merely copying the reference to the list in memory. What this means is that if you change a value in list2 (list2[2] = "vanilla"), it changes the value of list1[2] to “vanilla” as well. If you want to make a copy you can make independent edits on, you need to use one of two commands:

list2 = list(list1) or list2 = list1[:] (This selects all values in list1 and copies them over to list2.)
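A short sketch of the trap and both fixes. The names list3 and list4 are ones I introduced so that all three cases can sit side by side.

```python
list1 = ["chocolate", "strawberry", "mint"]

# Plain assignment: both names point to the same list in memory
list2 = list1
list2[2] = "vanilla"
print(list1[2])  # list1 changed too

# Two ways to make an independent copy
list3 = list(list1)
list4 = list1[:]
list3[0] = "coffee"
list4[1] = "lemon"
print(list1)  # not affected by the edits to list3 and list4
```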

This weekend I hope to get to module 3, Functions and Packages. When I do, I’ll report back.

Posted in lifelong learning | Tagged | Leave a comment

Microsoft: DAT208x Introduction to Python for Data Science – Module 1

This week I started the self-paced EdX class Microsoft: DAT208x Introduction to Python for Data Science. I’m taking this in part because for a few years now the library science literature has been increasingly insistent that data science is something librarians really need to know more about. Additionally, there are many, many datasets publicly available and I’d like to know how far I can get on my own to analyze and visualize ones I’m interested in.

To help cement my learning, I’ll be blogging each module here. I should also mention here that I have a smattering of several programming languages, usually enough to recognize a language and sometimes enough to either write my own programs or to modify the work of others to get what I want. I would not currently describe myself as fluent in anything. But my previous experience may help me to assimilate the material in this course more easily than someone who hasn’t been exposed to programming languages at all.

Module 1: Basic Python

This was a mix of very short video lectures and serious handholding programming exercises in a lab environment at datacamp.com. I found them effective for what they presented. One significant problem for someone new to online learning is that every time you finish a lab at datacamp, you are presented with a dialog box that invites you to “upgrade to continue” and offers you a $29.99 pass to all datacamp courses. This is NOT needed for the EdX course, but not everyone may scroll down the dialog box to click on the “continue learning” button that will not charge them and will allow them to get back to the EdX interface.

What did I learn? As someone who has used C++ and JavaScript, I mostly learned that there are some programming languages not obsessed with semi-colons. I learned the Python-specific ways to do common calculations, declare variables and comment in code. I’m glad they’re building in commenting from the beginning as a good practice.

I learned a few things that I *think* are specific to Python:

  • You can multiply strings! "Hey " * 2 gives "Hey Hey "
  • You can get the type of a variable by using the command type(variable name) – This was useful in the exercises since Python’s ways of defining variables seem less formal to me than other languages.
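Both of these are quick to try in a Python shell. In this sketch the variable names and values are placeholders I picked for illustration.

```python
# Multiplying a string repeats it
greeting = "Hey " * 2
print(greeting)

# type() reports what kind of value a variable holds
savings = 100
factor = 1.10
print(type(greeting))
print(type(savings))
print(type(factor))
```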

This module also pointed us to the main Python documentation at www.python.org and showed us where we can download Python for our own computers. I’m holding off on downloading Python for the next few modules. I want to see it put to some practical use before I download another programming environment to my computer.

The next module, Python Lists, ought to have more substantial learning opportunities for me. I’ll report on that after I take it.

PS – I’m taking this class fully on my own time because data science is not in my current set of job duties. While there may be insights we can get from analyzing and visualizing Alaska Public Library Statistics, there is simply too much work in other areas to justify taking this course on work time.

Posted in lifelong learning | Tagged , | 1 Comment

Social Media Decisions: Twitter Yes / LibraryThing No

In my post 2016 Personal Social Media Inventory, I shared the social media I was currently using. I also shared the outlets I was definitely keeping, the ones I planned to get rid of and several I was still mulling over.

Twitter and LibraryThing were two of my maybes, but no longer.

Twitter goes to “keep.” Today I discovered that if you use TweetDeck to access Twitter, individual users can schedule tweets. This ability should be very helpful in my ongoing work to promote the State Agency Databases Project I coordinate for ALA GODORT.

LibraryThing goes to “discard.” As mentioned in my earlier post, I haven’t participated on LibraryThing since January 2014. Pretty much all of my fellow readers are on Goodreads, and the ones who aren’t don’t publicly disclose their reading. So I went ahead and deleted that account.

Decisions still needed:

  • Eclectic Alaskan (leaning keep, maybe with hiatus notice)
  • GooglePlus (still need to research its dependencies)
  • Instagram (leaning discard)
  • Writer’s Guide to Government Information (Will not be kept in current form, but still considering disposition options)

 

Posted in social media | Leave a comment