I admit, this is one of those posts that will likely be useful mainly to me. I’m trying to learn ML with Python, and I find the number of libraries involved confusing. This post is my attempt at a little clarity about some of them, primarily Scikit-Learn. I find it helpful to get a lay of the land before I start diving into a new library.

Relation to Other Libraries

One of the things that has confused me is all the nifty-sounding libraries you read about as soon as you google machine learning: SciPy, NumPy, Pandas, Matplotlib, Scikit-Learn. That’s a lot of science. When do I use a specific library, and for what purpose?

There is obviously way more to this but here’s what I know so far:

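  1. Think of SciPy as two things: a library of scientific routines (optimization, integration, statistics and so on) built on NumPy, and a loose name for the wider ecosystem of scientific Python packages. NumPy, Pandas, Matplotlib and many others belong to that ecosystem, but each is its own project.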
  2. Think of NumPy as being oriented towards linear algebra and N-dimensional arrays.
  3. Think of Pandas as an in-memory spreadsheet that is very flexible and offers a lot of options.
  4. Think of Matplotlib as, obviously, a plotting/graphing library.
  5. Scikit-Learn is a machine learning library. It started as an add-on package for SciPy and is built on NumPy and SciPy. There is an entire family of these add-on packages named SciKits (short for SciPy Toolkits).
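
To make the division of labor above a little more concrete, here is a minimal sketch that touches each library once. The data (hours studied vs. test score) and all variable names are made up for illustration, and this is just one way the pieces can fit together, not a recommended workflow.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.linear_model import LinearRegression

# NumPy: N-dimensional arrays and the math on them.
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
scores = 10.0 * hours + 50.0  # made-up data lying exactly on a line

# Pandas: a labeled, spreadsheet-like table built on top of NumPy arrays.
df = pd.DataFrame({"hours": hours, "score": scores})
mean_score = df["score"].mean()

# SciPy: scientific routines (statistics here) that operate on arrays.
r, _ = stats.pearsonr(df["hours"], df["score"])

# Scikit-Learn: machine learning models, fed with a 2-D table of features.
model = LinearRegression().fit(df[["hours"]], df["score"])
slope = model.coef_[0]

# Matplotlib: plotting; the figure is written to an in-memory buffer.
fig, ax = plt.subplots()
ax.scatter(df["hours"], df["score"])
ax.set_xlabel("hours")
ax.set_ylabel("score")
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```

The pattern I keep seeing is that NumPy arrays are the common currency: Pandas wraps them, SciPy and Scikit-Learn consume them, and Matplotlib draws them.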

Site Organization

Near the top left corner of the site is a Getting Started button. This is exactly what you think it is; note that “It assumes a very basic working knowledge of machine learning practices”.

The front page currently has six callouts. All of them link into the user guide, but since they are called out specifically, I’m going to assume these topics are important.

At the very top of the page are a few links. The most useful of these appears to be the “User Guide” link. It takes you to the User Guide, of course, but that page also has a great left-hand navigation bar that shows you pretty much the rest of the documentation. You can use the other links at the top to navigate as well, but they don’t all provide the same navigation interface.

That left-hand navigation really holds the keys to the kingdom: tutorials, getting started, glossary, API reference, examples, etc. The tutorials seem to be a good place to start if you need to get familiar with basic ML.

Key Concepts

Scanning the site, you’ll see some keywords come up again and again. Make sure you understand them:

  • vector
  • matrix
  • fit
  • estimator
  • regression
  • classification
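
Here is a small sketch of how I currently understand these keywords fitting together in Scikit-Learn. The numbers are made up, and the two models (LinearRegression and LogisticRegression) are just convenient examples of estimators, not the only choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Matrix: a 2-D array of shape (n_samples, n_features); 6 samples, 1 feature.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])

# Vector: a 1-D array holding one target value per sample.
y_number = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])  # continuous target
y_label = np.array([0, 0, 0, 1, 1, 1])                 # class labels

# Estimator: an object that learns from data.
# fit: the method that does the learning.
regressor = LinearRegression()
regressor.fit(X, y_number)

classifier = LogisticRegression()
classifier.fit(X, y_label)

# Regression predicts a number; classification predicts a category.
predicted_number = regressor.predict(np.array([[7.0]]))
predicted_label = classifier.predict(np.array([[0.0], [10.0]]))
```

Both estimators are used the same way: construct, fit on a matrix and a vector, then predict. The only difference is whether the target vector holds numbers or labels.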

Key URLs