Tuesday, February 5, 2013

An Introduction To ProtoML

What is ProtoML?

ProtoML is a machine learning library built on top of scikit-learn (and hopefully a few more libraries soon!) with an aim for ease of use and rapid prototyping. We are part of a Kaggle group at RPI, and we were searching for easy-to-use machine learning libraries and frameworks to quickly hack out some data analysis. Some of our favorites include scikit-learn, Orange, and Ramp. But none of them made it really easy to get off the ground once you have some clean data. There were always some hoops to jump through before we could start scaling out, trying different machines, and using different features. That is why we decided to create a meta-modeling machine learning framework that makes it as simple as possible to chain together feature selection and machines using different combinators.
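To make the idea of chaining feature selection and machines concrete, here is a rough sketch in plain scikit-learn. This is not ProtoML's API; the synthetic dataset, the two models, and the values of k are all assumptions picked purely for illustration. It shows the kind of boilerplate we would like to shrink.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

# Synthetic stand-in for "some clean data" (an assumption for this sketch).
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=0
)

# Two example "machines" to swap in and out; chosen only for illustration.
models = {
    "logreg": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=5),
}

scores = {}
for k in (5, 10):  # number of features kept by the selection step
    for name, model in models.items():
        # Chain feature selection and a model into a single estimator.
        pipe = Pipeline([
            ("select", SelectKBest(f_classif, k=k)),
            ("model", model),
        ])
        # Mean accuracy over 3 cross-validation folds.
        scores[(k, name)] = cross_val_score(pipe, X, y, cv=3).mean()

for (k, name), score in sorted(scores.items()):
    print(f"k={k:2d} model={name:6s} mean accuracy={score:.3f}")
```

Even for two models and two feature counts, the loop and bookkeeping are written by hand. That is the sort of hoop we mean.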

Who is behind ProtoML?
Diogo Moitinho de Almeida & Bharath Santosh! Two students from RPI with too much free time and a dream (just kidding about the free time, don’t give me more homework Prof. Goldschmidt -Diogo).

What are our goals for the semester?
  • Make the implementation of machine learning algorithms as simple to try out as possible.
  • Implement some features missing from scikit-learn that are simple yet time-consuming.
  • Provide a framework for automating as much of the data analysis process as possible.
  • Have everything run fast. Do as much as possible in Cython, and try to cache everything.
  • Eventually act as the glue between the wide variety of available Python machine learning libraries.
  • Win a Kaggle competition.

Where can you learn more?
To see all that we have available and use our latest prototypes, check out:
https://github.com/CurryBoy/ProtoML (you should do it; we love guinea pigs)

What’s next for the blog?
We are going to post a combination of rough overviews of machine learning concepts and how to use them with ProtoML, and keep everyone updated on the latest and greatest features!

Minor update:
Tons of progress, and we just finished our very first meeting as an official RCOS group! Yay for us! We will be putting up a blog post by next week with sample code to run for basic machine learning.

