Throughout the competition, we've written a lot of functions that are generally helpful for practical, real-world machine learning, the kind of thing we had no clue about when we had only studied theory. One example is caching intermediate results. We found that using Python's pickle to serialize and deserialize a pandas DataFrame or NumPy array was actually less space efficient and way slower than simply storing the objects as CSV files! This was pretty amazing, because the CSV files are both faster and more portable (e.g., we can use them with R or MATLAB). Another unintuitive discovery came from our function for writing an array or DataFrame to svmlight format (a popular format for machine learning algorithms).
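For readers who haven't seen svmlight format: each line holds a label followed by sparse `index:value` pairs, with zero-valued features left out. Here is a minimal sketch of what a writer for it might look like. It is not our actual project function; the names `to_svmlight_lines` and `write_svmlight` are made up for illustration, and we assume 1-based feature indices, as in the original SVMlight tool.

```python
import io


def to_svmlight_lines(rows, labels):
    """Yield one svmlight-formatted line per (row, label) pair.

    Hypothetical helper for illustration only.
    """
    for row, label in zip(rows, labels):
        # Keep only nonzero features; indices are assumed to be 1-based.
        feats = [
            f"{i}:{float(v)!r}"
            for i, v in enumerate(row, start=1)
            if v != 0
        ]
        yield " ".join([str(label)] + feats)


def write_svmlight(rows, labels, fileobj):
    """Write rows and labels to an open text file object."""
    for line in to_svmlight_lines(rows, labels):
        fileobj.write(line + "\n")


# Tiny example: two samples with three features each.
buf = io.StringIO()
write_svmlight([[0.0, 1.5, 0.0], [2.0, 0.0, 3.0]], [1, -1], buf)
svmlight_text = buf.getvalue()
```

Because the function only iterates over rows, it should also accept a 2-D NumPy array directly, or a DataFrame via `df.values`. A plain Python loop like this is easy to read, but we wouldn't expect it to be fast on large data.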
For now, we're working hard on the competition, and for the sake of ease (i.e., so we don't have to update the library every time we need a new function), we're writing these useful functions directly into the project code. When we have free time, they'll be added to ProtoML.