Thursday, April 26, 2007

Introduction to pymachine

OK, time to write something on this blog. It will mostly follow my progress on pymachine, a machine learning toolbox for the numpy/scipy environment. The project is supported by Google Summer of Code 2007; its basic description can be seen here, and the full proposal there. All development will happen within the scipy community (mailing list, source code repository and project management).

The project can be split into two parts:
  1. first, cleaning up the existing machine learning algorithms in scipy (e.g. scipy.cluster, and several toolboxes in scipy.sandbox such as pyem and svm). By cleaning up, I mean adding tests, adding proper docstrings, and of course fixing bugs in the existing code.
  2. second, wrapping those cleaned-up toolboxes into a higher-level package, in the spirit of data mining software à la Orange, Weka and co. This will include some graphical tools for data representation and manipulation, as well as basic storage facilities.