The project can be split into two parts:
- first, cleaning up the existing machine-learning algorithms in scipy (e.g. scipy.cluster and several toolboxes in scipy.sandbox, such as pyem and svm). By cleaning up, I mean adding tests, adding proper docstrings, and of course fixing bugs in the existing code.
- second, wrapping those cleaned toolboxes into a higher-level package, in the spirit of data mining software such as Orange and Weka. This will include some graphical tools for data representation and manipulation, as well as basic storage facilities.