Wednesday, August 15, 2007

Convex optimization: a primer

For quite some time now, I've been looking to get an introductory text on convex optimization. I found a course with a book available online here: http://www.stanford.edu/class/ee364/. I will take a look at it once I have finished my poster for ICSLP07.

Tuesday, June 12, 2007

Opensuse build service, access through ssh tunnel

For a few weeks now, I have been working on providing binary packages of numpy and scipy for the major linux distributions, using the build service from opensuse. You can find a brief description of the build service on Miguel de Icaza's blog here: http://tirania.org/blog/archive/2007/Jan-26.html. As he put it himself, "For the last couple of years the folks at SUSE have been building a new build system that could help us make packages for multiple distributions from the same source code."

The system was a bit hard to set up because it is really rpm centric, even if it can also build debian packages. But it basically works: it provides a build farm (x86 and amd64 only) based on xen images, and it provides command line tools that access the web api to build and modify packages. The one I am using is osc: it works a bit like subversion (you checkout, commit, etc...), and you can also use it to build locally (using a local minimal install of each distribution in a chroot jail).

Now, it is still a bit rough around the edges. The osc client does not work well behind a proxy (this is really a python issue, as the http handler in python does not handle https proxies well, AFAIK), and I had to use a strange hack suggested by one of the build service developers to connect through an SSH tunnel instead. Here is how to do it, using a tunnel from port 9999 on the local machine, through the ssh machine, to the public opensuse build service server.
  1. First, in $HOME/.oscrc, set apisrv to api.opensuse.org:9999
  2. Add api.opensuse.org as an alias to localhost in /etc/hosts, e.g.: 127.0.0.1 localhost api.opensuse.org. This is the tricky part, because otherwise the SSL certificates do not work: http://lists.opensuse.org/opensuse-buildservice/2007-04/msg00018.html
  3. Run your ssh tunnel as ssh -L 9999:api.opensuse.org:443 sshserver
Note that some osc commands need to access the web, so you still need http_proxy set if you reach the internet through a proxy. As I understand it, python's urllib2 does not handle https proxies, but http proxies work.
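The steps above can be collected into a small shell sketch. The variable names are made up for this example, sshserver is the same placeholder as in step 3, and the -N flag (which tells ssh not to run a remote command) is an addition of mine. The script only prints the /etc/hosts line and the tunnel command instead of running them, since editing /etc/hosts needs root.

```shell
#!/bin/sh
# Sketch only: LOCAL_PORT, API_HOST and SSH_SERVER are illustrative names.
LOCAL_PORT=9999
API_HOST=api.opensuse.org
SSH_SERVER=sshserver   # placeholder: a machine you can reach over ssh

# Line for /etc/hosts (needs root): makes the API host name resolve locally,
# so the host name in the SSL certificate still matches.
HOSTS_LINE="127.0.0.1 localhost ${API_HOST}"

# Forward local port 9999 through the ssh machine to the https port of the API.
# -N: do not run a remote command, just keep the tunnel open.
TUNNEL_CMD="ssh -N -L ${LOCAL_PORT}:${API_HOST}:443 ${SSH_SERVER}"

echo "add to /etc/hosts: ${HOSTS_LINE}"
echo "then run:          ${TUNNEL_CMD}"
```

Once the tunnel is up, osc should talk to api.opensuse.org:9999, which is really the local end of the tunnel.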

Sunday, June 3, 2007

Install an R source package in a local directory

If you don't have root access to your machine, it seems that it is not possible to install an R package with the default install. Let's say you want to install some packages in $HOME/local/lib/R/library. Two things are necessary:
  1. Install the package with R at the correct location
  2. Tell R to look for the package at this location
The first is a simple matter of running the following at the root of the package sources:

R CMD INSTALL -l $HOME/local/lib/R/library .

Then, you just have to launch R with the path set in R_LIBS:

R_LIBS=$HOME/local/lib/R/library/ R

You can then use library("name") to load the package by its name.
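Both steps can be put together in a small shell sketch. R_LOCAL_LIB and install_local_r_package are names made up for this example; the sketch also creates the library directory first, which is harmless if it already exists.

```shell
#!/bin/sh
# Sketch only: R_LOCAL_LIB and install_local_r_package are illustrative names.
R_LOCAL_LIB="$HOME/local/lib/R/library"

# Create the library directory up front (harmless if it already exists).
mkdir -p "$R_LOCAL_LIB"

# Install the package whose sources are in directory $1
# (default: the current directory).
install_local_r_package() {
    R CMD INSTALL -l "$R_LOCAL_LIB" "${1:-.}"
}

# Make R look in the local library for the rest of this shell session.
export R_LIBS="$R_LOCAL_LIB"
```

After sourcing this, running install_local_r_package at the root of the package sources does the install, and an R started from the same shell should find the package without setting R_LIBS on every launch.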


Thursday, May 31, 2007

Creating a disk image (dmg) from the command line

The docs for packaging an application for mac os X say to use the Disk Utility application to make a .dmg file, but I would rather be able to do it from the command line. It looks like the following works to make a dmg from a folder:

hdiutil create -fs HFS+ -srcfolder SRCFOLDER -volname VOLNAME IMGNAME

where SRCFOLDER is the folder containing the files to put in the dmg, VOLNAME is the volume name and IMGNAME is the name of the .dmg file. As I know nothing yet about resource forks, I do not know whether it keeps them or not (some scripts I found on the internet had this problem: http://www.macosxhints.com/article.php?story=20020311215452999).
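To avoid retyping the options, the command can be wrapped in a small shell function. This is only a sketch: make_dmg is a name made up for this example, and the argument checks are additions of mine. The hdiutil call itself is the one given above.

```shell
#!/bin/sh
# Sketch only: make_dmg is an illustrative name, not an existing tool.
# Usage: make_dmg SRCFOLDER VOLNAME IMGNAME
make_dmg() {
    if [ "$#" -ne 3 ]; then
        echo "usage: make_dmg SRCFOLDER VOLNAME IMGNAME" >&2
        return 2
    fi
    # Refuse to go further if the source folder does not exist.
    if [ ! -d "$1" ]; then
        echo "error: $1 is not a directory" >&2
        return 1
    fi
    # Same command as above, with the three arguments quoted.
    hdiutil create -fs HFS+ -srcfolder "$1" -volname "$2" "$3"
}
```

For example, make_dmg build/MyApp MyApp MyApp.dmg would package the (hypothetical) build/MyApp folder.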

Wednesday, May 30, 2007

First steps with Mac OS X

I recently bought a macbook, as I needed a laptop (my old toshiba had been held together with tape for more than a year, and the hard drive finally failed a few days ago). I bought a macbook because they are cheap in Japan for a student (less than 1000 euros including apple care), and since Apple only has a few, mostly non-customizable models, it should be easier to get information on running linux on it.

This post will be the first of a series describing how to use traditional unix tools, and maybe also mac os X specific tools, to produce usable software related to numpy/scipy/pymachine on mac os X: compilation and packaging issues, etc...

First, the necessary bits for building numpy + scipy
  1. Install the developers tools (xcode and co):
  2. Install subversion: universal binaries (UB) are here
Information for compiling UB: here and here. Packaging information can be found here.

Creating a dmg from a folder:

The command line tool seems to be hdiutil (you can do it with Disk Utility, but I would rather be able to do it from the command line). It looks like the following works to make a dmg from a folder:

hdiutil create -fs HFS+ -srcfolder SRCFOLDER -volname VOLNAME IMGNAME

where SRCFOLDER is the folder containing the files to put in the dmg, VOLNAME is the volume name and IMGNAME is the name of the .dmg file.

Thursday, May 24, 2007

Committing a project under bzr to a svn subtree

Using subversion is kind of painful for my personal projects, mainly because:
  1. I do not have a root account on my workstation, so I cannot run a subversion server on it
  2. I only have plain http available for my university webpage, so I cannot serve a subversion repository to other machines.
I have been using bzr, a decentralized version control system, for some time. I started using it a few months ago because I found it really easy to use, and as it does not require anything other than http, I can serve my projects from the university.

Now, when I want to submit some of those projects to subversion repositories, it is a problem. There is bzr-svn, a plugin which is able to understand svn repositories and put their metadata under bzr control. Unfortunately, I could not use it to submit one of my projects as a module of an existing subversion server (maybe because I do not know much about source control systems). This is where tailor comes in: "Tailor is a tool to migrate changesets between ArX, Bazaar, Bazaar-NG, CVS, Codeville, Darcs, Git, Mercurial, Monotone, Perforce, Subversion and Tla repositories." (the main page of the project, as far as I can tell, is there).

I first used the latest release of tailor, but it did not work very well, and after some time I finally understood that this was because of a mismatch between the tailor version and my version of bzr. Anyway, after fetching the latest published sources, I could translate my bzr project to subversion. Here is the setup:
  1. I assume there is a subversion repo http://svnrep, to which I want to commit my project.
  2. I want to put it in http://svnrep/mainproject/trunk/foo.
  3. My bzr project is in $HOME/foo.bzr
I first generate the tailor config file for this conversion as follows:

tailor --verbose --source-kind bzr --target-kind svn \
--repository $HOME/foo.bzr \
--target-repository svn://svnrep/ \
--subdir tmp \
--target-module trunk/foo bzr2svnproj > svn2bzrproj.tailor

subdir tmp means that tmp will be the working directory. Then, the actual conversion is done with the following command, using the config file generated above:

tailor -D -v -c svn2bzrproj.tailor

It works for two of my projects, which are not big (a few dozen source files and revisions), so I don't know whether this can easily be used for bigger projects.
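One thing that is easy to get wrong in the two steps above is the name of the config file: the file written by the first command must be the one passed to -c in the second. A minimal sketch that keeps them in sync is shown here; PROJECT, CONFIG and bzr_to_svn are names made up for this example, and the tailor options are the ones used above.

```shell
#!/bin/sh
# Sketch only: PROJECT, CONFIG and bzr_to_svn are illustrative names.
PROJECT=bzr2svnproj
CONFIG="${PROJECT}.tailor"

bzr_to_svn() {
    # Step 1: write the tailor configuration to $CONFIG.
    tailor --verbose --source-kind bzr --target-kind svn \
        --repository "$HOME/foo.bzr" \
        --target-repository svn://svnrep/ \
        --subdir tmp \
        --target-module trunk/foo "$PROJECT" > "$CONFIG" || return 1
    # Step 2: run the conversion from that same file.
    tailor -D -v -c "$CONFIG"
}
```

Calling bzr_to_svn then runs both steps, and stops before the conversion if generating the config fails.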

Sunday, May 6, 2007

Get things done, task management and open source software

I've been looking this weekend for some kind of task manager software, and remembered seeing some posts on the pymc's blog. The problem is, all of that software is either mac specific or web-based only. What I am looking for is:
  1. software which is able to handle projects as a set of tasks
  2. there should be time management (e.g. setting the time a task should take, and a timeline to get a view of all the projects at the same time)
  3. it should work on linux, and ideally should be open source.
I found two programs which seem to fit this description:
  1. tracks: web-based (uses ruby on rails)
  2. thinking rock: java-based.
I took the time to set up tracks, and found a review there. Thinking rock also got its review recently here. I will take time to use them a bit over the next few days, to see which one suits me best.

Thursday, April 26, 2007

Introduction to pymachine

Ok, time to write something on this blog. This blog will mostly follow my progress on pymachine, a machine learning toolbox for the numpy/scipy environment. This project is supported by the Google Summer of Code 2007; a basic description can be seen here, and the full proposal can be seen there. All development will happen within the scipy community (mailing list, source code repository and project management).

The project can be split into two parts:
  1. first, cleaning up existing machine learning related algorithms in scipy (e.g. scipy.cluster, and several toolboxes in scipy.sandbox like pyem and svm). By cleaning up, I mean adding tests, adding proper docstrings, and of course fixing bugs in the existing code.
  2. wrapping those cleaned toolboxes into a higher level package, in the spirit of data mining software à la orange, weka and co. This will include some graphical tools for data representation and manipulation, as well as basic storage facilities.