Please visit http://benchmark.nicolasfiorini.info/ for more details on the project.

1. Download the benchmark data.

2. Have a look at the data. It consists of one tuning dataset and seven evaluation datasets. For each dataset, you will find a CSV file that maps bookmark URIs to WordNet synset URIs. A WordNet 3.0 dump is provided in the knowledge folder. Note that a README file in this folder gives some details on how the benchmark was created.

3. Tune your method on the tuning dataset, then run it on the evaluation datasets using the provided WordNet dump. So that your approach can be evaluated in the following steps, your algorithm must output trees in Newick format.

4. Install the ETE Python Toolkit by following these instructions. Note that the benchmark has been tested with Python v2.7.

5. Use the provided Python script to compare two trees. It uses the Robinson-Foulds distance. For each dataset, compute the distance between your tree and each expert tree, then average the distances. This average is the score of your output on this dataset.
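
The scoring in step 5 can be sketched in plain Python as follows. This is only an illustration, not the script provided with the benchmark: the function names (clades, rf_distance, dataset_score) and the example trees are hypothetical. It assumes rooted trees over the same set of leaves, written in simple Newick with leaf names only (no branch lengths, quoted labels or comments), and it computes the rooted variant of the Robinson-Foulds distance, that is, the number of clades found in one tree but not in the other. The provided script relies on ETE and may use different conventions.

```python
from __future__ import division  # true division under Python 2.7 as well


def clades(newick):
    """Return the non-trivial clades of a rooted Newick tree as frozensets of leaf names.

    Assumption: simple Newick, leaf names only (no branch lengths, quotes or comments).
    """
    text = newick.strip().rstrip(";")
    stack = [[]]    # leaf names collected for each open parenthesis, plus a root holder
    found = set()
    token = ""
    closed = False  # True right after ")": a following token is an internal label
    for ch in text + ",":
        if ch in "(),":
            name = token.strip()
            if name and not closed:
                stack[-1].append(name)
            token = ""
            closed = False
            if ch == "(":
                stack.append([])
            elif ch == ")":
                leaves = stack.pop()
                stack[-1].extend(leaves)
                if len(leaves) > 1:
                    found.add(frozenset(leaves))
                closed = True
        else:
            token += ch
    # The clade holding every leaf is shared by all trees, so it carries no information.
    found.discard(frozenset(stack[0]))
    return found


def rf_distance(newick_a, newick_b):
    """Rooted Robinson-Foulds distance: size of the symmetric difference of the clade sets."""
    return len(clades(newick_a) ^ clades(newick_b))


def dataset_score(predicted, expert_trees):
    """Average distance between one predicted tree and all expert trees of a dataset."""
    distances = [rf_distance(predicted, expert) for expert in expert_trees]
    return sum(distances) / len(distances)


# Hypothetical example: one predicted tree and three expert trees on leaves a, b, c, d.
predicted = "((a,b),(c,d));"
experts = ["((a,b),(c,d));", "((a,c),(b,d));", "(((a,b),c),d);"]
score = dataset_score(predicted, experts)
```

With these made-up trees, the three distances are 0, 4 and 2, so the score for this dataset is 2.0. A lower score means the predicted tree is closer to the expert trees.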