Fast hyperboloid decision tree algorithms
This repository contains code for the paper Fast Hyperboloid Decision Tree Algorithms (ICLR 2024), available on OpenReview at https://openreview.net/forum?id=TTonmgTT9X.
To install from source, clone the repository and install it in editable mode:
git clone https://github.com/pchlenski/hyperdt
cd hyperdt
pip install -e .
All dependencies are managed through pyproject.toml. For development, you can install the package with all development dependencies:
pip install -e ".[dev]"
For testing, you can install the test dependencies:
pip install -e ".[test]"
HyperDT is also available on PyPI and can be installed with pip:
pip install hyperdt
HyperDT supports optional dependencies:
# Install with XGBoost support
pip install "hyperdt[xgboost]"
# Install with legacy implementation support (requires geomstats)
pip install "hyperdt[legacy]"
# Install all optional dependencies
pip install "hyperdt[all]"
A basic tutorial demonstrating key HyperDT functionality is available in notebooks/tutorial.ipynb.
The hyperDT package is structured as follows:
- hyperdt
  - _base.py: Base classes for hyperbolic decision trees
  - toy_data.py: Utilities for generating synthetic hyperbolic datasets
  - tree.py: HyperbolicDecisionTreeClassifier and HyperbolicDecisionTreeRegressor
  - ensemble.py: HyperbolicRandomForestClassifier and HyperbolicRandomForestRegressor
  - oblique.py: Oblique decision trees (requires pip install hyperdt[oblique])
  - xgboost.py: XGBoost integration (requires pip install hyperdt[xgboost])
  - legacy/: Original implementation (requires pip install hyperdt[legacy])
    - dataloaders: Functions for loading benchmarking data into HoroRF
    - conversions: Convert between hyperboloid, Poincare, and Beltrami-Klein models
    - ensemble: Legacy HyperbolicRandomForestClassifier and HyperbolicRandomForestRegressor
    - hyperbolic_trig: Angular processing and midpoint calculations in the hyperboloid model
    - tree: Legacy HyperbolicDecisionTreeClassifier and HyperbolicDecisionTreeRegressor, plus base classes
    - visualization: Code to visualize decision boundaries on the Poincare disk
- tests/: Test files for verifying functionality
  - test_typing.py: Type annotation verification tests
  - test_model_types.py: Tests for classifier and regressor functionality
  - test_toy_data.py: Tests for data generation utilities
  - test_equivalence.py: Tests comparing the new implementation to the legacy code
  - test_oblique_models.py: Tests verifying oblique decision trees can train
  - test_midpoint_override.py: Verifies models are editable by zeroing out midpoints
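The legacy conversions module is described above as converting between the hyperboloid, Poincare, and Beltrami-Klein models. As a rough illustration of what such conversions involve, here is a standalone sketch of the standard textbook formulas. It is not the package's code: the function names are hypothetical, and it assumes curvature -1 with the timelike coordinate stored first.

```python
import math


def minkowski_norm_sq(x):
    """Minkowski squared norm, assuming x[0] is the timelike coordinate."""
    return -x[0] ** 2 + sum(v ** 2 for v in x[1:])


def hyperboloid_to_poincare(x):
    """Stereographic projection from the hyperboloid onto the unit ball."""
    return [v / (1.0 + x[0]) for v in x[1:]]


def poincare_to_hyperboloid(p):
    """Inverse of hyperboloid_to_poincare for a point with norm < 1."""
    sq = sum(v ** 2 for v in p)
    return [(1.0 + sq) / (1.0 - sq)] + [2.0 * v / (1.0 - sq) for v in p]


def hyperboloid_to_klein(x):
    """Central (gnomonic) projection from the hyperboloid onto the unit ball."""
    return [v / x[0] for v in x[1:]]


def klein_to_hyperboloid(k):
    """Inverse of hyperboloid_to_klein for a point with norm < 1."""
    scale = 1.0 / math.sqrt(1.0 - sum(v ** 2 for v in k))
    return [scale] + [scale * v for v in k]


# A point at hyperbolic distance 1 from the origin along the first space axis.
x = [math.cosh(1.0), math.sinh(1.0), 0.0]
poincare_point = hyperboloid_to_poincare(x)
klein_point = hyperboloid_to_klein(x)
```

For this point, the Poincare coordinate along the first axis equals tanh(0.5) and the Klein coordinate equals tanh(1), and mapping either result back should recover a point whose Minkowski squared norm is -1 up to floating-point error.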
The package has a modular design with optional dependencies:
- Core functionality only requires numpy, scikit-learn, scipy, and matplotlib
- XGBoost backend requires the xgboost package (pip install hyperdt[xgboost])
- Legacy implementation requires geomstats (pip install hyperdt[legacy])
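Before relying on an optional backend, it can help to check whether its underlying package is importable in the current environment. The sketch below uses only the standard library. The helper name and the mapping are illustrative assumptions built from the extras named in this README (xgboost and geomstats); nothing here is part of the HyperDT API.

```python
from importlib.util import find_spec

# Extra name -> top-level module it depends on, taken from this README.
# The oblique extra is omitted because the README does not name its package.
OPTIONAL_BACKENDS = {"xgboost": "xgboost", "legacy": "geomstats"}


def available_extras(backends=OPTIONAL_BACKENDS):
    """Return {extra: True/False} depending on whether its module is importable."""
    return {
        extra: find_spec(module) is not None
        for extra, module in backends.items()
    }


# Report which optional backends could be used in this environment.
for extra, present in available_extras().items():
    status = "available" if present else "missing"
    print(f"hyperdt[{extra}]: {status}")
```

A missing entry suggests installing the corresponding extra before importing the matching module.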
All figures and tables in the paper were generated using a combination of Python scripts and Jupyter notebooks. The notebooks used in development were filtered down to only those that remained relevant to the final paper and moved to the notebooks/archive directory. The notebooks directory contains a tutorial and symbolic links to notebooks of particular relevance to a figure, table, or section of the paper, named according to the section they reproduce.
benchmarks/hororf_benchmarks.py runs the benchmarks contributing to Tables 1, 5, and 6, and benchmarks/scaling_benchmarks.py runs the benchmarks contributing to Figures 6 and 7.
All relevant datasets, plus benchmarking code outputs, can be found on Google Drive.
To cite HyperDT, please use the following:
@inproceedings{
chlenski2024fast,
title={Fast Hyperboloid Decision Tree Algorithms},
author={Philippe Chlenski and Ethan Turok and Antonio Khalil Moretti and Itsik Pe'er},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=TTonmgTT9X}
}