pchlenski/hyperdt

hyperDT

Fast hyperboloid decision tree algorithms

This repository contains code for the paper Fast Hyperboloid Decision Tree Algorithms (ICLR 2024), which is available on OpenReview: https://openreview.net/forum?id=TTonmgTT9X

Installation

Local install

To install from source, clone the repository and install it in editable mode:

git clone https://github.com/pchlenski/hyperdt
cd hyperdt
pip install -e .

All dependencies are managed through pyproject.toml. For development, you can install the package with all development dependencies:

pip install -e ".[dev]"

For testing, you can install the test dependencies:

pip install -e ".[test]"

Pip install

hyperDT is also available on PyPI:

pip install hyperdt

hyperDT supports optional dependencies, which are installed as extras:

# Install with XGBoost support (quotes keep shells such as zsh from expanding the brackets)
pip install "hyperdt[xgboost]"

# Install with legacy implementation support (requires geomstats)
pip install "hyperdt[legacy]"

# Install all optional dependencies
pip install "hyperdt[all]"

Tutorial

A basic tutorial demonstrating key HyperDT functionality is available in notebooks/tutorial.ipynb.

Package Structure

The hyperDT package is structured as follows:

  • hyperdt
    • _base.py: Base classes for hyperbolic decision trees
    • toy_data.py: Utilities for generating synthetic hyperbolic datasets
    • tree.py: HyperbolicDecisionTreeClassifier and HyperbolicDecisionTreeRegressor
    • ensemble.py: HyperbolicRandomForestClassifier and HyperbolicRandomForestRegressor
    • oblique.py: Oblique decision trees (requires pip install hyperdt[oblique])
    • xgboost.py: XGBoost integration (requires pip install hyperdt[xgboost])
    • legacy/: Original implementation (requires pip install hyperdt[legacy])
      • dataloaders: Functions for loading benchmarking data into HoroRF
      • conversions: Convert between hyperboloid, Poincare, and Beltrami-Klein models
      • ensemble: Legacy HyperbolicRandomForestClassifier and HyperbolicRandomForestRegressor
      • hyperbolic_trig: Angular processing and midpoint calculations in the hyperboloid model
      • tree: Legacy HyperbolicDecisionTreeClassifier and HyperbolicDecisionTreeRegressor, plus base classes
      • visualization: Code to visualize decision boundaries on the Poincare disk
  • tests/: Test files for verifying functionality
    • test_typing.py: Type annotation verification tests
    • test_model_types.py: Tests for classifier and regressor functionality
    • test_toy_data.py: Tests for data generation utilities
    • test_equivalence.py: Tests comparing the new implementation to the legacy code
    • test_oblique_models.py: Tests verifying oblique decision trees can train
    • test_midpoint_override.py: Verifies models are editable by zeroing out midpoints
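The conversions module listed above moves points between the hyperboloid, Poincare, and Beltrami-Klein models. As a rough illustration of what such conversions involve, here is a minimal NumPy sketch using the standard textbook formulas. It assumes curvature -1 and a timelike coordinate at index 0; all function names are hypothetical and are not the hyperDT API.

```python
import numpy as np


def lift_to_hyperboloid(spatial):
    """Lift spatial coordinates onto the unit hyperboloid (timelike coordinate first)."""
    spatial = np.asarray(spatial, dtype=float)
    # Solve -x0^2 + |spatial|^2 = -1 for the positive root x0.
    x0 = np.sqrt(1.0 + np.sum(spatial**2, axis=-1, keepdims=True))
    return np.concatenate([x0, spatial], axis=-1)


def minkowski_norm_sq(x):
    """Minkowski quadratic form; equals -1 for points on the unit hyperboloid."""
    return -x[..., 0] ** 2 + np.sum(x[..., 1:] ** 2, axis=-1)


def hyperboloid_to_poincare(x):
    """Stereographic projection onto the open unit ball."""
    return x[..., 1:] / (1.0 + x[..., :1])


def poincare_to_hyperboloid(p):
    """Inverse of hyperboloid_to_poincare."""
    sq = np.sum(p**2, axis=-1, keepdims=True)
    return np.concatenate([1.0 + sq, 2.0 * p], axis=-1) / (1.0 - sq)


def hyperboloid_to_klein(x):
    """Central (gnomonic) projection onto the open unit ball."""
    return x[..., 1:] / x[..., :1]


def klein_to_hyperboloid(k):
    """Inverse of hyperboloid_to_klein."""
    sq = np.sum(k**2, axis=-1, keepdims=True)
    return np.concatenate([np.ones_like(sq), k], axis=-1) / np.sqrt(1.0 - sq)


# Illustrative data only: five random points on the 2D hyperboloid.
rng = np.random.default_rng(0)
points = lift_to_hyperboloid(rng.normal(size=(5, 2)))
poincare = hyperboloid_to_poincare(points)
klein = hyperboloid_to_klein(points)
```

Both projections land strictly inside the unit ball, and each round trip recovers the original hyperboloid coordinates up to floating-point error. For real work, prefer the package's own conversion utilities, whose conventions (for example, curvature handling) may differ from this sketch.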

The package has a modular design with optional dependencies:

  • Core functionality requires only numpy, scikit-learn, scipy, and matplotlib
  • XGBoost backend requires the xgboost package (pip install hyperdt[xgboost])
  • Legacy implementation requires geomstats (pip install hyperdt[legacy])
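To see which optional backends are usable in the current environment, a small standard-library check can help. This is only a sketch: the helper name and the mapping are illustrative, built from the extras and module names listed above, and hyperDT itself may detect its backends differently.

```python
from importlib.util import find_spec

# Extras named in this README, mapped to the module each one relies on.
OPTIONAL_BACKENDS = {"xgboost": "xgboost", "legacy": "geomstats"}


def available_extras(backends=OPTIONAL_BACKENDS):
    """Return {extra_name: bool}: whether the backing module can be found."""
    return {extra: find_spec(module) is not None for extra, module in backends.items()}


# Report each extra and, if it is missing, the install command from this README.
for extra, ok in available_extras().items():
    hint = "available" if ok else f'missing; try: pip install "hyperdt[{extra}]"'
    print(f"{extra}: {hint}")
```

find_spec only looks the module up without importing it, so the check stays cheap even for heavy packages.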

Reproducibility and data availability

All figures and tables in the paper were generated using a combination of Python scripts and Jupyter notebooks. The notebooks used in development were filtered down to those that remained relevant to the final paper and moved to the notebooks/archive directory. The notebooks directory contains the tutorial, along with symbolic links to the notebooks most relevant to a figure, table, or section of the paper, each named after the part it reproduces.

benchmarks/hororf_benchmarks.py runs the benchmarks contributing to Tables 1, 5, and 6, and benchmarks/scaling_benchmarks.py runs the benchmarks contributing to Figures 6 and 7.

All relevant datasets, plus benchmarking code outputs, can be found on Google Drive.

Citation

To cite HyperDT, please use the following:

@inproceedings{
    chlenski2024fast,
    title={Fast Hyperboloid Decision Tree Algorithms},
    author={Philippe Chlenski and Ethan Turok and Antonio Khalil Moretti and Itsik Pe'er},
    booktitle={The Twelfth International Conference on Learning Representations},
    year={2024},
    url={https://openreview.net/forum?id=TTonmgTT9X}
}
