`anndata-cache`

An in-memory store/cache for AnnData objects and their parts, shared across processes.

Note: PyArrow's Plasma in-memory store is deprecated, so this project is in limbo. As of now, however, it still works.

Installation

pip install git+https://github.com/gitHBDX/anndata-cache.git

You have to start the plasma_store yourself beforehand. After creating it, make its socket accessible to other processes so the library can work across multiple processes:

plasma_store -m {size_of_store} -s /tmp/plasma-dashboards
chmod 775 /tmp/plasma-dashboards

Alternatively, you can manage the plasma_store with systemd; see the template service file in ./systemd.
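Whichever way the store is started, a quick check that the socket exists and is group-accessible after the `chmod` can save debugging later. The following is a minimal sketch using only the standard library. The helper `check_plasma_socket` is hypothetical and not part of `anndata_cache`; it only inspects the file-system entry and does not connect to the store.

```python
import os
import stat
from typing import List


def check_plasma_socket(path: str) -> List[str]:
    """Return a list of problems found with the Plasma socket at `path`.

    Hypothetical helper: it only inspects the file-system entry,
    it does not talk to plasma_store.
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return [f"{path} does not exist; is plasma_store running?"]

    problems: List[str] = []
    if not stat.S_ISSOCK(st.st_mode):
        problems.append(f"{path} is not a Unix domain socket")

    mode = stat.S_IMODE(st.st_mode)
    # chmod 775 gives the group read/write/execute, which lets processes
    # of other group members connect to the socket.
    if (mode & 0o070) != 0o070:
        problems.append(f"{path} is not group-accessible (mode {oct(mode)})")
    return problems


if __name__ == "__main__":
    # Example path taken from the plasma_store command above.
    for problem in check_plasma_socket("/tmp/plasma-dashboards"):
        print("WARNING:", problem)
```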

Configuration

You can configure the store with the following environment variables:

ANNDATA_CACHE_KILL_ON_FAIL (default: true): Whether to kill the whole app on a Plasma error. Useful if the app was started after the Plasma store and should be restarted automatically, e.g. by a process supervisor.
ANNDATA_CACHE_FOLDER: Directory on disk where the cold cache is kept.
ANNDATA_DATA_FOLDER: Directory on disk where the original data is stored.
ANNDATA_CACHE_PLASMA_LOCATION (default: /tmp/plasma-anndata): Socket path of the Plasma store that holds the in-memory hot cache. It must match the path passed to plasma_store with -s (the example above uses /tmp/plasma-dashboards, so set this variable accordingly).
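The sketch below shows how these variables and their documented defaults could be read in Python. It is an illustration, not the library's code: `CacheSettings`, `load_settings`, and the set of strings treated as "true" are assumptions. ANNDATA_CACHE_FOLDER and ANNDATA_DATA_FOLDER have no documented default, so they come back as `None` when unset.

```python
import os
from dataclasses import dataclass
from typing import Optional


def _env_flag(name: str, default: bool) -> bool:
    """Interpret an environment variable as a boolean flag.

    Assumption: "1", "true", "yes", "on" (any case) mean True; the exact
    spellings accepted by anndata_cache may differ.
    """
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class CacheSettings:  # hypothetical container, not part of anndata_cache
    kill_on_fail: bool
    cache_folder: Optional[str]  # cold cache on disk (no documented default)
    data_folder: Optional[str]   # original data on disk (no documented default)
    plasma_location: str         # Plasma store socket (hot cache)


def load_settings() -> CacheSettings:
    """Read the documented environment variables, applying their defaults."""
    return CacheSettings(
        kill_on_fail=_env_flag("ANNDATA_CACHE_KILL_ON_FAIL", True),
        cache_folder=os.environ.get("ANNDATA_CACHE_FOLDER"),
        data_folder=os.environ.get("ANNDATA_DATA_FOLDER"),
        plasma_location=os.environ.get(
            "ANNDATA_CACHE_PLASMA_LOCATION", "/tmp/plasma-anndata"
        ),
    )
```

Treating any unrecognized value as false keeps the flag strict: a typo such as `ANNDATA_CACHE_KILL_ON_FAIL=ture` disables killing rather than silently enabling it.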

Usage

The cache has two tiers: a cold on-disk cache and a hot in-memory cache. When an object is requested, it is put into the hot cache (evicted first-in, first-out) and also saved to the cold cache, split into its individual parts so that each part is fast to retrieve.
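To make the lookup order concrete, here is a minimal pure-Python sketch of a hot/cold scheme like the one described above. It is not the library's implementation: the hot tier is a bounded `OrderedDict` standing in for the Plasma store, the cold tier pickles each part to its own file, and `TwoTierCache`, the `load_all` callback, and the file-naming scheme are all hypothetical.

```python
import hashlib
import pickle
from collections import OrderedDict
from pathlib import Path
from typing import Any, Callable, Dict, Tuple

Key = Tuple[str, str]  # (source file, part name such as "X" or "obs")


class TwoTierCache:
    """Illustrative hot/cold cache; NOT the anndata_cache implementation."""

    def __init__(self, cold_dir: Path, hot_capacity: int) -> None:
        self.cold_dir = Path(cold_dir)
        self.cold_dir.mkdir(parents=True, exist_ok=True)
        self.hot_capacity = hot_capacity
        self.hot: "OrderedDict[Key, Any]" = OrderedDict()

    def _cold_path(self, source: str, part: str) -> Path:
        # Hypothetical naming scheme: hash of the source path plus part name.
        digest = hashlib.sha1(source.encode()).hexdigest()
        return self.cold_dir / f"{digest}.{part}.pkl"

    def _put_hot(self, key: Key, value: Any) -> None:
        if key in self.hot:
            return
        self.hot[key] = value
        while len(self.hot) > self.hot_capacity:
            self.hot.popitem(last=False)  # FIFO: drop the oldest insertion

    def get(
        self,
        source: str,
        part: str,
        load_all: Callable[[str], Dict[str, Any]],
    ) -> Any:
        key = (source, part)
        if key in self.hot:  # 1. hot (in-memory) hit
            return self.hot[key]
        path = self._cold_path(source, part)
        if path.exists():  # 2. cold (on-disk) hit
            value = pickle.loads(path.read_bytes())
            self._put_hot(key, value)
            return value
        # 3. miss: read the original file once and cache *all* of its parts
        parts = load_all(source)
        for name, value in parts.items():
            self._cold_path(source, name).write_bytes(pickle.dumps(value))
            self._put_hot((source, name), value)
        # Return from `parts`, since FIFO may already have evicted this key.
        return parts[part]
```

Eviction here is FIFO as described: a hot hit does not move the entry, unlike an LRU cache. Loading every part on the first miss mirrors the behavior described under Usage, where requesting X also makes obs cheap to fetch later.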

# file1.py
import pandas as pd
import anndata_cache as cache

X: pd.DataFrame = cache.X("/path/to/some/anndata_file.h5ad")
# Loads the AnnData file and caches *all* of its parts, but returns only
# the expression matrix (ad.X) as a DataFrame.

# file2.py
import pandas as pd
import anndata_cache as cache

obs: pd.DataFrame = cache.obs("/path/to/some/anndata_file.h5ad")
# The whole AnnData object was already cached by the other process, so this
# call is _essentially_ free. If the hot cache filled up in the meantime, obs
# is loaded from the cold cache rather than from the original .h5ad file.

Developed @
