Adding tests and CI workflows #8

Merged
merged 15 commits into from
Mar 31, 2025

Conversation

enriquea
Collaborator

@enriquea enriquea commented Mar 31, 2025

PR Type

Enhancement, Tests, Configuration changes


Description

  • Added new CI workflows for testing and linting with GitHub Actions.

    • Introduced workflows for Python package testing across multiple versions.
    • Added Conda-based workflow for dependency management and testing.
  • Improved code formatting and consistency across multiple files.

    • Standardized string quotes and function parameter formatting.
    • Enhanced readability with better indentation and alignment.
  • Enhanced test coverage and updated test data handling.

    • Added temporary directories for test outputs and cleanup.
    • Updated tests to use constants and paths dynamically.
  • Introduced constants for better file path management.

    • Added BASE_DIR and UCSC_JSON_FILE_PATH for centralized path handling.

Changes walkthrough 📝

Relevant files
Formatting
1 file
__init__.py
Minor formatting adjustment in imports                                     
+1/-1     
Enhancement
10 files
annotate_features.py
Standardized string quotes and improved function formatting
+58/-55 
make_annotation_tables_cli.py
Removed unused options and improved CLI formatting             
+123/-84
hvantk.py
Updated CLI entry point formatting                                             
+7/-4     
settings.py
Standardized string quotes and added constants for paths 
+12/-13 
annotate.py
Standardized string quotes and improved function formatting
+89/-81 
constants.py
Added `BASE_DIR` and `UCSC_JSON_FILE_PATH` constants         
+18/-10 
dataset.py
Standardized string quotes and improved function formatting
+21/-45 
file_utils.py
Improved error handling and logging in file downloads       
+20/-14 
generate_training_set.py
Standardized string quotes and improved readability           
+49/-58 
make_tables.py
Improved function formatting and added parameter documentation
+79/-73 
Documentation
2 files
ucsc_cell_datasets.py
Improved class docstrings and formatting                                 
+104/-87
README.md
Added CI badge for Conda-based workflow                                   
+2/-0     
Tests
3 files
test_UCSCDataSetCollection.py
Updated tests to use constants for paths                                 
+9/-7     
test_create_gene_annotation_tables.py
Enhanced test data handling with temporary directories     
+76/-35 
test_downloader.py
Improved test structure and added mock responses                 
+14/-10 
Configuration changes
4 files
setup.py
Standardized string quotes and updated dependencies           
+5/-9     
python-app.yml
Added GitHub Actions workflow for Python application         
+39/-0   
python-package-conda.yml
Added Conda-based GitHub Actions workflow                               
+34/-0   
python-package.yml
Added GitHub Actions workflow for multi-version Python testing
+40/-0   
Additional files
6 files
environment.yml +4/-4     
clinvar_20220403_chr20.vcf.bgz [link]   
ensembl_gene_biomart.tsv.bgz [link]   
gevir_metrics_pmid31873297.tsv.bgz [link]   
gnomad.v2.1.1.lof_metrics.by_gene.chr20.tsv.bgz [link]   
Interactome_INSIDER_hg38_stripped.chr20.bed.bgz [link]   

  • Summary by CodeRabbit

    • New Features

      • Introduced several automated build and test workflows that streamline dependency management, linting, and testing across multiple Python versions and Conda environments.
      • Added a build status badge to the project documentation for immediate integration feedback.
      • Added a new constant for UCSC JSON file path to improve maintainability.
    • Chores / Refactor

      • Updated environment settings for broader Python version compatibility.
      • Made extensive code formatting and test infrastructure improvements to enhance maintainability and reliability.
      • Removed the testdata directory from the .gitignore file to allow tracking in version control.

    @enriquea enriquea requested review from ypriverol and Copilot March 31, 2025 09:05
    @enriquea enriquea self-assigned this Mar 31, 2025
    Contributor

    coderabbitai bot commented Mar 31, 2025

    Walkthrough

    This pull request introduces three new GitHub Actions workflows to automate installing dependencies, linting, and running tests for Python applications under various configurations (standard, Conda-based, and multi-version). It updates environment settings, adjusts file tracking in Git, and makes numerous formatting improvements across source, test, and utility files for consistency. These changes do not modify functional behavior but streamline code readability and enhance CI process integration.

    Changes

    File(s) | Change Summary
    .github/workflows/python-app.yml, .github/workflows/python-package-conda.yml, .github/workflows/python-package.yml | New CI workflows that automate dependency installation, linting, and testing across different Python environments (standard, Conda, and matrix builds).
    .gitignore, README.md | Removed the ignore rule for testdata in .gitignore and added a new Conda build status badge to README.md.
    environment.yml | Updated environment name (from hvantk-env to hvantk), changed Python version specification to >=3.10.0, and removed the version constraint for setuptools.
    hvantk/__init__.py | Added a trailing newline.
    hvantk/commands/annotate_features.py, hvantk/commands/make_annotation_tables_cli.py | Reformatted code for consistent string quotations, spacing, and function signatures without changing logic.
    hvantk/datasets/ucsc_cell_datasets.py | Improved formatting of import statements and docstrings for better readability.
    hvantk/hvantk.py, hvantk/settings.py | Adjusted formatting in decorators, error messages, and string delimiters for consistency.
    hvantk/tests/test_UCSCDataSetCollection.py, hvantk/tests/test_create_gene_annotation_tables.py, hvantk/tests/test_downloader.py | Modified test file paths by introducing new directory variables, imported constants for file paths, enhanced JSON formatting, and added cleanup steps for temporary directories.
    hvantk/utils/annotate.py, hvantk/utils/constants.py, hvantk/utils/dataset.py, hvantk/utils/file_utils.py, hvantk/utils/generate_training_set.py, hvantk/utils/make_tables.py | Numerous formatting updates including consistent string quoting, restructuring of function signatures, and the addition of new constants (e.g., BASE_DIR and UCSC_JSON_FILE_PATH).
    setup.py | Standardized formatting for install_requires, entry_points, and python_requires sections by switching to double quotes and a more compact style.

    Sequence Diagram(s)

    sequenceDiagram
        actor Dev as Developer
        participant GH as GitHub
        participant CI as Workflow
        Dev->>GH: Push or create PR on "main"
        GH->>CI: Trigger corresponding CI workflow(s)
        CI->>CI: Checkout repository (actions/checkout@v4)
        CI->>CI: Setup Python environment (actions/setup-python@v3)
        CI->>CI: Install dependencies (pip/Conda install, requirements/environment.yml)
        CI->>CI: Run linting (flake8) and tests (pytest)
        CI-->>GH: Report build status
    

    Suggested labels

    Review effort 2/5

    Poem

    I'm a hopping rabbit in a code-filled glen,
    Watching workflows dance again and again.
    New tests and linting keep bugs at bay,
    Code now sparkles in a neat, clear way.
    With every change, I twirl with delight –
    A joyful CodeRabbit, coding through the night! 🐰💻


    📜 Recent review details

    Configuration used: CodeRabbit UI
    Review profile: CHILL
    Plan: Pro

    📥 Commits

    Reviewing files that changed from the base of the PR and between 4d2f2ca and e64be20.

    📒 Files selected for processing (1)
    • hvantk/tests/test_create_gene_annotation_tables.py (6 hunks)
    🚧 Files skipped from review as they are similar to previous changes (1)
    • hvantk/tests/test_create_gene_annotation_tables.py
    ⏰ Context from checks skipped due to timeout of 90000ms (5)
    • GitHub Check: build (3.10)
    • GitHub Check: build (3.12)
    • GitHub Check: build (3.11)
    • GitHub Check: build
    • GitHub Check: build-linux


    @Copilot Copilot AI left a comment


    Pull Request Overview

    This PR adds tests, updates CI workflows, and makes various formatting improvements for consistency across the codebase. Key changes include:

    • Refining code formatting (use of double quotes and consistent indentation) in annotation and CLI modules.
    • Updating tests to use temporary directories and improving file path handling.
    • Adding and updating GitHub Actions workflows and environment configurations.

    Reviewed Changes

    Copilot reviewed 27 out of 27 changed files in this pull request and generated no comments.

    File | Description
    hvantk/utils/annotate.py | Formatting improvements and minor list style updates.
    hvantk/tests/* | Updates to use temporary directories and consistent quoting.
    hvantk/settings.py | Minor reformatting and updated string literals.
    hvantk/hvantk.py & hvantk/datasets/ucsc_cell_datasets.py | Code style consistency updates.
    hvantk/commands/make_annotation_tables_cli.py | Refactored CLI option formatting and option help text updates.
    hvantk/commands/annotate_features.py | Formatting updates and improved parameter usage.
    environment.yml, README.md, .github/workflows/* | Added/updated CI workflows and environment configuration.
    Comments suppressed due to low confidence (1)

    hvantk/commands/make_annotation_tables_cli.py:78

    • The help text for the '--interactome' option appears incorrect as it references a CCR table. Consider updating it to reflect that it creates/updates the interactome table instead.
    @click.option("--interactome", is_flag=True, help="Create/update CCR table from source.")
    

    Contributor

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Logic Error

    The is_pathogenic variable is defined before is_benign but used after it in the code. This could lead to incorrect behavior since is_pathogenic is referenced in the is_benign calculation before it's properly defined.

    is_pathogenic = t.clinvar_clnsig.any(
        lambda x: hl.set(pathogenic_label_clinvar).contains(x)
    )
    is_benign = t.clinvar_clnsig.any(lambda x: hl.set(benign_label_clinvar).contains(x))
    Missing Variable

    The total_size variable is used in the download_file function but might not be defined in some code paths, which could cause runtime errors.

    max_size = 1024 * 1024 * 1024  # 1GB
    if total_size > max_size:
        logger.warning(
            f"File is very large ({total_size/1024/1024:.1f} MB), exceeding recommended size of {max_size/1024/1024:.1f} MB"
        )
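A minimal sketch of one way to keep `total_size` bound on every code path, assuming the size is taken from a `Content-Length` response header. The helpers `parse_total_size` and `exceeds_max_size` are hypothetical and are not part of `hvantk`.

```python
from typing import Mapping

MAX_SIZE = 1024 * 1024 * 1024  # 1 GB, matching the snippet above


def parse_total_size(headers: Mapping[str, str]) -> int:
    """Return the declared size in bytes, or 0 when it is absent or malformed."""
    # Note: a plain dict is case-sensitive; HTTP client header mappings
    # are often case-insensitive.
    raw = headers.get("Content-Length", "0")
    try:
        return max(int(raw), 0)
    except (TypeError, ValueError):
        return 0


def exceeds_max_size(headers: Mapping[str, str], max_size: int = MAX_SIZE) -> bool:
    """True when the declared size is known and larger than max_size."""
    total_size = parse_total_size(headers)  # always bound, even without the header
    return total_size > max_size
```

With a default of 0, a missing header simply skips the warning instead of raising a `NameError`.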
    Hardcoded Count

    The test assumes a specific number of datasets (267) which might break if the dataset changes. Consider making this test more robust by checking for existence rather than exact count.

    assert len(collection.datasets) == 267
    assert isinstance(collection.datasets[0], UCSCDataset)
    assert collection.datasets[0].name == "cortex-dev"
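A sketch of what a count-independent check could look like. `FakeDataset` is a hypothetical stand-in for `UCSCDataset`, and `check_collection` is an illustrative helper rather than code from this PR.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeDataset:
    """Hypothetical stand-in for UCSCDataset; only the name matters here."""

    name: str


def check_collection(datasets: List[FakeDataset], expected_name: str) -> None:
    """Assert properties that still hold if the upstream catalogue grows or shrinks."""
    assert len(datasets) > 0, "collection should not be empty"
    assert all(isinstance(d, FakeDataset) for d in datasets)
    names = {d.name for d in datasets}
    assert expected_name in names, f"{expected_name!r} missing from collection"
```

Checking membership by name also avoids relying on `cortex-dev` being the first entry.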

    Contributor

    qodo-merge-pro bot commented Mar 31, 2025

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    Category | Suggestion | Impact
    Possible issue
    Fix condition calculation order

    The code first references t.clinvar_clnsig to pre-calculate conditions, but then
    immediately overwrites this field with clinvar_ht[t.key].info.CLNSIG. This means
    the pre-calculated conditions are using the wrong data. Move the annotation line
    before the condition calculations.

    hvantk/utils/annotate.py [32-38]

    +t = t.annotate(clinvar_clnsig=clinvar_ht[t.key].info.CLNSIG)
    +
     # Pre-calculate conditions for better readability and performance
     is_pathogenic = t.clinvar_clnsig.any(lambda x: hl.set(pathogenic_label_clinvar).contains(x))
     is_benign = t.clinvar_clnsig.any(lambda x: hl.set(benign_label_clinvar).contains(x))
     
    -t = t.annotate(clinvar_clnsig=clinvar_ht[t.key].info.CLNSIG)
    -

    [Suggestion has been applied]

    Suggestion importance[1-10]: 10

    __

    Why: This is a critical bug fix. The code is calculating conditions using t.clinvar_clnsig before it's populated with data from clinvar_ht, making the pre-calculated conditions meaningless and leading to incorrect results.

    High
    Fix null path variable

    The project_dir variable is set to None, which will cause a runtime error when
    constructing the out_path string. This will fail when trying to concatenate None
    with a string path.

    hvantk/commands/annotate_features.py [26-27]

    -project_dir = None
    +from pathlib import Path
    +
    +# Default to current directory if project_dir is not specified
    +project_dir = str(Path.cwd())
     out_path = f"{project_dir}/data/features"

    [Suggestion has been applied]

    Suggestion importance[1-10]: 10

    __

    Why: The code attempts to use a None value in string formatting which will cause a runtime error. This is a critical bug that would prevent the code from running at all, making this fix essential for basic functionality.

    High
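A small sketch of the fallback described in this suggestion. The function name `resolve_features_dir` is hypothetical. One detail worth noting: an f-string formats `None` as the literal text `None` instead of raising, so the original line would more likely produce a wrong path than an exception.

```python
from pathlib import Path
from typing import Optional


def resolve_features_dir(project_dir: Optional[str] = None) -> str:
    """Build the features output path, defaulting to the current directory."""
    base = Path(project_dir) if project_dir is not None else Path.cwd()
    return str(base / "data" / "features")


# What the unpatched f-string would yield when project_dir is None.
unpatched = f"{None}/data/features"
```

Building the path with `Path` also avoids hand-written separators.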
    Fix file path resolution

    The path to the JSON file is constructed using BASE_DIR which points to the
    parent directory of constants.py, but the test is trying to access a file in a
    different location. This will likely fail when tests are run from different
    directories. Consider using an absolute path or a path relative to the test
    file.

    hvantk/tests/test_UCSCDataSetCollection.py [62-64]

    +# Use Path to construct proper absolute path
    +from pathlib import Path
    +test_resources_path = Path(__file__).parent.parent / "resources" / "cells_ucsc_datasets.json"
     collection = UCSCDataSetCollection.from_json(
    -   UCSC_JSON_FILE_PATH
    +   str(test_resources_path)
     )
    Suggestion importance[1-10]: 9

    __

    Why: The current implementation uses a constant path that may not work correctly when tests are run from different directories. Using Path to construct a proper absolute path relative to the test file ensures tests will run correctly regardless of the working directory.

    High
    Fix base directory path

    The BASE_DIR is set to the parent directory of the current file (constants.py),
    which is the utils directory. This will cause issues when trying to reference
    files relative to the project root. It should be set to the project root
    directory.

    hvantk/utils/constants.py [9-10]

     # Base directory of the project
    -BASE_DIR = Path(__file__).resolve().parent
    +BASE_DIR = Path(__file__).resolve().parent.parent
    Suggestion importance[1-10]: 9

    __

    Why: The current BASE_DIR points to the utils directory instead of the project root, which will cause path resolution issues when trying to access files relative to the project root. This is a critical fix for proper file path resolution throughout the project.

    High
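To make the directory levels concrete, here is a sketch using a hypothetical checkout at `/repo` (the location is an assumption; only the relative layout `hvantk/utils/constants.py` comes from this PR). Which level should count as the "project root" depends on where the referenced files live.

```python
from pathlib import PurePosixPath

# Hypothetical absolute location of constants.py inside a checkout at /repo.
constants_file = PurePosixPath("/repo/hvantk/utils/constants.py")

utils_dir = constants_file.parent            # directory containing constants.py
package_dir = constants_file.parent.parent   # the hvantk package directory
repo_root = constants_file.parents[2]        # the repository checkout
```

Under this layout, `.parent.parent` resolves to the package directory, one level below the repository root.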
    Create temporary directory

    The test creates a temporary directory but doesn't ensure it exists before using
    it. This will cause tests to fail if the directory doesn't exist. Add code to
    create the directory before the first test runs.

    hvantk/tests/test_create_gene_annotation_tables.py [15-16]

     # Temporary directory for testing
     TMP_DIR = Path(__file__).parent / "tmp"
    +TMP_DIR.mkdir(exist_ok=True)
    Suggestion importance[1-10]: 7

    __

    Why: This suggestion prevents test failures by ensuring the temporary directory exists before it's used. Without this fix, tests would fail when run on a fresh system where the directory doesn't exist.

    Medium
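As an alternative to a fixed `tmp` folder next to the test file, the standard library can create and remove the directory automatically. This is a sketch of that swapped-in technique, not what the PR does; `write_in_temp_dir` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def write_in_temp_dir(file_name: str, content: str) -> bool:
    """Write a file in a fresh temporary directory, then confirm cleanup happened."""
    with tempfile.TemporaryDirectory() as tmp:
        out_file = Path(tmp) / file_name
        out_file.write_text(content)
        ok = out_file.read_text() == content
    # The directory and its contents are removed when the with-block exits.
    return ok and not Path(tmp).exists()
```

Because the directory is created on entry, no separate `mkdir` call or manual cleanup is needed.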

    Contributor

    @coderabbitai coderabbitai bot left a comment


    Actionable comments posted: 0

    🧹 Nitpick comments (10)
    hvantk/utils/generate_training_set.py (1)

    41-41: Gene Annotation Simplification
    Annotating the clinvar_ht table with a new gene field in one line is concise and clear. Consider verifying that using split("[:]") is intentional. If the goal is to split by a simple colon, using split(":") might be more straightforward.
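The bracket form is likely intentional if the delimiter is interpreted as a regular expression, which is how Hail documents `split`. The sketch below uses Python's `re` module as a stand-in (an assumption: the two regex dialects agree on these simple character classes), with hypothetical sample values.

```python
import re

# Hypothetical value shaped like "SYMBOL:ID".
geneinfo = "BRCA1:672"

# As a regex, "[:]" is a character class containing only ":",
# so it splits exactly where a literal ":" would.
by_class = re.split("[:]", geneinfo)
by_literal = geneinfo.split(":")

# The bracket form matters for "|": a bare "|" is regex alternation,
# so "[|]" is needed to match a literal pipe character.
mc = "SO:0001583|missense_variant"
by_pipe = re.split("[|]", mc)
```

So `"[:]"` and `":"` behave the same here, while `"[|]"` cannot be simplified to `"|"` under regex semantics.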

    hvantk/__init__.py (1)

    1-2: Clarify Public API for main:
    The file imports main from .hvantk but does not use it locally. If the intent is to re‐export it as part of the package’s public API, please consider adding an __all__ declaration (e.g. __all__ = ["main"]) to make this explicit. Otherwise, if it’s not needed here, removing the unused import is also an option.

    🧰 Tools
    🪛 Ruff (0.8.2)

    1-1: .hvantk.main imported but unused; consider removing, adding to __all__, or using a redundant alias

    (F401)

    hvantk/utils/dataset.py (1)

    44-45: Potential Robustness Improvement for Time Point Extraction:
    The expression

    tps = t[tp_col].key_set().filter(lambda x: x.organ == organ).time_point.collect()[0]

    (around line 44) assumes that the filtered collection is non-empty. Consider adding error handling or a conditional check to guard against an empty list, which would otherwise raise an IndexError.
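A minimal sketch of such a guard, written against a plain Python sequence standing in for the result of `collect()`. The helper name `first_time_points` and the sample values in the usage note are hypothetical.

```python
from typing import List, Sequence


def first_time_points(collected: Sequence[List[str]], organ: str) -> List[str]:
    """Return the first collected entry, failing with a clear message if none matched."""
    if not collected:
        raise ValueError(f"No time points found for organ {organ!r}")
    return collected[0]
```

For example, `first_time_points([], "heart")` raises a `ValueError` naming the organ instead of a bare `IndexError`.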

    .github/workflows/python-app.yml (1)

    21-25: Update the GitHub Actions setup-python version.

    The setup-python action is using an older version (v3) which may be outdated.

    -    - uses: actions/setup-python@v3
    +    - uses: actions/setup-python@v4
    🧰 Tools
    🪛 actionlint (1.7.4)

    23-23: the runner of "actions/setup-python@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue

    (action)

    hvantk/commands/annotate_features.py (1)

    19-19: Remove unused import.

    The annotate_degs function is imported but not used in the code (it appears to be commented out in lines 89-90).

    -    annotate_degs,
    🧰 Tools
    🪛 Ruff (0.8.2)

    19-19: hvantk.utils.annotate.annotate_degs imported but unused

    Remove unused import: hvantk.utils.annotate.annotate_degs

    (F401)

    .github/workflows/python-package-conda.yml (2)

    23-23: LGTM! Consider adding environment activation validation.

    The command to update the base Conda environment looks good.

    Consider adding a step to verify the environment was updated correctly by printing the installed packages:

      run: |
        conda env update --file environment.yml --name base
    +   # Verify environment was updated correctly
    +   conda list

    26-30: Consider adding a flake8 configuration file.

    The flake8 configuration is currently hardcoded in the workflow. For better maintainability, consider moving this configuration to a .flake8 file in the repository.

    This would allow developers to use the same configuration locally and make it easier to update in the future.

    hvantk/tests/test_create_gene_annotation_tables.py (1)

    18-166: Consider creating a setup/teardown fixture for test directory.

    All test functions have similar cleanup code for removing the temporary directory. This could be extracted into a fixture using pytest's @pytest.fixture to avoid repetition.

    import pytest
    
    @pytest.fixture(autouse=True)
    def setup_teardown():
        """Create and clean up the temporary directory for each test."""
        # Setup - This runs before each test
        TMP_DIR.mkdir(exist_ok=True, parents=True)
        
        # The test runs here
        yield
        
        # Teardown - This runs after each test
        if TMP_DIR.exists():
            shutil.rmtree(TMP_DIR)

    This fixture could be placed at the top of the file and with autouse=True, it would automatically run for each test case without explicitly calling it.

    hvantk/datasets/ucsc_cell_datasets.py (1)

    118-124: Consider adding a timeout parameter.

    When downloading files, it's a good practice to set an explicit timeout.

    Looking at the implementation of download_file() in the relevant snippet, it already includes a timeout parameter. You might consider adding a parameter to allow customizing this timeout if necessary for different network conditions.

    def download_expression_matrix(self, out_dir: str, timeout: int = 30) -> str:
        """
        ...
        Args:
            out_dir: Directory where the file will be saved
    +       timeout: Connection timeout in seconds
        ...
        """
        url_download = (
            f"{UCSC_CELL_BROWSER_BASE_URL}/{self.name}/{EXPRESSION_MATRIX_FILE_NAME}"
        )
        try:
            return download_file(
                url=url_download, out_dir=out_dir, file_name=EXPRESSION_MATRIX_FILE_NAME, timeout=timeout
            )
    hvantk/utils/annotate.py (1)

    138-140: Avoid Mutable Default Argument in annotate_degs.
    Using a mutable list as a default for clusters can lead to unexpected behavior. It’s recommended to set the default to None and initialize within the function.

    -def annotate_degs(t: hl.Table, gene_symbol_col: str, clusters: list = ["C0", "C5", "C7", "C10", "C14"]) -> hl.Table:
    +def annotate_degs(t: hl.Table, gene_symbol_col: str, clusters: list = None) -> hl.Table:
    +    if clusters is None:
    +        clusters = ["C0", "C5", "C7", "C10", "C14"]
    🧰 Tools
    🪛 Ruff (0.8.2)

    139-139: Do not use mutable data structures for argument defaults

    Replace with None; initialize within function

    (B006)
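The pitfall behind B006 can be reproduced in a few lines. The two functions below are illustrative only and unrelated to `annotate_degs`; the problem shows up when a function mutates its default list.

```python
from typing import List, Optional


def append_shared(item: str, bucket: List[str] = []) -> List[str]:
    """Pitfall: the default list is created once and shared across calls."""
    bucket.append(item)
    return bucket


def append_fresh(item: str, bucket: Optional[List[str]] = None) -> List[str]:
    """Fix: use None as the sentinel and build a new list on each call."""
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```

Calling `append_shared` twice without a `bucket` accumulates both items in the same list, while `append_fresh` starts from an empty list each time.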

    📜 Review details

    Configuration used: CodeRabbit UI
    Review profile: CHILL
    Plan: Pro

    📥 Commits

    Reviewing files that changed from the base of the PR and between 547779c and 2819bfe.

    📒 Files selected for processing (22)
    • .github/workflows/python-app.yml (1 hunks)
    • .github/workflows/python-package-conda.yml (1 hunks)
    • .github/workflows/python-package.yml (1 hunks)
    • .gitignore (0 hunks)
    • README.md (1 hunks)
    • environment.yml (1 hunks)
    • hvantk/__init__.py (1 hunks)
    • hvantk/commands/annotate_features.py (2 hunks)
    • hvantk/commands/make_annotation_tables_cli.py (2 hunks)
    • hvantk/datasets/ucsc_cell_datasets.py (6 hunks)
    • hvantk/hvantk.py (1 hunks)
    • hvantk/settings.py (2 hunks)
    • hvantk/tests/test_UCSCDataSetCollection.py (4 hunks)
    • hvantk/tests/test_create_gene_annotation_tables.py (6 hunks)
    • hvantk/tests/test_downloader.py (3 hunks)
    • hvantk/utils/annotate.py (11 hunks)
    • hvantk/utils/constants.py (1 hunks)
    • hvantk/utils/dataset.py (4 hunks)
    • hvantk/utils/file_utils.py (2 hunks)
    • hvantk/utils/generate_training_set.py (3 hunks)
    • hvantk/utils/make_tables.py (8 hunks)
    • setup.py (2 hunks)
    💤 Files with no reviewable changes (1)
    • .gitignore
    🧰 Additional context used
    🧬 Code Definitions (7)
    hvantk/__init__.py (2)
    hvantk/hvantk.py (1)
    • main (23-24)
    hvantk/commands/annotate_features.py (1)
    • main (46-104)
    hvantk/commands/annotate_features.py (2)
    hvantk/utils/annotate.py (8)
    • annotate_ccr (49-52)
    • annotate_gevir (55-61)
    • annotate_rnaseq_expression (64-69)
    • annotate_ppi (72-75)
    • annotate_gnomad_af (195-213)
    • annotate_dbnsfp_scores (102-122)
    • annotate_gnomad_constraint_metrics (125-135)
    • annotate_hca (167-192)
    hvantk/hvantk.py (1)
    • main (23-24)
    hvantk/tests/test_create_gene_annotation_tables.py (1)
    hvantk/utils/make_tables.py (5)
    • create_gnomad_constraint_gene_metrics_tb (16-45)
    • create_interactome_tb (48-81)
    • create_clinvar_tb (84-120)
    • create_gevir_tb (123-152)
    • create_ensembl_gene_tb (155-218)
    hvantk/datasets/ucsc_cell_datasets.py (1)
    hvantk/utils/file_utils.py (1)
    • download_file (11-59)
    hvantk/utils/generate_training_set.py (1)
    hvantk/utils/dataset.py (2)
    • get_chd_gene_set (59-65)
    • get_clinvar_ht (19-25)
    hvantk/commands/make_annotation_tables_cli.py (2)
    hvantk/utils/make_tables.py (4)
    • create_interactome_tb (48-81)
    • create_clinvar_tb (84-120)
    • create_gevir_tb (123-152)
    • create_gnomad_constraint_gene_metrics_tb (16-45)
    hvantk/settings.py (1)
    • set_raw_data_path (13-31)
    hvantk/utils/annotate.py (1)
    hvantk/utils/dataset.py (4)
    • get_gevir_ht (76-77)
    • get_gene_expression_ht (28-56)
    • get_clinvar_ht (19-25)
    • get_hca_ht (100-101)
    🪛 Ruff (0.8.2)
    hvantk/__init__.py

    1-1: .hvantk.main imported but unused; consider removing, adding to __all__, or using a redundant alias

    (F401)

    hvantk/commands/annotate_features.py

    19-19: hvantk.utils.annotate.annotate_degs imported but unused

    Remove unused import: hvantk.utils.annotate.annotate_degs

    (F401)

    hvantk/utils/annotate.py

    139-139: Do not use mutable data structures for argument defaults

    Replace with None; initialize within function

    (B006)

    🪛 actionlint (1.7.4)
    .github/workflows/python-package.yml

    24-24: the runner of "actions/setup-python@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue

    (action)

    .github/workflows/python-package-conda.yml

    14-14: the runner of "actions/setup-python@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue

    (action)

    .github/workflows/python-app.yml

    23-23: the runner of "actions/setup-python@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue

    (action)

    🔇 Additional comments (89)
    hvantk/utils/generate_training_set.py (13)

    12-12: Streamlined Import Statement
    Consolidating the two imports from fschd.utils.data_utils into a single line improves readability and makes the dependencies clear.


    22-26: Clear Formatting for Pathogenic Labels
    The multi-line definition of PATHOGENIC_LABEL_CLINVAR now presents the labels in a neatly organized format. This enhances clarity without changing functionality.


    29-29: Concise Disease-Specific Labels
    Defining CHD_LABEL_CLINVAR in a compact, single-line format is a good formatting improvement that keeps the code clean.


    32-32: Compact Benign Labels Definition
    The reformatting of BENIGN_LABEL_CLINVAR into a single-line list improves code clarity and maintainability.


    44-46: Extraction of Consequence Information
    Mapping over clinvar_ht.info.MC to extract the second element with x.split("[|]")[1] is neat and concise. Please ensure that every element in MC always contains the delimiter [|] to avoid potential index errors.


    48-50: Filtering Synonymous Variants
    The filter that excludes rows where any element of Consequence equals "synonymous_variant" (using the negation operator ~) is both succinct and clear.


    53-72: Comprehensive TP/TN Annotation Setup
    The construction of the ts_ann_expr dictionary using hl.case() for both TP and TN site annotations is well organized. This approach clearly maps the conditions to their Boolean outputs. Adding inline comments for each condition might further enhance the understandability of the logic.


    74-74: Applying Annotation Expressions
    Annotating clinvar_ht with the TP/TN labels using **ts_ann_expr is a succinct and effective way to enrich the table with the desired fields.


    76-76: Ensuring Exclusive Site Labels
    Filtering the table with ts_ht.filter(ts_ht.is_tp_site != ts_ht.is_tn_site) keeps only sites where exactly one of the two flags is set. A site therefore cannot be labeled as both TP and TN, and sites that are neither are dropped as well.
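
    The predicate in this filter behaves like an exclusive-or, which a plain-Python sketch makes visible. The rows below are made-up (is_tp_site, is_tn_site) pairs, not data from the PR.

```python
# Each row is (is_tp_site, is_tn_site); the values are made up for illustration.
rows = [(True, False), (False, True), (True, True), (False, False)]

# Same predicate as the filter: keep rows where the two flags differ.
kept = [row for row in rows if row[0] != row[1]]
```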


    78-83: Mapping to a Readable Label Field
    The subsequent annotation of the rf_label using a chained hl.case()—mapping TP sites to "TP" and TN sites to "TN" with an or_missing() fallback—is clear and ensures only valid labels are retained.


    87-87: Selecting Relevant Output Columns
    The final selection of only the gene and rf_label columns from the table provides a clean output for the training set. This reduction simplifies downstream consumption of the data.


    91-91: Checkpointing the Table
    Using checkpoint with the specified output directory and overwrite=True is a robust solution for persisting the table state prior to export. It ensures reproducibility and consistency in the training set creation process.


    93-93: Exporting the Final Training Set
    Exporting the annotated table to a TSV file in one concise command is straightforward and effective. It completes the data processing pipeline as expected.

    hvantk/hvantk.py (2)

    10-14: Improved CLI Decorator Formatting:
    The multi-line formatting of the @click.group decorator (lines 10–14) greatly enhances readability and improves consistency with other parts of the codebase.


    27-29: Consistent Main Guard Formatting:
    Switching to double quotes in the if __name__ == "__main__": block (line 27) aligns with the project’s stylistic guidelines and improves code uniformity.

    hvantk/utils/file_utils.py (2)

    22-26: Enhanced Error Message Clarity:
    Reformatting the ValueError for an invalid file_name into a multi‐line message improves its clarity and readability. This makes it easier to understand the constraint that the filename must be a simple name without path components.
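
    A check of this kind might look like the following sketch. The function name validate_file_name and the message wording are assumptions; the actual implementation in file_utils.py is not quoted in this review.

```python
from pathlib import PurePath


def validate_file_name(file_name):
    """Raise ValueError unless file_name is a bare name with no directory part."""
    is_bare = bool(file_name) and PurePath(file_name).name == file_name
    if not is_bare or file_name in {".", ".."}:
        raise ValueError(
            f"Invalid file_name {file_name!r}: "
            "expected a simple name without path components."
        )
    return file_name
```

    Splitting the message across two string literals keeps each source line short without changing the text the user sees.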


    33-41: Clear File Size Extraction and Warning:
    Extracting the content-length header with a default value and formatting the warning message for large files (lines 33–41) are clear improvements. The logging now provides an informative message when the file size exceeds the recommended limit.
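
    The pattern described here can be sketched as below. The threshold MAX_RECOMMENDED_BYTES and the function name check_file_size are hypothetical, since the limit used in the PR is not shown, and a plain dict stands in for the response headers (real HTTP header mappings are usually case-insensitive).

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical threshold; the limit used in the PR is not shown in the review.
MAX_RECOMMENDED_BYTES = 100 * 1024 * 1024


def check_file_size(headers, limit=MAX_RECOMMENDED_BYTES):
    """Return the declared size in bytes (0 if absent); warn when it exceeds limit."""
    size = int(headers.get("content-length", 0))
    if size > limit:
        logger.warning(
            "File size (%.1f MB) exceeds the recommended limit (%.1f MB).",
            size / 1e6,
            limit / 1e6,
        )
    return size
```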

    hvantk/settings.py (2)

    5-6: Consistent String Formatting in Context Settings:
    Changing the help option names from single to double quotes (line 5) standardizes the formatting across the project. This small change helps maintain consistency with the rest of the codebase.


    52-54: Improved Error Message Formatting in Setter:
    Reformatting the error message in the set_annotation_data_path function (lines 52–54) enhances its readability. Splitting the message over multiple lines makes it easier to modify or localize in the future.

    hvantk/utils/dataset.py (7)

    16-17: Standardized File Reading in get_chd_denovo_ht:
    Using an f-string with consistent double quotes to construct the file path improves maintainability. The change is purely stylistic and does not alter functionality.


    25-26: Consistent Path Formatting in get_clinvar_ht:
    Similar to get_chd_denovo_ht, the file path in get_clinvar_ht now employs consistent f-string formatting with double quotes. This helps in keeping the code style uniform across functions.


    28-30: Enhanced Readability for get_gene_expression_ht Signature:
    Reformatting the function signature to place parameters on separate lines (lines 28–30) greatly improves readability and makes the default values more prominent.


    40-41: Clear Table Import for RNA-seq Data:
    The update in reading the RNA-seq table using an f-string with double quotes (line 40) reinforces consistency in path formatting.


    48-52: Efficient Annotation via Dictionary Comprehension:
    The use of a dictionary comprehension to annotate expression values per time point (lines 48–52) is both concise and clear. This change standardizes the approach and keeps the code maintainable.


    54-55: Clean Column Dropping and Keying:
    Dropping columns (like "mean_expr_time_point" and "mean_expr_dev_stage") and rekeying by "Gene" (line 54) is expressed clearly now. The reformatting helps in understanding the transformation at a glance.


    68-102: Uniform Formatting Across Multiple Data Loader Functions:
    The functions get_gene_ann_ht, get_ccr_ht, get_gevir_ht, get_ppi_ht, get_dbnsfp_scores_ht, get_gnomad_metrics_ht, get_gnomad_af_ht, get_deg_ht, and get_hca_ht have been updated to use consistent f-string formatting with double quotes. This standardization aids in maintaining the project’s overall style and minimizes potential confusion when managing file paths.

    hvantk/commands/make_annotation_tables_cli.py (10)

    9-14: Improved import formatting enhances readability.

    The multi-line import format follows good Python practices, making the imported components easier to read and maintain.


    16-16: Consistent string formatting with double quotes.

    This change standardizes string formatting to use double quotes, which is consistent with other code modifications in this PR.


    25-38: Improved function signature readability with multi-line formatting.

    Breaking the function signature into multiple lines improves readability, especially with many parameters. This follows Python best practices for long parameter lists.


    45-46: Consistent string formatting in checkpoint paths.

    Changed from single quotes to double quotes for string consistency throughout the codebase.

    Also applies to: 51-52, 56-56, 60-60


    63-113: Improved click decorator readability with multi-line formatting.

    Breaking the click decorators into multiple lines significantly improves readability. This is especially important for command-line interfaces with many options.


    115-128: Enhanced function signature readability with multi-line formatting.

    The multi-line format for the function signature makes the code more maintainable, especially with many parameters.


    131-142: Improved conditional check formatting.

    Breaking the condition list into multiple lines enhances readability when checking multiple flags.


    143-145: Better formatted error message.

    Multi-line formatting for the error message improves code readability.


    148-160: Enhanced function call readability with multi-line formatting.

    Breaking the function call into multiple lines significantly improves readability when passing many arguments.


    161-163: Consistent main block formatting.

    Added proper spacing around the main block for consistent code style.

    README.md (1)

    1-2: Added Conda build status badge.

    The badge provides immediate visual feedback on the build status of the Python package using Conda, improving project transparency. This integrates well with the new CI workflows.

    setup.py (3)

    16-16: Simplified dependencies list format.

    The single-line format for dependencies is more concise for a small number of requirements.


    17-20: Consistent string formatting with double quotes.

    The change from single quotes to double quotes for entry points maintains consistency with the rest of the codebase.


    28-29: Consistent string formatting for Python version requirement.

    Using double quotes for the Python version requirement maintains consistency with other string literals in the file.

    .github/workflows/python-package.yml (2)

    1-41: Well-structured CI workflow for Python package testing.

    This new workflow effectively automates testing across multiple Python versions (3.10, 3.11, 3.12), which helps ensure compatibility. The configuration includes proper dependency installation, linting with flake8, and testing with pytest.

    🧰 Tools
    🪛 actionlint (1.7.4)

    24-24: the runner of "actions/setup-python@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue

    (action)


    23-26:

    ✅ Verification successful

    Update the actions/setup-python action version.

    The current version (v3) of the setup-python action is outdated.

    - uses: actions/setup-python@v3
    + uses: actions/setup-python@v4

    🌐 Web query:

    What is the latest version of the GitHub Actions setup-python action?
    

    💡 Result:

    The latest version of the GitHub Actions setup-python action is v5, as shown in official GitHub documentation examples from March 2025[3]. This version includes support for:

    • Python versions up to 3.13
    • PyPy versions like pypy3.9
    • Matrix configurations for cross-version testing
    • Semantic versioning syntax (e.g., 3.x)
    • Architecture selection (x64/x86)

    Example usage from GitHub's recommended configuration:

    - uses: actions/setup-python@v5
      with:
        python-version: '3.x'
        architecture: 'x64'

    The documentation specifically states this version is the recommended way to ensure consistent behavior across GitHub-hosted runners[3]. While some older examples show v4 usage[4], the official GitHub Docs have been updated to demonstrate v5 as the current standard.



    Update GitHub Actions setup-python to v5

    The current workflow uses actions/setup-python@v3, which is now outdated. According to official documentation (March 2025), the latest version is v5. Please update the workflow as follows:

    • File: .github/workflows/python-package.yml (Lines 23-26)
    • Change:
      - uses: actions/setup-python@v3
      + uses: actions/setup-python@v5

    This update ensures you benefit from the latest features (such as enhanced Python and PyPy support, improved matrix configurations, and architecture selection) and aligns with current best practices.

    🧰 Tools
    🪛 actionlint (1.7.4)

    24-24: the runner of "actions/setup-python@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue

    (action)

    hvantk/utils/constants.py (3)

    7-11: Added pathlib import and BASE_DIR constant.

    Using pathlib.Path is a modern approach to handle file paths in Python, making the code more robust across different operating systems. The BASE_DIR constant provides a reliable reference point for relative paths.


    14-23: Consistent string formatting with double quotes.

    The change from single quotes to double quotes for dictionary keys and values maintains consistency with the codebase style.


    31-32: Added standardized path to UCSC datasets JSON file.

    Using the BASE_DIR constant to define UCSC_JSON_FILE_PATH provides a reliable, cross-platform way to reference resources. This approach is more maintainable than hardcoded paths.
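
    A minimal sketch of the idea follows. The helper build_resource_path, the example module path, and the file name ucsc.json are all hypothetical; the real values of BASE_DIR and UCSC_JSON_FILE_PATH are not quoted in this review.

```python
from pathlib import Path


def build_resource_path(module_file, *parts):
    """Join parts onto the directory that contains module_file."""
    base_dir = Path(module_file).resolve().parent
    return base_dir.joinpath(*parts)


# In a constants module this would typically be called with __file__.
example = build_resource_path(
    "/opt/project/hvantk/utils/constants.py", "data", "ucsc.json"
)
```

    Anchoring paths to the module's own location keeps them valid regardless of the working directory the tests are run from.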

    hvantk/tests/test_downloader.py (4)

    7-11: Improved import formatting for better readability.

    The imports have been refactored to a more structured multi-line format, which enhances readability and makes future additions easier to manage.


    14-16: Clean URL construction formatting.

    The URL construction has been reformatted to a multi-line style, improving readability while maintaining the same functionality.


    32-36: Consistent string literal style.

    String literals have been standardized to use double quotes instead of single quotes, improving consistency across the codebase.


    45-46: Consistent file open mode formatting.

    The file opening mode has been updated to use double quotes, maintaining consistency with the rest of the codebase.

    environment.yml (3)

    1-1: Environment name simplified for better consistency.

    The environment name has been changed from hvantk-env to simply hvantk, which aligns better with the package name.


    5-5: More flexible Python version requirement.

    The Python version requirement has been changed from a strict ==3.10.0 to a more flexible >=3.10.0, allowing compatibility with newer Python versions.


    9-10:

    ❓ Verification inconclusive

    Removed setuptools version constraint.

    The version constraint on setuptools has been removed. While this allows for flexibility, it's worth considering whether a minimum version constraint would be appropriate to ensure compatibility.


    🌐 Web query:

    What is the latest version of setuptools and are there any breaking changes in recent versions?
    

    💡 Result:

    The latest version of setuptools is 78.1.0, released on March 25, 2025[9]. Recent versions have introduced several breaking changes and notable updates:

    Key Breaking Changes

    1. v78.x Series (2025)

    • Strict setup.cfg validation: Enforces underscore-separated keys instead of dash-separated formats (e.g., description_file instead of description-file)[6][2]
    • Introduced unexpected build failures for packages with non-compliant dependencies, creating cascading compatibility issues[2][6]
    • Reverted partial removals in v78.0.1 after widespread breakage[6]

    2. v72.x Series (2024)

    • Removed setup.py test command (later partially reverted in v72.1)[5]
    • Broke installations relying on test command execution during build[5]

    3. v71.0 (2024)

    • Changed dependency declaration to use core extra
    • Vendored dependencies now prefer installed versions over bundled ones[1]

    4. v70.0 (2024)

    • Removed deprecated test command permanently
    • Enforced UTF-8 adoption in build artifacts with potential edge case incompatibilities[1]

    Recent Critical Issues

    • The v78 release initially broke setuptools' own tests due to non-compliant dependencies like requests, leading to rushed fixes[2][6]
    • Multiple packages (including popular ones like Ansible-Vault and PySpark) suffered installation failures due to strict naming rules[6]

    Developers are advised to:

    • Pin setuptools versions (<78.0 for dash-separated config users)[6]
    • Migrate to modern packaging practices using pyproject.toml and wheel formats[5][11]
    • Monitor dependencies for strict naming compliance[2]

    This pattern highlights the importance of dependency version pinning and proactive response to long-standing deprecation warnings[2][6].



    Setuptools Dependency Constraint: Verify Compatibility

    The removal of the setuptools version constraint in environment.yml (lines 9–10) increases flexibility—but it also opens the door to potential issues. The latest setuptools release (v78.1.0) introduces several breaking changes (like stricter configuration key validations) that could affect projects relying on legacy behaviors (for instance, dash-separated keys). Please review whether our project depends on any of these behaviors and consider pinning setuptools (e.g., using <78.0) if compatibility issues arise.

    • Actionable Check: Confirm that downstream processes and configurations can handle the changes in setuptools v78.1.0, or update the constraint accordingly.
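
    One way to carry out such a check is to scan a setup.cfg for dash-separated keys. The sketch below is an assumption-laden illustration: it only flags keys that contain a dash, it does not reproduce setuptools' own validation, and whether this repository has a setup.cfg at all is not shown in the review.

```python
import configparser


def find_dashed_keys(cfg_text):
    """Return (section, key) pairs from setup.cfg text whose key contains a dash."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return [
        (section, key)
        for section in parser.sections()
        for key in parser[section]
        if "-" in key
    ]


# Hypothetical setup.cfg content for illustration.
sample = """
[metadata]
name = example
description-file = README.md
"""
dashed = find_dashed_keys(sample)
```
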
    .github/workflows/python-app.yml (2)

    1-14: Well-structured GitHub Actions workflow configuration.

    The workflow is properly configured to run on push and pull requests to the main branch with appropriate read permissions.


    26-39: Comprehensive CI steps for dependency installation, linting, and testing.

    The workflow includes all the essential steps for a CI pipeline: installing dependencies, running linting with proper configurations, and executing tests.

    hvantk/commands/annotate_features.py (5)

    10-23: Improved import formatting for better readability.

    The imports have been refactored to a more structured multi-line format with consistent indentation, which enhances readability and makes future modifications easier.

    🧰 Tools
    🪛 Ruff (0.8.2)

    19-19: hvantk.utils.annotate.annotate_degs imported but unused

    Remove unused import: hvantk.utils.annotate.annotate_degs

    (F401)


    27-43: Consistent string formatting and function signature styling.

    String literals have been standardized to use double quotes, and function signatures have been reformatted for consistency.


    48-57: Consistent function call formatting.

    Function calls have been reformatted to follow a consistent style throughout the file.


    74-101: Consistent formatting for annotation function calls.

    All the annotation function calls have been reformatted for consistency, using a clean style that makes the code more readable while maintaining the same functionality.


    111-139: Improved argument parser formatting.

    The argument parser configuration has been reformatted with consistent indentation and multi-line style, which improves readability while maintaining the same functionality.

    hvantk/tests/test_UCSCDataSetCollection.py (3)

    4-4: Added import for constant file path reference.

    Using a constant instead of hardcoded file paths is a good practice that improves maintainability.


    26-40: Improved JSON structure with proper trailing commas.

    Adding trailing commas to the elements of the structure makes future additions easier and produces cleaner diffs when changes are made. This applies to a Python literal; strict JSON text does not allow trailing commas.
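
    The difference between a Python literal and JSON text on this point can be checked with the standard json module. The dictionary contents below are made up for illustration and are not the test data from the PR.

```python
import json

# A Python literal accepts a trailing comma after the last element.
as_python = {"datasets": [{"name": "example"},],}


def parses_as_json(text):
    """Return True when text is valid JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


strict_ok = parses_as_json('{"datasets": [{"name": "example"}]}')
trailing_ok = parses_as_json('{"datasets": [{"name": "example"},],}')
```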


    62-64: Using constant path instead of hardcoded string.

    Replaced the hardcoded file path with the imported constant, which improves maintainability and consistency.

    .github/workflows/python-package-conda.yml (2)

    1-35: LGTM! The workflow is well-structured.

    The workflow is organized with a clear sequence of steps: checkout, Python setup, dependency installation, linting, and testing.

    🧰 Tools
    🪛 actionlint (1.7.4)

    14-14: the runner of "actions/setup-python@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue

    (action)


    14-16:

    ✅ Verification successful

    Update the actions/setup-python action version.

    The GitHub Action actions/setup-python@v3 is outdated. GitHub regularly deprecates older versions of actions, which can lead to compatibility issues in the future.

    - uses: actions/setup-python@v3
    + uses: actions/setup-python@v5

    To verify the latest version:


    🌐 Web query:

    What is the latest version of GitHub's actions/setup-python action?
    

    💡 Result:

    The latest version of GitHub's actions/setup-python action is v5.5.0, released on October 24, 2024[8]. This version introduced several enhancements, including support for free-threaded Python versions (e.g., 3.13t), architecture-specific PATH management improvements, and compatibility with .tool-versions files[8].

    Key updates in v5.5.0 include:

    • Free-threaded Python support: Enables using special syntax like 3.13t for thread-optimized builds[8]
    • ARM64 runner improvements: Added Ubuntu ARM runner support and fixed PyPy architecture mapping on Linux ARM64[8]
    • Dependency caching: Upgraded @action/cache to v4.0.3 for improved package management[8]

    The action continues to support Python, PyPy, and GraalPy installations across Linux, macOS, and Windows environments[2][4]. For projects requiring Python 2.7, the LizardByte/setup-python-action fork remains a recommended alternative[1].

    When using this action, remember to:

    • Explicitly specify Python versions for consistency
    • Leverage built-in caching with cache: pip for dependency management[3][6]
    • Check actions/python-versions for pre-installed interpreter availability[13]



    Action Required: Update GitHub Action to Latest Version

    The web query confirms that the latest release of the actions/setup-python action is v5.5.0. Please update the workflow file accordingly. It’s recommended to pin the version to ensure reproducibility.

    • File: .github/workflows/python-package-conda.yml (lines 14-16)
    • Proposed diff:
    - uses: actions/setup-python@v3
    + uses: actions/setup-python@v5.5.0

    🧰 Tools
    🪛 actionlint (1.7.4)

    14-14: the runner of "actions/setup-python@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue

    (action)

    hvantk/tests/test_create_gene_annotation_tables.py (3)

    12-16: Good organization of test directories.

    Creating separate constants for test data and temporary directories improves clarity and follows good practices for test organization.


    47-49: Good practice: cleaning up temporary directories.

    Cleaning up temporary directories after tests is a good practice to prevent disk space issues during repeated test runs.


    32-36: LGTM! Properly converting Path objects to strings.

    Converting the Path objects to strings using str() before passing them to the functions ensures compatibility with the expected input types.
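
    The temporary-directory handling and the str() conversion discussed for this file can be sketched together. The names run_with_temp_dir and write_marker are hypothetical stand-ins; the actual test functions are not quoted in this review.

```python
import shutil
import tempfile
from pathlib import Path


def run_with_temp_dir(write_output):
    """Call write_output with a temp directory (as str), then remove the directory."""
    tmp_dir = Path(tempfile.mkdtemp(prefix="hvantk_test_"))
    try:
        # str() for callers that expect plain string paths.
        return write_output(str(tmp_dir)), tmp_dir
    finally:
        # Runs even if write_output raises, so no output is left behind.
        shutil.rmtree(tmp_dir, ignore_errors=True)


def write_marker(out_dir):
    """Stand-in for a table-writing function; writes one small file."""
    target = Path(out_dir) / "marker.txt"
    target.write_text("ok")
    return target.exists()


existed_during, used_dir = run_with_temp_dir(write_marker)
```

    Putting the cleanup in a finally clause means a failing test does not leave output on disk for the next run.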

    hvantk/datasets/ucsc_cell_datasets.py (4)

    5-9: LGTM! Improved import formatting.

    The multi-line import format with trailing commas improves readability and makes future additions easier.


    27-53: LGTM! Well-structured docstring.

    The docstring formatting follows good practices with clear descriptions of the class and its attributes.


    107-128: LGTM! Improved error message formatting.

    The error message in the exception handling is properly formatted with f-strings.


    178-209: LGTM! Well-structured error handling in from_json method.

    The error handling in the from_json method is comprehensive, checking for required keys, validating the type of datasets, and handling various potential exceptions.
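
    The kinds of checks described here could look like the sketch below. It is not the from_json method from the PR: the function parse_collection is hypothetical, it works on JSON text rather than a file, and only the "datasets" key mentioned above is validated.

```python
import json


def parse_collection(text):
    """Parse JSON text and validate its top-level shape; raise ValueError on problems."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Invalid JSON: {exc}") from exc
    if not isinstance(data, dict) or "datasets" not in data:
        raise ValueError("Missing required key: 'datasets'")
    if not isinstance(data["datasets"], list):
        raise ValueError("'datasets' must be a list")
    return data


parsed = parse_collection('{"datasets": []}')
```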

    hvantk/utils/make_tables.py (5)

    16-22: LGTM! Consistent function parameter formatting.

    The function parameters are now consistently formatted with two-space indentation, improving readability.


    33-35: LGTM! Clean formatting of function calls.

    The hl.import_table call is now formatted more compactly, improving readability.


    66-74: LGTM! Well-structured method chaining.

    The method chaining in create_interactome_tb is now more readable with proper indentation.


    101-113: LGTM! Improved formatting for complex operations.

    The dictionary comprehension and method chaining in create_clinvar_tb are now more consistently formatted.


    189-208: LGTM! Well-formatted complex aggregation.

    The complex aggregation in create_ensembl_gene_tb now follows a consistent indentation pattern, making the structure clearer.

    hvantk/utils/annotate.py (13)

    6-18: Improved Import Formatting.
    The reformatting of the import section clearly separates each dataset utility function, which enhances readability and maintainability.


    24-30: Consistent ClinVar Label Formatting.
    The benign and pathogenic ClinVar label lists have been reformatted into concise, single-line representations. This change improves clarity without altering the data.


    32-37: Clear Pre-calculation of ClinVar Conditions.
    The computation of is_pathogenic and is_benign is well laid out to improve readability and performance by pre-calculating these conditions before annotation.


    38-44: Structured ClinVar Annotation with hl.case().
    The chained use of hl.case(), along with the clear handling of pathogenic and benign conditions, makes the logic in annotating ClinVar data both readable and robust.


    55-62: Refined Signature and Body in annotate_gevir.
    The function signature has been adjusted (removing an extraneous trailing comma) and the body remains clear with a concise application of get_gevir_ht().


    64-66: Consistent Default Value Formatting in annotate_rnaseq_expression.
    Changing the default value for organ to use double quotes improves consistency with the rest of the codebase.


    90-95: Enhanced Transformation Pipeline in annotate_ensembl_gene.
    The revamped method chain for processing gene_ht—from transmutation through explosion, re-keying, and selective field projection—improves readability and emphasizes each transformation step.


    113-117: Efficient Scoring Field Selection in annotate_dbnsfp_scores.
    The dynamic filtering of fields (selecting those ending with _score or matching CADD_phred) makes the code more robust and adaptable to the structure of the dataset.
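
    The selection rule can be shown on a plain list of names. The column names below are hypothetical, and the sketch assumes an exact match on CADD_phred; the Hail code in the PR may express the condition differently.

```python
def select_score_fields(field_names):
    """Keep names that end with '_score' or equal 'CADD_phred'."""
    return [
        name
        for name in field_names
        if name.endswith("_score") or name == "CADD_phred"
    ]


# Hypothetical column names for illustration.
fields = ["SIFT_score", "CADD_phred", "CADD_raw", "REVEL_score", "gene"]
selected = select_score_fields(fields)
```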


    155-162: Clear Differential Expression Annotation.
    The use of a dictionary comprehension within t.transmute() to generate cluster-specific annotations is effective, concise, and easy to follow.


    167-178: Enhanced Function Signature in annotate_hca.
    Aligning the parameters vertically—including the immutable tuple for cell_categories—improves clarity and makes the function signature easier to read.


    188-190: Streamlined HCA Data Annotation.
    The refactored retrieval and annotation of HCA data through hl.struct() is both clear and consistent, ensuring that the selected cell categories are correctly applied.


    210-212: Robust gnomAD AF Annotation Handling.
    By using hl.if_else to check for the definition of the allele frequency expression and defaulting to a float-zero when absent, the implementation safeguards against missing data.


    227-235: Consistent Variant ID Construction.
    Using hl.delimit to combine the contig, position, and allele values into a standardized variant ID string is clear and adheres to expected formatting conventions.

    @ypriverol ypriverol merged commit 26672c3 into main Mar 31, 2025
    6 checks passed