Adding tests and CI workflows #8
Walkthrough
This pull request introduces three new GitHub Actions workflows to automate installing dependencies, linting, and running tests for Python applications under various configurations (standard, Conda-based, and multi-version). It updates environment settings, adjusts file tracking in Git, and makes numerous formatting improvements across source, test, and utility files for consistency. These changes do not modify functional behavior but streamline code readability and enhance CI process integration.
Sequence Diagram(s)
sequenceDiagram
actor Dev as Developer
participant GH as GitHub
participant CI as Workflow
Dev->>GH: Push or create PR on "main"
GH->>CI: Trigger corresponding CI workflow(s)
CI->>CI: Checkout repository (actions/checkout@v4)
CI->>CI: Setup Python environment (actions/setup-python@v3)
CI->>CI: Install dependencies (pip/Conda install, requirements/environment.yml)
CI->>CI: Run linting (flake8) and tests (pytest)
CI-->>GH: Report build status
Pull Request Overview
This PR adds tests, updates CI workflows, and makes various formatting improvements for consistency across the codebase. Key changes include:
- Refining code formatting (use of double quotes and consistent indentation) in annotation and CLI modules.
- Updating tests to use temporary directories and improving file path handling.
- Adding and updating GitHub Actions workflows and environment configurations.
Reviewed Changes
Copilot reviewed 27 out of 27 changed files in this pull request and generated no comments.
File | Description |
---|---|
hvantk/utils/annotate.py | Formatting improvements and minor list style updates. |
hvantk/tests/* | Updates to use temporary directories and consistent quoting. |
hvantk/settings.py | Minor reformatting and updated string literals. |
hvantk/hvantk.py & hvantk/datasets/ucsc_cell_datasets.py | Code style consistency updates. |
hvantk/commands/make_annotation_tables_cli.py | Refactored CLI option formatting and option help text updates. |
hvantk/commands/annotate_features.py | Formatting updates and improved parameter usage. |
environment.yml, README.md, .github/workflows/* | Added/updated CI workflows and environment configuration. |
Comments suppressed due to low confidence (1)
hvantk/commands/make_annotation_tables_cli.py:78
- The help text for the '--interactome' option appears incorrect as it references a CCR table. Consider updating it to reflect that it creates/updates the interactome table instead.
@click.option("--interactome", is_flag=True, help="Create/update CCR table from source.")
Actionable comments posted: 0
🧹 Nitpick comments (10)
hvantk/utils/generate_training_set.py (1)
41-41: Gene Annotation Simplification
Annotating the `clinvar_ht` table with a new `gene` field in one line is concise and clear. Consider verifying that using `split("[:]")` is intentional. If the goal is to split by a simple colon, using `split(":")` might be more straightforward.
hvantk/__init__.py (1)
1-2: Clarify Public API for `main`
The file imports `main` from `.hvantk` but does not use it locally. If the intent is to re-export it as part of the package’s public API, please consider adding an `__all__` declaration (e.g. `__all__ = ["main"]`) to make this explicit. Otherwise, if it’s not needed here, removing the unused import is also an option.
🧰 Tools
🪛 Ruff (0.8.2)
1-1: `.hvantk.main` imported but unused; consider removing, adding to `__all__`, or using a redundant alias (F401)
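The `__all__` suggestion for `hvantk/__init__.py` can be tried out in isolation. The sketch below builds a throwaway package in a temporary directory; the names `demo_pkg` and `core` are hypothetical stand-ins for `hvantk` and `.hvantk`, not the project's real layout.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package name; it stands in for hvantk so the sketch is self-contained.
PKG = "demo_pkg"


def build_demo_package(root: Path) -> None:
    """Write a tiny package whose __init__ re-exports `main` via __all__."""
    pkg_dir = root / PKG
    pkg_dir.mkdir()
    (pkg_dir / "core.py").write_text("def main():\n    return 'ok'\n")
    (pkg_dir / "__init__.py").write_text(
        "from .core import main\n\n__all__ = ['main']\n"
    )


def load_demo_package():
    """Import the throwaway package and return its declared API and main()'s result."""
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_package(Path(tmp))
        sys.path.insert(0, tmp)
        try:
            module = importlib.import_module(PKG)
            return module.__all__, module.main()
        finally:
            # Leave the interpreter state as we found it.
            sys.path.remove(tmp)
            sys.modules.pop(PKG, None)
            sys.modules.pop(f"{PKG}.core", None)


exported, result = load_demo_package()
print(exported, result)
```

With `__all__` declared, linters generally treat the import as an intentional re-export rather than an unused name.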
hvantk/utils/dataset.py (1)
44-45: Potential Robustness Improvement for Time Point Extraction
The expression `tps = t[tp_col].key_set().filter(lambda x: x.organ == organ).time_point.collect()[0]` (around line 44) assumes that the filtered collection is non-empty. Consider adding error handling or a conditional check to guard against an empty list, which would otherwise raise an `IndexError`.
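One way to guard the `collect()[0]` access is a small helper that fails with a descriptive message instead of a bare `IndexError`. This is a plain-Python sketch: `first_or_raise` is a hypothetical helper, and the list literal stands in for whatever the Hail `collect()` call returns.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def first_or_raise(values: Sequence[T], what: str) -> T:
    """Return the first element, or raise a descriptive error when empty."""
    if not values:
        raise ValueError(f"No {what} found; check the filter criteria.")
    return values[0]


# Stand-in for the collected result: one row holding a list of time points.
collected = [["e10.5", "e11.5"]]
time_points = first_or_raise(collected, "time points for the requested organ")
print(time_points)
```

Raising a `ValueError` that names the missing data usually makes a misconfigured `organ` value easier to diagnose than an index error deep in the call stack.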
.github/workflows/python-app.yml (1)
21-25: Update the GitHub Actions setup-python version.
The setup-python action is using an older version (v3) which may be outdated.
    -    - uses: actions/setup-python@v3
    +    - uses: actions/setup-python@v4
🧰 Tools
🪛 actionlint (1.7.4)
23-23: the runner of "actions/setup-python@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue (action)
hvantk/commands/annotate_features.py (1)
19-19: Remove unused import.
The `annotate_degs` function is imported but not used in the code (it appears to be commented out in lines 89-90).
    -    annotate_degs,
🧰 Tools
🪛 Ruff (0.8.2)
19-19: `hvantk.utils.annotate.annotate_degs` imported but unused. Remove unused import: `hvantk.utils.annotate.annotate_degs` (F401)
.github/workflows/python-package-conda.yml (2)
23-23: LGTM! Consider adding environment activation validation.
The command to update the base Conda environment looks good.
Consider adding a step to verify the environment was updated correctly by printing the installed packages:
      run: |
        conda env update --file environment.yml --name base
    +   # Verify environment was updated correctly
    +   conda list
26-30: Consider adding a flake8 configuration file.
The flake8 configuration is currently hardcoded in the workflow. For better maintainability, consider moving this configuration to a `.flake8` file in the repository.
This would allow developers to use the same configuration locally and make it easier to update in the future.
hvantk/tests/test_create_gene_annotation_tables.py (1)
18-166: Consider creating a setup/teardown fixture for test directory.
All test functions have similar cleanup code for removing the temporary directory. This could be extracted into a fixture using pytest's `@pytest.fixture` to avoid repetition.
    import pytest

    @pytest.fixture(autouse=True)
    def setup_teardown():
        """Create and clean up the temporary directory for each test."""
        # Setup - This runs before each test
        TMP_DIR.mkdir(exist_ok=True, parents=True)
        # The test runs here
        yield
        # Teardown - This runs after each test
        if TMP_DIR.exists():
            shutil.rmtree(TMP_DIR)
This fixture could be placed at the top of the file and with `autouse=True`, it would automatically run for each test case without explicitly calling it.
hvantk/datasets/ucsc_cell_datasets.py (1)
118-124: Consider adding a timeout parameter.
When downloading files, it's a good practice to set an explicit timeout.
Looking at the implementation of `download_file()` in the relevant snippet, it already includes a timeout parameter. You might consider adding a parameter to allow customizing this timeout if necessary for different network conditions.
    def download_expression_matrix(self, out_dir: str, timeout: int = 30) -> str:
        """
        ...
        Args:
            out_dir: Directory where the file will be saved
    +       timeout: Connection timeout in seconds
        ...
        """
        url_download = (
            f"{UCSC_CELL_BROWSER_BASE_URL}/{self.name}/{EXPRESSION_MATRIX_FILE_NAME}"
        )
        try:
            return download_file(
                url=url_download,
                out_dir=out_dir,
                file_name=EXPRESSION_MATRIX_FILE_NAME,
                timeout=timeout
            )
hvantk/utils/annotate.py (1)
138-140: Avoid Mutable Default Argument in `annotate_degs`.
Using a mutable list as a default for `clusters` can lead to unexpected behavior. It’s recommended to set the default to `None` and initialize within the function.
    -def annotate_degs(t: hl.Table, gene_symbol_col: str, clusters: list = ["C0", "C5", "C7", "C10", "C14"]) -> hl.Table:
    +def annotate_degs(t: hl.Table, gene_symbol_col: str, clusters: list = None) -> hl.Table:
         if clusters is None:
    -        clusters = ["C0", "C5", "C7", "C10", "C14"]
    +        clusters = ["C0", "C5", "C7", "C10", "C14"]
🧰 Tools
🪛 Ruff (0.8.2)
139-139: Do not use mutable data structures for argument defaults. Replace with `None`; initialize within function (B006)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (22)
- .github/workflows/python-app.yml (1 hunks)
- .github/workflows/python-package-conda.yml (1 hunks)
- .github/workflows/python-package.yml (1 hunks)
- .gitignore (0 hunks)
- README.md (1 hunks)
- environment.yml (1 hunks)
- hvantk/__init__.py (1 hunks)
- hvantk/commands/annotate_features.py (2 hunks)
- hvantk/commands/make_annotation_tables_cli.py (2 hunks)
- hvantk/datasets/ucsc_cell_datasets.py (6 hunks)
- hvantk/hvantk.py (1 hunks)
- hvantk/settings.py (2 hunks)
- hvantk/tests/test_UCSCDataSetCollection.py (4 hunks)
- hvantk/tests/test_create_gene_annotation_tables.py (6 hunks)
- hvantk/tests/test_downloader.py (3 hunks)
- hvantk/utils/annotate.py (11 hunks)
- hvantk/utils/constants.py (1 hunks)
- hvantk/utils/dataset.py (4 hunks)
- hvantk/utils/file_utils.py (2 hunks)
- hvantk/utils/generate_training_set.py (3 hunks)
- hvantk/utils/make_tables.py (8 hunks)
- setup.py (2 hunks)
💤 Files with no reviewable changes (1)
- .gitignore
🧰 Additional context used
🧬 Code Definitions (7)
hvantk/__init__.py (2)
- hvantk/hvantk.py (1): `main` (23-24)
- hvantk/commands/annotate_features.py (1): `main` (46-104)
hvantk/commands/annotate_features.py (2)
- hvantk/utils/annotate.py (8): `annotate_ccr` (49-52), `annotate_gevir` (55-61), `annotate_rnaseq_expression` (64-69), `annotate_ppi` (72-75), `annotate_gnomad_af` (195-213), `annotate_dbnsfp_scores` (102-122), `annotate_gnomad_constraint_metrics` (125-135), `annotate_hca` (167-192)
- hvantk/hvantk.py (1): `main` (23-24)
hvantk/tests/test_create_gene_annotation_tables.py (1)
- hvantk/utils/make_tables.py (5): `create_gnomad_constraint_gene_metrics_tb` (16-45), `create_interactome_tb` (48-81), `create_clinvar_tb` (84-120), `create_gevir_tb` (123-152), `create_ensembl_gene_tb` (155-218)
hvantk/datasets/ucsc_cell_datasets.py (1)
- hvantk/utils/file_utils.py (1): `download_file` (11-59)
hvantk/utils/generate_training_set.py (1)
- hvantk/utils/dataset.py (2): `get_chd_gene_set` (59-65), `get_clinvar_ht` (19-25)
hvantk/commands/make_annotation_tables_cli.py (2)
- hvantk/utils/make_tables.py (4): `create_interactome_tb` (48-81), `create_clinvar_tb` (84-120), `create_gevir_tb` (123-152), `create_gnomad_constraint_gene_metrics_tb` (16-45)
- hvantk/settings.py (1): `set_raw_data_path` (13-31)
hvantk/utils/annotate.py (1)
- hvantk/utils/dataset.py (4): `get_gevir_ht` (76-77), `get_gene_expression_ht` (28-56), `get_clinvar_ht` (19-25), `get_hca_ht` (100-101)
🪛 Ruff (0.8.2)
hvantk/__init__.py
1-1: `.hvantk.main` imported but unused; consider removing, adding to `__all__`, or using a redundant alias (F401)
hvantk/commands/annotate_features.py
19-19: `hvantk.utils.annotate.annotate_degs` imported but unused. Remove unused import: `hvantk.utils.annotate.annotate_degs` (F401)
hvantk/utils/annotate.py
139-139: Do not use mutable data structures for argument defaults. Replace with `None`; initialize within function (B006)
🪛 actionlint (1.7.4)
.github/workflows/python-package.yml
24-24: the runner of "actions/setup-python@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue (action)
.github/workflows/python-package-conda.yml
14-14: the runner of "actions/setup-python@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue (action)
.github/workflows/python-app.yml
23-23: the runner of "actions/setup-python@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue (action)
🔇 Additional comments (89)
hvantk/utils/generate_training_set.py (13)
12-12: Streamlined Import Statement
Consolidating the two imports from `fschd.utils.data_utils` into a single line improves readability and makes the dependencies clear.
22-26: Clear Formatting for Pathogenic Labels
The multi-line definition of `PATHOGENIC_LABEL_CLINVAR` now presents the labels in a neatly organized format. This enhances clarity without changing functionality.
29-29: Concise Disease-Specific Labels
Defining `CHD_LABEL_CLINVAR` in a compact, single-line format is a good formatting improvement that keeps the code clean.
32-32: Compact Benign Labels Definition
The reformatting of `BENIGN_LABEL_CLINVAR` into a single-line list improves code clarity and maintainability.
44-46: Extraction of Consequence Information
Mapping over `clinvar_ht.info.MC` to extract the second element with `x.split("[|]")[1]` is neat and concise. Please ensure that every element in `MC` always contains the delimiter `[|]` to avoid potential index errors.
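A plain-Python sketch of the guard suggested for the `x.split("[|]")[1]` access follows. It assumes the delimiter is interpreted as a regular expression, which would explain the `[|]` character class (a bare `|` is the regex alternation operator). `second_field` is a hypothetical helper and the sample strings are made up.

```python
import re
from typing import Optional


def second_field(value: str, delimiter_regex: str = "[|]") -> Optional[str]:
    """Return the second delimited field, or None when the delimiter is absent."""
    parts = re.split(delimiter_regex, value)
    return parts[1] if len(parts) > 1 else None


with_delimiter = second_field("SO:0001583|missense_variant")
without_delimiter = second_field("no_delimiter_here")
print(with_delimiter, without_delimiter)

# For a colon, the character class and the literal delimiter give the same result.
same_split = re.split("[:]", "GENE:1234") == "GENE:1234".split(":")
print(same_split)
```

Returning `None` for malformed entries lets the caller decide whether to drop the row or fail loudly.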
48-50: Filtering Synonymous Variants
The filter that excludes rows where any element of `Consequence` equals `"synonymous_variant"` (using the negation operator `~`) is both succinct and clear.
53-72: Comprehensive TP/TN Annotation Setup
The construction of the `ts_ann_expr` dictionary using `hl.case()` for both TP and TN site annotations is well organized. This approach clearly maps the conditions to their Boolean outputs. Adding inline comments for each condition might further enhance the understandability of the logic.
74-74: Applying Annotation Expressions
Annotating `clinvar_ht` with the TP/TN labels using `**ts_ann_expr` is a succinct and effective way to enrich the table with the desired fields.
76-76: Ensuring Exclusive Site Labels
Filtering the table with `ts_ht.filter(ts_ht.is_tp_site != ts_ht.is_tn_site)` efficiently guarantees that a site cannot be simultaneously labeled as both TP and TN, ensuring data consistency.
78-83: Mapping to a Readable Label Field
The subsequent annotation of the `rf_label` using a chained `hl.case()`, mapping TP sites to `"TP"` and TN sites to `"TN"` with an `or_missing()` fallback, is clear and ensures only valid labels are retained.
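The exclusive-label filter and the TP/TN mapping described above can be mirrored in plain Python to make the truth table explicit. This is only an analogue of the Hail expressions, not the project's code; the function name reuses `rf_label` purely for readability.

```python
from typing import Optional


def rf_label(is_tp_site: bool, is_tn_site: bool) -> Optional[str]:
    """Plain-Python analogue of the `!=` filter plus the TP/TN case mapping."""
    # Keep only rows where exactly one flag is set (the `!=` filter).
    if is_tp_site == is_tn_site:
        return None
    return "TP" if is_tp_site else "TN"


rows = [(True, False), (False, True), (True, True), (False, False)]
labels = [rf_label(tp, tn) for tp, tn in rows]
print(labels)
```

Rows flagged as both or neither yield `None`, which corresponds to the rows the filter removes before labeling.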
87-87: Selecting Relevant Output Columns
The final selection of only the `gene` and `rf_label` columns from the table provides a clean output for the training set. This reduction simplifies downstream consumption of the data.
91-91: Checkpointing the Table
Using `checkpoint` with the specified output directory and `overwrite=True` is a robust solution for persisting the table state prior to export. It ensures reproducibility and consistency in the training set creation process.
93-93: Exporting the Final Training Set
Exporting the annotated table to a TSV file in one concise command is straightforward and effective. It completes the data processing pipeline as expected.
hvantk/hvantk.py (2)
10-14: Improved CLI Decorator Formatting
The multi-line formatting of the `@click.group` decorator (lines 10–14) greatly enhances readability and improves consistency with other parts of the codebase.
27-29: Consistent Main Guard Formatting
Switching to double quotes in the `if __name__ == "__main__":` block (line 27) aligns with the project’s stylistic guidelines and improves code uniformity.
hvantk/utils/file_utils.py (2)
22-26: Enhanced Error Message Clarity
Reformatting the `ValueError` for an invalid `file_name` into a multi-line message improves its clarity and readability. This makes it easier to understand the constraint that the filename must be a simple name without path components.
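For illustration, the "simple name without path components" constraint could be checked roughly as follows. `validate_file_name` is a hypothetical helper, not the actual check inside `download_file`, and it is deliberately conservative: it rejects anything that either POSIX or Windows path rules would split.

```python
from pathlib import PurePosixPath, PureWindowsPath


def validate_file_name(file_name: str) -> str:
    """Reject empty names and names that carry any path component."""
    is_simple = (
        file_name not in ("", ".", "..")
        and PurePosixPath(file_name).name == file_name
        and PureWindowsPath(file_name).name == file_name
    )
    if not is_simple:
        raise ValueError(
            f"Invalid file_name {file_name!r}: "
            "expected a simple name without path components."
        )
    return file_name


print(validate_file_name("matrix.tsv.gz"))
```

Checking against both path flavours keeps names such as `..\secret` from slipping through on a POSIX host.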
33-41: Clear File Size Extraction and Warning
Extracting the `content-length` header with a default value and formatting the warning message for large files (lines 33–41) are clear improvements. The logging now provides an informative message when the file size exceeds the recommended limit.
hvantk/settings.py (2)
5-6: Consistent String Formatting in Context Settings
Changing the help option names from single to double quotes (line 5) standardizes the formatting across the project. This small change helps maintain consistency with the rest of the codebase.
52-54: Improved Error Message Formatting in Setter
Reformatting the error message in the `set_annotation_data_path` function (lines 52–54) enhances its readability. Splitting the message over multiple lines makes it easier to modify or localize in the future.
hvantk/utils/dataset.py (7)
16-17: Standardized File Reading in `get_chd_denovo_ht`
Using an f-string with consistent double quotes to construct the file path improves maintainability. The change is purely stylistic and does not alter functionality.
25-26: Consistent Path Formatting in `get_clinvar_ht`
Similar to `get_chd_denovo_ht`, the file path in `get_clinvar_ht` now employs consistent f-string formatting with double quotes. This helps in keeping the code style uniform across functions.
28-30: Enhanced Readability for `get_gene_expression_ht` Signature
Reformatting the function signature to place parameters on separate lines (lines 28–30) greatly improves readability and makes the default values more prominent.
40-41: Clear Table Import for RNA-seq Data
The update in reading the RNA-seq table using an f-string with double quotes (line 40) reinforces consistency in path formatting.
48-52: Efficient Annotation via Dictionary Comprehension
The use of a dictionary comprehension to annotate expression values per time point (lines 48–52) is both concise and clear. This change standardizes the approach and keeps the code maintainable.
54-55: Clean Column Dropping and Keying
Dropping columns (like `"mean_expr_time_point"` and `"mean_expr_dev_stage"`) and rekeying by `"Gene"` (line 54) is expressed clearly now. The reformatting helps in understanding the transformation at a glance.
68-102: Uniform Formatting Across Multiple Data Loader Functions
The functions `get_gene_ann_ht`, `get_ccr_ht`, `get_gevir_ht`, `get_ppi_ht`, `get_dbnsfp_scores_ht`, `get_gnomad_metrics_ht`, `get_gnomad_af_ht`, `get_deg_ht`, and `get_hca_ht` have been updated to use consistent f-string formatting with double quotes. This standardization aids in maintaining the project’s overall style and minimizes potential confusion when managing file paths.
hvantk/commands/make_annotation_tables_cli.py (10)
9-14: Improved import formatting enhances readability.
The multi-line import format follows good Python practices, making the imported components easier to read and maintain.
16-16: Consistent string formatting with double quotes.
This change standardizes string formatting to use double quotes, which is consistent with other code modifications in this PR.
25-38: Improved function signature readability with multi-line formatting.
Breaking the function signature into multiple lines improves readability, especially with many parameters. This follows Python best practices for long parameter lists.
45-46: Consistent string formatting in checkpoint paths.
Changed from single quotes to double quotes for string consistency throughout the codebase.
Also applies to: 51-52, 56-56, 60-60
63-113: Improved click decorator readability with multi-line formatting.
Breaking the click decorators into multiple lines significantly improves readability. This is especially important for command-line interfaces with many options.
115-128: Enhanced function signature readability with multi-line formatting.
The multi-line format for the function signature makes the code more maintainable, especially with many parameters.
131-142: Improved conditional check formatting.
Breaking the condition list into multiple lines enhances readability when checking multiple flags.
143-145: Better formatted error message.
Multi-line formatting for the error message improves code readability.
148-160: Enhanced function call readability with multi-line formatting.
Breaking the function call into multiple lines significantly improves readability when passing many arguments.
161-163: Consistent main block formatting.
Added proper spacing around the main block for consistent code style.
README.md (1)
1-2: Added Conda build status badge.
The badge provides immediate visual feedback on the build status of the Python package using Conda, improving project transparency. This integrates well with the new CI workflows.
setup.py (3)
16-16: Simplified dependencies list format.
The single-line format for dependencies is more concise for a small number of requirements.
17-20: Consistent string formatting with double quotes.
The change from single quotes to double quotes for entry points maintains consistency with the rest of the codebase.
28-29: Consistent string formatting for Python version requirement.
Using double quotes for the Python version requirement maintains consistency with other string literals in the file.
.github/workflows/python-package.yml (2)
1-41: Well-structured CI workflow for Python package testing.
This new workflow effectively automates testing across multiple Python versions (3.10, 3.11, 3.12), which helps ensure compatibility. The configuration includes proper dependency installation, linting with flake8, and testing with pytest.
🧰 Tools
🪛 actionlint (1.7.4)
24-24: the runner of "actions/setup-python@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue (action)
23-26: ✅ Verification successful
Update the actions/setup-python action version.
The current version (v3) of the setup-python action is outdated.
    - uses: actions/setup-python@v3
    + uses: actions/setup-python@v4
🌐 Web query:
What is the latest version of the GitHub Actions setup-python action?
💡 Result:
The latest version of the GitHub Actions `setup-python` action is v5, as shown in official GitHub documentation examples from March 2025[3]. This version includes support for:
- Python versions up to 3.13
- PyPy versions like pypy3.9
- Matrix configurations for cross-version testing
- Semantic versioning syntax (e.g., `3.x`)
- Architecture selection (x64/x86)
Example usage from GitHub's recommended configuration:
    - uses: actions/setup-python@v5
      with:
        python-version: '3.x'
        architecture: 'x64'
The documentation specifically states this version is the recommended way to ensure consistent behavior across GitHub-hosted runners[3]. While some older examples show v4 usage[4], the official GitHub Docs have been updated to demonstrate v5 as the current standard.
Citations:
- 1: https://github.com/LizardByte/setup-python-action
- 2: https://github.blog/changelog/2021-11-23-github-actions-setup-python-now-supports-dependency-caching/
- 3: https://docs.github.com/en/actions/use-cases-and-examples/building-and-testing/building-and-testing-python
- 4: https://til.simonwillison.net/github-actions/cache-setup-py
- 5: https://github.com/actions/setup-python
- 6: https://github.com/actions/python-versions
Update GitHub Actions setup-python to v5
The current workflow uses `actions/setup-python@v3`, which is now outdated. According to official documentation (March 2025), the latest version is v5. Please update the workflow as follows:
- File: `.github/workflows/python-package.yml` (Lines 23-26)
- Change:
    - uses: actions/setup-python@v3
    + uses: actions/setup-python@v5
This update ensures you benefit from the latest features (such as enhanced Python and PyPy support, improved matrix configurations, and architecture selection) and aligns with current best practices.
🧰 Tools
🪛 actionlint (1.7.4)
24-24: the runner of "actions/setup-python@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue (action)
hvantk/utils/constants.py (3)
7-11: Added pathlib import and BASE_DIR constant.
Using `pathlib.Path` is a modern approach to handle file paths in Python, making the code more robust across different operating systems. The `BASE_DIR` constant provides a reliable reference point for relative paths.
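A minimal sketch of the `BASE_DIR` pattern is shown below. The directory depth and the `data/datasets.json` resource are assumptions made for illustration and do not reflect the actual contents of `hvantk/utils/constants.py`.

```python
from pathlib import Path

# Hypothetical layout: this module lives two levels below the project root.
# When run interactively (no __file__), fall back to the current directory.
try:
    BASE_DIR = Path(__file__).resolve().parent.parent
except NameError:
    BASE_DIR = Path.cwd()

# Hypothetical resource name, used only for illustration.
DATASETS_JSON = BASE_DIR / "data" / "datasets.json"

print(DATASETS_JSON.is_absolute())
```

Anchoring resource paths to the module location means tests and CLI entry points resolve the same file regardless of the working directory they are launched from.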
14-23: Consistent string formatting with double quotes.
The change from single quotes to double quotes for dictionary keys and values maintains consistency with the codebase style.
31-32: Added standardized path to UCSC datasets JSON file.
Using the `BASE_DIR` constant to define `UCSC_JSON_FILE_PATH` provides a reliable, cross-platform way to reference resources. This approach is more maintainable than hardcoded paths.
hvantk/tests/test_downloader.py (4)
7-11: Improved import formatting for better readability.
The imports have been refactored to a more structured multi-line format, which enhances readability and makes future additions easier to manage.
14-16: Clean URL construction formatting.
The URL construction has been reformatted to a multi-line style, improving readability while maintaining the same functionality.
32-36: Consistent string literal style.
String literals have been standardized to use double quotes instead of single quotes, improving consistency across the codebase.
45-46: Consistent file open mode formatting.
The file opening mode has been updated to use double quotes, maintaining consistency with the rest of the codebase.
environment.yml (3)
1-1: Environment name simplified for better consistency.
The environment name has been changed from `hvantk-env` to simply `hvantk`, which aligns better with the package name.
5-5: More flexible Python version requirement.
The Python version requirement has been changed from a strict `==3.10.0` to a more flexible `>=3.10.0`, allowing compatibility with newer Python versions.
9-10: ❓ Verification inconclusive
Removed setuptools version constraint.
The version constraint on setuptools has been removed. While this allows for flexibility, it's worth considering whether a minimum version constraint would be appropriate to ensure compatibility.
🌐 Web query:
What is the latest version of setuptools and are there any breaking changes in recent versions?
💡 Result:
The latest version of setuptools is 78.1.0, released on March 25, 2025[9]. Recent versions have introduced several breaking changes and notable updates:
Key Breaking Changes
1. v78.x Series (2025)
- Strict setup.cfg validation: Enforces underscore-separated keys instead of dash-separated formats (e.g., `description_file` instead of `description-file`)[6][2]
- Introduced unexpected build failures for packages with non-compliant dependencies, creating cascading compatibility issues[2][6]
- Reverted partial removals in v78.0.1 after widespread breakage[6]
2. v72.x Series (2024)
- Removed `setup.py test` command (later partially reverted in v72.1)[5]
- Broke installations relying on test command execution during build[5]
3. v71.0 (2024)
- Changed dependency declaration to use `core` extra
- Vendored dependencies now prefer installed versions over bundled ones[1]
4. v70.0 (2024)
- Removed deprecated `test` command permanently
- Enforced UTF-8 adoption in build artifacts with potential edge case incompatibilities[1]
Recent Critical Issues
- The v78 release initially broke setuptools' own tests due to non-compliant dependencies like `requests`, leading to rushed fixes[2][6]
- Multiple packages (including popular ones like Ansible-Vault and PySpark) suffered installation failures due to strict naming rules[6]
Developers are advised to:
- Pin setuptools versions (`<78.0` for dash-separated config users)[6]
- Migrate to modern packaging practices using `pyproject.toml` and wheel formats[5][11]
- Monitor dependencies for strict naming compliance[2]
This pattern highlights the importance of dependency version pinning and proactive response to long-standing deprecation warnings[2][6].
Citations:
- 1: https://setuptools.pypa.io/en/stable/history.html
- 2: https://staturdays.com/2025/03/25/how-one-setuptools-release-broke-everything-and-what-we-can-learn-from-it/
- 3: https://pypi.org/project/setuptools/5.8/
- 4: https://pypi.org/project/setuptools/10.0/
- 5: https://www.bitecode.dev/p/whats-up-python-setuptools-breaks
- 6: https://pydevtools.com/blog/setuptools-78-0-1-breaking-package-installation/
- 7: http://packages.altlinux.org/en/sisyphus/srpms/python3-module-setuptools/
- 8: https://discuss.python.org/t/please-make-package-version-go-away/58501
- 9: https://pypi.org/project/setuptools/
- 10: https://discuss.python.org/t/replace-deprecated-python3-setup-py-install-call-with-new-fails/26660
- 11: https://packaging.python.org/en/latest/discussions/setup-py-deprecated/
- 12: https://github.com/pypa/setuptools/blob/main/NEWS.rst
- 13: https://packaging.python.org/guides/distributing-packages-using-setuptools/
- 14: [BUG] setuptools version 73.0.0 breaks Pythran pypa/setuptools#4579
- 15: https://www.piwheels.org/project/setuptools/
- 16: https://github.com/pypa/setuptools
- 17: https://github.com/pypa/setuptools/releases
Setuptools Dependency Constraint: Verify Compatibility
The removal of the setuptools version constraint in `environment.yml` (lines 9–10) increases flexibility, but it also opens the door to potential issues. The latest setuptools release (v78.1.0) introduces several breaking changes (like stricter configuration key validations) that could affect projects relying on legacy behaviors (for instance, dash-separated keys). Please review whether our project depends on any of these behaviors and consider pinning setuptools (e.g., using `<78.0`) if compatibility issues arise.
- Actionable Check: Confirm that downstream processes and configurations can handle the changes in setuptools v78.1.0, or update the constraint accordingly.
.github/workflows/python-app.yml (2)
1-14: Well-structured GitHub Actions workflow configuration.
The workflow is properly configured to run on push and pull requests to the main branch with appropriate read permissions.
26-39: Comprehensive CI steps for dependency installation, linting, and testing.
The workflow includes all the essential steps for a CI pipeline: installing dependencies, running linting with proper configurations, and executing tests.
hvantk/commands/annotate_features.py (5)
10-23: Improved import formatting for better readability.
The imports have been refactored to a more structured multi-line format with consistent indentation, which enhances readability and makes future modifications easier.
🧰 Tools
🪛 Ruff (0.8.2)
19-19: `hvantk.utils.annotate.annotate_degs` imported but unused. Remove unused import: `hvantk.utils.annotate.annotate_degs` (F401)
27-43: Consistent string formatting and function signature styling.
String literals have been standardized to use double quotes, and function signatures have been reformatted for consistency.
48-57: Consistent function call formatting.
Function calls have been reformatted to follow a consistent style throughout the file.
74-101: Consistent formatting for annotation function calls.
All the annotation function calls have been reformatted for consistency, using a clean style that makes the code more readable while maintaining the same functionality.
111-139: Improved argument parser formatting.
The argument parser configuration has been reformatted with consistent indentation and multi-line style, which improves readability while maintaining the same functionality.
hvantk/tests/test_UCSCDataSetCollection.py (3)
4-4: Added import for constant file path reference.
Using a constant instead of hardcoded file paths is a good practice that improves maintainability.
26-40: Improved JSON structure with proper trailing commas.
Adding trailing commas to the JSON structure elements is a good practice as it makes future additions easier and produces cleaner diffs when changes are made.
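A caveat worth keeping in mind about the trailing-comma remark: trailing commas are fine in a Python literal but are not part of the JSON grammar. The sketch below assumes the structure in the test is a Python dict that is serialized later; the keys and values are hypothetical.

```python
import json

# Trailing commas are valid in a Python literal...
dataset = {
    "name": "example-dataset",  # hypothetical values, for illustration only
    "tags": ["heart", "rna-seq",],
}

# ...and json.dumps never emits them, so the serialized form stays valid JSON.
serialized = json.dumps(dataset)

# The JSON grammar itself rejects a trailing comma.
try:
    json.loads('{"name": "example-dataset",}')
    strict_json_accepts_trailing_comma = True
except json.JSONDecodeError:
    strict_json_accepts_trailing_comma = False

print(serialized, strict_json_accepts_trailing_comma)
```

If the fixture were ever moved into a standalone `.json` file, the trailing commas would need to be removed.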
62-64
: Using constant path instead of hardcoded string.Replaced the hardcoded file path with the imported constant, which improves maintainability and consistency.
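A minimal sketch of the path-constant pattern noted in these comments, assuming a pathlib-based layout. BASE_DIR and UCSC_JSON_FILE_PATH are the names given in the PR description; the directory layout, the JSON filename, and the fixture_status helper are hypothetical and are not taken from the PR.

```python
from pathlib import Path

# Assumption: anchor paths at the working directory so the sketch runs anywhere;
# a package would more typically derive this from Path(__file__).
BASE_DIR = Path.cwd()
# Hypothetical location and filename for the UCSC dataset fixture.
UCSC_JSON_FILE_PATH = BASE_DIR / "testdata" / "ucsc_datasets.json"


def fixture_status(path: Path = UCSC_JSON_FILE_PATH) -> str:
    """Return a short status line so a missing fixture is easy to spot."""
    return f"{path.name}: {'found' if path.exists() else 'missing'}"
```

Tests can then import the constant rather than repeating the string, so a change to the fixture location only has to be made in one place.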
.github/workflows/python-package-conda.yml (2)
1-35
: LGTM! The workflow is well-structured.
The workflow is organized with a clear sequence of steps: checkout, Python setup, dependency installation, linting, and testing.
🧰 Tools
🪛 actionlint (1.7.4)
14-14: the runner of "actions/setup-python@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
14-16
: ✅ Verification successful
Update the actions/setup-python action version.
The GitHub Action actions/setup-python@v3 is outdated. GitHub regularly deprecates older versions of actions, which can lead to compatibility issues in the future.
- uses: actions/setup-python@v3
+ uses: actions/setup-python@v5
To verify the latest version:
🌐 Web query:
What is the latest version of GitHub's actions/setup-python action?
💡 Result:
The latest version of GitHub's actions/setup-python action is v5.5.0, released on October 24, 2024[8]. This version introduced several enhancements, including support for free-threaded Python versions (e.g., 3.13t), architecture-specific PATH management improvements, and compatibility with .tool-versions files[8].
Key updates in v5.5.0 include:
- Free-threaded Python support: Enables using special syntax like 3.13t for thread-optimized builds[8]
- ARM64 runner improvements: Added Ubuntu ARM runner support and fixed PyPy architecture mapping on Linux ARM64[8]
- Dependency caching: Upgraded @action/cache to v4.0.3 for improved package management[8]
The action continues to support Python, PyPy, and GraalPy installations across Linux, macOS, and Windows environments[2][4]. For projects requiring Python 2.7, the LizardByte/setup-python-action fork remains a recommended alternative[1].
When using this action, remember to:
- Explicitly specify Python versions for consistency
- Leverage built-in caching with cache: pip for dependency management[3][6]
- Check actions/python-versions for pre-installed interpreter availability[13]
Citations:
- 1: https://github.com/LizardByte/setup-python-action
- 2: https://github.com/marketplace/actions/setup-python
- 3: https://cicube.io/workflow-hub/actions-setup-python/
- 4: https://docs.github.com/en/actions/use-cases-and-examples/building-and-testing/building-and-testing-python
- 5: https://github.com/actions/setup-python/blob/main/docs/advanced-usage.md
- 6: https://til.simonwillison.net/github-actions/cache-setup-py
- 7: GitHub action always uses the latest Python version even if other version was set up pypa/hatch#1543
- 8: https://github.com/actions/setup-python/releases
- 9: https://discourse.jupyter.org/t/use-github-workflows-to-automatically-publish-to-pypi-when-new-tags-are-created/14941
- 10: https://github.com/actions/setup-python
- 11: Python 3.12 support actions/setup-python#736
- 12: https://github.blog/changelog/2020-04-30-github-actions-v2-setup-python-action/
- 13: https://github.com/actions/python-versions
Action Required: Update GitHub Action to Latest Version
The web query confirms that the latest release of the actions/setup-python action is v5.5.0. Please update the workflow file accordingly. It’s recommended to pin the version to ensure reproducibility.
- File: .github/workflows/python-package-conda.yml (lines 14-16)
- Proposed diff:
- uses: actions/setup-python@v3
+ uses: actions/setup-python@v5.5.0
🧰 Tools
🪛 actionlint (1.7.4)
14-14: the runner of "actions/setup-python@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
hvantk/tests/test_create_gene_annotation_tables.py (3)
12-16
: Good organization of test directories.
Creating separate constants for test data and temporary directories improves clarity and follows good practices for test organization.
47-49
: Good practice: cleaning up temporary directories.
Cleaning up temporary directories after tests is a good practice to prevent disk space issues during repeated test runs.
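One way to get this cleanup guarantee, sketched with the standard library only. The helper names run_with_temp_dir and write_marker and the directory prefix are hypothetical; the PR's own tests may structure this differently (for example with setUp/tearDown or a pytest fixture).

```python
import shutil
import tempfile
from pathlib import Path


def run_with_temp_dir(work):
    """Create a scratch directory, pass it to `work` as a string, and always clean up."""
    tmp_dir = Path(tempfile.mkdtemp(prefix="hvantk_test_"))
    try:
        # str() is used on the assumption that the code under test expects string paths.
        return work(str(tmp_dir))
    finally:
        # Runs even if `work` raises, so repeated test runs do not accumulate files.
        shutil.rmtree(tmp_dir, ignore_errors=True)


def write_marker(directory: str) -> str:
    """Hypothetical stand-in for a table-building function under test."""
    marker = Path(directory) / "marker.txt"
    marker.write_text("ok")
    return str(marker)


created = run_with_temp_dir(write_marker)
```

After the call, neither the marker file nor its parent directory remains on disk, which is the property the review comment is pointing at.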
32-36
: LGTM! Properly converting Path objects to strings.
Converting the Path objects to strings using str() before passing them to the functions ensures compatibility with the expected input types.
hvantk/datasets/ucsc_cell_datasets.py (4)
5-9
: LGTM! Improved import formatting.
The multi-line import format with trailing commas improves readability and makes future additions easier.
27-53
: LGTM! Well-structured docstring.
The docstring formatting follows good practices with clear descriptions of the class and its attributes.
107-128
: LGTM! Improved error message formatting.
The error message in the exception handling is properly formatted with f-strings.
178-209
: LGTM! Well-structured error handling in from_json method.
The error handling in the from_json method is comprehensive, checking for required keys, validating the type of datasets, and handling various potential exceptions.
hvantk/utils/make_tables.py (5)
16-22
: LGTM! Consistent function parameter formatting.
The function parameters are now consistently formatted with two-space indentation, improving readability.
33-35
: LGTM! Clean formatting of function calls.
The hl.import_table call is now formatted more compactly, improving readability.
66-74
: LGTM! Well-structured method chaining.
The method chaining in create_interactome_tb is now more readable with proper indentation.
101-113
: LGTM! Improved formatting for complex operations.
The dictionary comprehension and method chaining in create_clinvar_tb are now more consistently formatted.
189-208
: LGTM! Well-formatted complex aggregation.
The complex aggregation in create_ensembl_gene_tb now follows a consistent indentation pattern, making the structure clearer.
hvantk/utils/annotate.py (13)
6-18
: Improved Import Formatting.
The reformatting of the import section clearly separates each dataset utility function, which enhances readability and maintainability.
24-30
: Consistent ClinVar Label Formatting.
The benign and pathogenic ClinVar label lists have been reformatted into concise, single-line representations. This change improves clarity without altering the data.
32-37
: Clear Pre-calculation of ClinVar Conditions.
The computation of is_pathogenic and is_benign is well laid out to improve readability and performance by pre-calculating these conditions before annotation.
38-44
: Structured ClinVar Annotation with hl.case().
The chained use of hl.case(), along with the clear handling of pathogenic and benign conditions, makes the logic in annotating ClinVar data both readable and robust.
55-62
: Refined Signature and Body in annotate_gevir.
The function signature has been adjusted (removing an extraneous trailing comma) and the body remains clear with a concise application of get_gevir_ht().
64-66
: Consistent Default Value Formatting in annotate_rnaseq_expression.
Changing the default value for organ to use double quotes improves consistency with the rest of the codebase.
90-95
: Enhanced Transformation Pipeline in annotate_ensembl_gene.
The revamped method chain for processing gene_ht (from transmutation through explosion, re-keying, and selective field projection) improves readability and emphasizes each transformation step.
113-117
: Efficient Scoring Field Selection in annotate_dbnsfp_scores.
The dynamic filtering of fields (selecting those ending with _score or matching CADD_phred) makes the code more robust and adaptable to the structure of the dataset.
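The selection rule described here can be illustrated in plain Python, independent of Hail. This is a sketch under assumptions: select_score_fields is a hypothetical helper, the column names are invented for illustration, and the actual annotate_dbnsfp_scores code operates on a Hail table rather than a list of strings.

```python
def select_score_fields(field_names, extra=("CADD_phred",)):
    """Keep names ending in '_score' plus any explicitly listed extras, preserving order."""
    return [name for name in field_names if name.endswith("_score") or name in extra]


# Hypothetical column names, for illustration only.
columns = ["gene", "SIFT_score", "CADD_phred", "CADD_raw", "REVEL_score"]
selected = select_score_fields(columns)
print(selected)
```

Because the rule is driven by the field names rather than a fixed list, a new column ending in _score would be picked up without changing the function.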
155-162
: Clear Differential Expression Annotation.
The use of a dictionary comprehension within t.transmute() to generate cluster-specific annotations is effective, concise, and easy to follow.
167-178
: Enhanced Function Signature in annotate_hca.
Aligning the parameters vertically, including the immutable tuple for cell_categories, improves clarity and makes the function signature easier to read.
188-190
: Streamlined HCA Data Annotation.
The refactored retrieval and annotation of HCA data through hl.struct() is both clear and consistent, ensuring that the selected cell categories are correctly applied.
210-212
: Robust gnomAD AF Annotation Handling.
By using hl.if_else to check for the definition of the allele frequency expression and defaulting to a float-zero when absent, the implementation safeguards against missing data.
227-235
: Consistent Variant ID Construction.
Using hl.delimit to combine the contig, position, and allele values into a standardized variant ID string is clear and adheres to expected formatting conventions.
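For readers unfamiliar with hl.delimit, the same idea can be sketched in plain Python as a string join. The function name make_variant_id, the ":" separator, and the example values are assumptions for illustration; the review does not state which delimiter the PR uses.

```python
def make_variant_id(contig, position, alleles, sep=":"):
    """Join contig, position, and alleles into a single delimited identifier."""
    # The position is converted to str because join() only accepts strings.
    return sep.join([contig, str(position), *alleles])


variant_id = make_variant_id("chr1", 12345, ["A", "T"])
print(variant_id)
```

Keeping the separator as a parameter makes it easy to match whatever convention downstream tools expect.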
PR Type
Enhancement, Tests, Configuration changes
Description
Added new CI workflows for testing and linting with GitHub Actions.
Improved code formatting and consistency across multiple files.
Enhanced test coverage and updated test data handling.
Introduced constants for better file path management.
BASE_DIR and UCSC_JSON_FILE_PATH for centralized path handling.
Changes walkthrough 📝
1 files
Minor formatting adjustment in imports
10 files
Standardized string quotes and improved function formatting
Removed unused options and improved CLI formatting
Updated CLI entry point formatting
Standardized string quotes and added constants for paths
Standardized string quotes and improved function formatting
Added `BASE_DIR` and `UCSC_JSON_FILE_PATH` constants
Standardized string quotes and improved function formatting
Improved error handling and logging in file downloads
Standardized string quotes and improved readability
Improved function formatting and added parameter documentation
2 files
Improved class docstrings and formatting
Added CI badge for Conda-based workflow
3 files
Updated tests to use constants for paths
Enhanced test data handling with temporary directories
Improved test structure and added mock responses
4 files
Standardized string quotes and updated dependencies
Added GitHub Actions workflow for Python application
Added Conda-based GitHub Actions workflow
Added GitHub Actions workflow for multi-version Python testing
6 files
Summary by CodeRabbit
New Features
Chores / Refactor
Removed the testdata directory from the .gitignore file to allow tracking in version control.