This repository was archived by the owner on Nov 11, 2024. It is now read-only.

Releases: clintval/cvbio

3.0.0

09 Jan 18:26

Changelog

  • Breaking: Renamed UpdateDataContigNames to UpdateContigNames

Developers

  • Correct IDEA configuration files are now created by Mill so we can run test coverage in IntelliJ (#83)
  • Checksum-calculating input streams (#75) and implicits (#80) for creating checksum hashes as a side effect of consuming byte streams
  • Add a command line tool API (#76) along with a Conda abstraction (#82). Although this API is public, it may undergo breaking changes more readily than other APIs given its experimental design.
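
To illustrate the idea of hashing as a side effect of consuming a byte stream, here is a minimal Python sketch. It is not the cvbio implementation (which is written in Scala); the class name `ChecksumReader` and its methods are hypothetical and only show the general pattern of wrapping a stream so that every byte read also updates a digest.

```python
import hashlib
import io


class ChecksumReader:
    """Hypothetical wrapper: hash every byte that passes through read()."""

    def __init__(self, stream, algorithm="md5"):
        self._stream = stream
        self._digest = hashlib.new(algorithm)

    def read(self, size=-1):
        chunk = self._stream.read(size)
        self._digest.update(chunk)  # the side effect of consuming the stream
        return chunk

    def hexdigest(self):
        """Checksum of the bytes read so far."""
        return self._digest.hexdigest()


payload = b"ACGT" * 1000
reader = ChecksumReader(io.BytesIO(payload))
while reader.read(256):  # consume in chunks, as a parser would
    pass
print(reader.hexdigest())
```

The caller never hashes the data explicitly: once the stream is exhausted, the checksum of everything that was read is already available.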

2.1.0

22 Dec 17:02
04d2c3b

Changelog

  • IgvBoss is now capable of initializing IGV from an installed macOS application (#69)
  • IgvBoss's CLI now renders a little more cleanly (#69)
  • IgvBoss will now time out after 30 seconds of failing to connect to a running IGV session (#69)
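
The timeout behaviour can be pictured as a retry loop with a deadline. The Python sketch below is only an illustration under stated assumptions: the function name `connect_with_deadline` and the retry interval are hypothetical, and the real tool may structure its retries differently. Port 60151 in the usage note is IGV's documented default command port.

```python
import socket
import time


def connect_with_deadline(host, port, deadline_seconds=30.0, retry_interval=0.5):
    """Retry a TCP connection until it succeeds or the deadline passes."""
    give_up_at = time.monotonic() + deadline_seconds
    while True:
        try:
            # Each attempt has its own short timeout so a hung attempt cannot
            # block far past the overall deadline.
            return socket.create_connection((host, port), timeout=retry_interval)
        except OSError as error:
            if time.monotonic() >= give_up_at:
                raise TimeoutError(
                    f"No connection to {host}:{port} after {deadline_seconds} seconds"
                ) from error
            time.sleep(retry_interval)
```

A caller would use it as `connect_with_deadline("127.0.0.1", 60151)` and treat the `TimeoutError` as the signal that IGV never came up.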

2.0.0

10 Dec 18:04
07c2e33

Changelog

  • Breaking: Renamed RelabelReferenceNames to UpdateDataContigNames

UpdateDataContigNames

Update contig names in delimited data using a name mapping table.

A collection of mapping tables is maintained at the following location:

Features

  • Optionally drop rows which have chromosome names not in the mapping file
  • Replace multiple fields in a row at once using the same mapping file
  • Directly write out rows that start with arbitrary strings (default of #)
  • Parses any delimited data using any single character delimiter

Command Line Usage

Relabel the contig names in an Ensembl human gene annotation file.

git clone https://github.com/dpryan79/ChromosomeMappings.git
wget ftp://ftp.ensembl.org/pub/release-96/gtf/homo_sapiens/Homo_sapiens.GRCh38.96.gtf.gz
cvbio UpdateDataContigNames \
    -i Homo_sapiens.GRCh38.96.gtf.gz \
    -o Homo_sapiens.GRCh38.96.ucsc-named.gtf.gz \
    -m ChromosomeMappings/GRCh38_ensembl2UCSC.txt \
    --comment-chars '#' \
    --columns 0 \
    --skip-missing false
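
As a rough picture of what the features above amount to, here is a minimal Python sketch of the row-level logic. It is not the tool's code: the function `relabel_lines` and its parameters are hypothetical, lines are assumed to have their trailing newline already removed, and passing unmapped rows through unchanged when they are not dropped is an assumption, since the release notes do not say what the tool does in that case.

```python
def relabel_lines(lines, mapping, columns=(0,), delimiter="\t",
                  comment_prefix="#", drop_missing=False):
    """Yield lines with the chosen fields renamed through `mapping`."""
    for line in lines:
        if line.startswith(comment_prefix):
            yield line  # header and comment rows are written out untouched
            continue
        fields = line.split(delimiter)
        if any(fields[i] not in mapping for i in columns):
            if not drop_missing:
                yield line  # assumption: unmapped rows pass through unchanged
            continue
        for i in columns:
            fields[i] = mapping[fields[i]]  # same table for every chosen column
        yield delimiter.join(fields)


mapping = {"1": "chr1", "MT": "chrM"}
rows = [
    "#!genome-build GRCh38.p12",
    "1\thavana\tgene\t11869\t14409",
    "MT\tensembl\tgene\t3307\t4262",
    "KI270728.1\tensembl\tgene\t1\t100",
]
relabeled = list(relabel_lines(rows, mapping, drop_missing=True))
```

With `drop_missing=True` the scaffold row is removed because its name has no entry in the mapping; the comment row is kept verbatim.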

1.4.1

10 Dec 06:24
c01a862
Remove a random println() in RelabelReferenceNames (#63)

1.4.0

10 Dec 05:51
f20725a

Changelog

  • Tool to relabel reference sequence names in delimited data using a chromosome name mapping table

RelabelReferenceNames

Relabel reference sequence names in delimited data using a chromosome name mapping table.

A collection of mapping tables is maintained at the following location:

Features

  • Optionally drop rows which have chromosome names not in the mapping file
  • Replace multiple fields in a row at once using the same mapping file
  • Directly write out rows that start with arbitrary strings (default of #)
  • Parses any delimited data using any single character delimiter

Command Line Usage

Relabel the chromosome names in a human gene annotation file.

git clone https://github.com/dpryan79/ChromosomeMappings.git
wget -qO- ftp://ftp.ensembl.org/pub/release-96/gtf/homo_sapiens/Homo_sapiens.GRCh38.96.gtf.gz \
    | gzip -dc > Homo_sapiens.GRCh38.96.gtf
cvbio RelabelReferenceNames \
    -i Homo_sapiens.GRCh38.96.gtf \
    -o Homo_sapiens.GRCh38.96.ucsc-named.gtf \
    -m ChromosomeMappings/GRCh38_ensembl2UCSC.txt \
    --skip-prefixes '#' \
    --columns 0 \
    --drop false
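
The mapping file passed with `-m` can be thought of as a simple lookup table. The Python sketch below assumes a two-column, tab-delimited layout (source name, then target name) and assumes that rows with an empty target should be ignored; both are assumptions about the mapping files rather than something these release notes state, and `read_mapping` is a hypothetical helper.

```python
import csv
import io


def read_mapping(handle):
    """Read a two-column, tab-delimited table into a {source: target} dict."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        # Skip blank rows and rows whose target name is missing or empty.
        if len(row) < 2 or not row[0] or not row[1]:
            continue
        mapping[row[0]] = row[1]
    return mapping


table = io.StringIO("1\tchr1\nMT\tchrM\nKI270752.1\t\n")
mapping = read_mapping(table)
```

Names left out of the resulting dictionary are exactly the ones the drop option would then act on.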

1.3.0

05 Nov 05:23
8f4adfe

Changelog

  • New preference settings for IgvBoss including:
    • Setting the downsampling status
    • Setting minimum and maximum base quality thresholds for shading
❯ cvbio IgvBoss -h 2>&1 | tail -n4
--downsample[[=true|false]]   Downsample reads. [Optional].
--base-quality-minimum=Int    Minimum base quality to shade. [Optional].
--base-quality-maximum=Int    Maximum base quality to shade. [Optional].

1.2.0 IgvBoss

04 Nov 08:18
f94f662

Features

  • Will start IGV for you if it's not already running
  • Quick syntax to navigate IGV from the command line only
  • Easily re-load new files, travel to loci, and swap genomes
  • Shut IGV down with a single command: cvbio IgvBoss -x

Command Line Usage

cvbio IgvBoss -g mm10.fa -i infile.bam targets.bed -l $(cut -f4 < targets.bed | head -n2)

Long Tool Description

IgvBoss
------------------------------------------------------------------------------------------------------------------------
Take control of your IGV session from end-to-end.

IGV Startup
-----------

There are three supported ways to initialize IGV:

  * Let this tool connect to an already-running IGV session
  * Supply an IGV JAR file path and let this tool run it
  * Let this tool find an 'igv' executable on the system PATH and run it

This tool will always attempt to connect to a running IGV application before attempting to start a new instance of IGV.
Provide a path to an IGV JAR file if no IGV applications are currently running. If no IGV JAR file path is set, and
there are no running instances of IGV, then this tool will attempt to find 'igv' on the system PATH and execute the
application.
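
The order of precedence described above can be sketched as a small decision function. This Python sketch is illustrative only: `choose_startup` is a hypothetical name, the returned labels are made up for the example, and launching the JAR with `java -jar` is an assumption about how such a file would typically be run.

```python
import shutil


def choose_startup(igv_is_running, jar_path=None, find_executable=shutil.which):
    """Pick a startup strategy in the order the description gives."""
    if igv_is_running:
        return ("connect", None)  # always prefer an already-running session
    if jar_path is not None:
        return ("jar", ["java", "-jar", str(jar_path)])  # assumed launch command
    executable = find_executable("igv")
    if executable is not None:
        return ("path", [executable])
    raise RuntimeError(
        "IGV is not running, no JAR was given, and 'igv' is not on the PATH"
    )
```

Passing `find_executable` as a parameter keeps the PATH lookup easy to replace in tests.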

You can shut down IGV on exit with the '--close-on-exit' option. This will work regardless of how this tool initially
connected to IGV and is handy for tearing down the application after your investigation is concluded.

Controlling IGV
---------------

If no inputs are provided, then no new sessions will be created. Loci, for now, will result in a split-window view.
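
One way to picture this is as a list of IGV batch commands assembled from the command line arguments. The Python sketch below uses the `genome`, `new`, `load`, `goto`, and `exit` commands from IGV's documented batch command set; whether IgvBoss issues exactly these commands, and in this order, is an assumption, and `build_commands` is a hypothetical helper.

```python
def build_commands(genome=None, files=(), loci=(), close_on_exit=False):
    """Translate high-level options into a list of IGV batch commands."""
    commands = []
    if genome:
        commands.append(f"genome {genome}")
    if files:
        # Assumption: a new session is only started when inputs are given.
        commands.append("new")
        commands.extend(f"load {path}" for path in files)
    if loci:
        # Several loci in one goto make IGV show them side by side.
        commands.append("goto " + " ".join(loci))
    if close_on_exit:
        commands.append("exit")
    return commands


commands = build_commands(
    genome="mm10.fa",
    files=["infile.bam", "targets.bed"],
    loci=["chr1:100-200", "chr2:300-400"],
)
```

Each command would then be written to IGV's command port as one newline-terminated line.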

References and Prior Art
------------------------

  * https://software.broadinstitute.org/software/igv/PortCommands
  * https://github.com/stevekm/IGV-snapshot-automator

1.1.0 Featured Template Disambiguation

04 Aug 04:07
bc37a99

Features

  • Accepts SAM/BAM sources of any sort order.
  • Will disambiguate an arbitrary number of BAMs, all aligned to different references
  • Writes the ambiguous alignments to an ambiguous-alignment specific directory

Command Line Usage

java -jar cvbio.jar Disambiguate -i infile1.bam infile2.bam -p insilico/disambiguated

Long Tool Description

Disambiguate
------------------------------------------------------------------------------------------------------------------------
Disambiguate reads that were mapped to multiple references.

Disambiguation of aligned reads is performed per-template and all information across primary, secondary, and
supplementary alignments is used as evidence. Alignment disambiguation is commonly used when analyzing sequencing data
from transduction, transfection, transgenic, or xenographic (including patient derived xenograft) experiments. This
tool works by comparing various alignment scores between a template that has been aligned to many references in order
to determine which reference is the most likely source.

All templates which are positively assigned to a single source reference are written to a reference-specific output BAM
file. Any templates with ambiguous reference assignment are written to an ambiguous input-specific output BAM file.
Only BAMs produced from the Burrows-Wheeler Aligner (bwa) or STAR are currently supported.

Input BAMs of arbitrary sort order are accepted; however, an internal sort to queryname will be performed unless the
BAM is already in queryname sort order. All output BAM files will be written in the same sort order as the input BAM
files. Although paired-end reads will give the most discriminatory power for disambiguation of short-read sequencing
data, this tool accepts paired, single-end (fragment), and mixed pairing input data.
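
The per-template comparison can be sketched as follows. This Python example is not the tool's algorithm: the description only says that "various alignment scores" are compared, so the ranking used here (higher AS first, then lower NM, then higher MAPQ) is an assumption, and `TemplateEvidence` and `disambiguate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateEvidence:
    """Summary of one template's alignments against one reference."""
    reference: str
    alignment_score: int  # AS, aggregated over the template's alignments
    edit_distance: int    # NM, aggregated the same way
    mapq: int


def disambiguate(candidates):
    """Return the winning reference name, or None if the template is ambiguous."""
    if not candidates:
        raise ValueError("At least one candidate reference is required")

    def rank(evidence):
        # Assumed ordering: higher AS, then lower NM, then higher MAPQ.
        return (evidence.alignment_score, -evidence.edit_distance, evidence.mapq)

    ordered = sorted(candidates, key=rank, reverse=True)
    if len(ordered) > 1 and rank(ordered[0]) == rank(ordered[1]):
        return None  # a tie on every metric: write to the ambiguous output
    return ordered[0].reference


human = TemplateEvidence("hg38", alignment_score=290, edit_distance=2, mapq=60)
mouse = TemplateEvidence("mm10", alignment_score=250, edit_distance=9, mapq=60)
winner = disambiguate([human, mouse])
```

A result of `None` corresponds to the ambiguous case, where the template would go to the ambiguous output rather than to a reference-specific BAM.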

Example
-------

To disambiguate templates that are aligned to human (A) and mouse (B):

  ❯ java -jar cvbio.jar Disambiguate -i sample.A.bam sample.B.bam -p sample/sample -n hg38 mm10

  ❯ tree sample/
    sample/
    ├── ambiguous-alignments/
    │  ├── sample.A.ambiguous.bai
    │  ├── sample.A.ambiguous.bam
    │  ├── sample.B.ambiguous.bai
    │  └── sample.B.ambiguous.bam
    ├── sample.hg38.bai
    ├── sample.hg38.bam
    ├── sample.mm10.bai
    └── sample.mm10.bam
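
The layout above suggests a simple naming rule, sketched here in Python. The rule is inferred from this one example (reference-specific files named from the prefix and the reference name, ambiguous files named from each input's file stem), so treat it as an assumption; `output_paths` is a hypothetical helper and index files are left out.

```python
from pathlib import Path


def output_paths(prefix, inputs, names):
    """Derive output BAM paths from the output prefix, inputs, and names."""
    prefix = Path(prefix)
    assigned = {
        name: prefix.parent / f"{prefix.name}.{name}.bam" for name in names
    }
    ambiguous_dir = prefix.parent / "ambiguous-alignments"
    ambiguous = {
        str(path): ambiguous_dir / f"{Path(path).stem}.ambiguous.bam"
        for path in inputs
    }
    return assigned, ambiguous


assigned, ambiguous = output_paths(
    "sample/sample", ["sample.A.bam", "sample.B.bam"], ["hg38", "mm10"]
)
```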

Glossary
--------

  * MAPQ: A metric that tells you how confident you can be that a read comes from a reported mapping position.
  * AS: A metric that tells you how similar the read is to the reference sequence.
  * NM: A metric that measures the edit distance to the reference sequence (mismatches plus inserted and deleted bases).

Prior Art
---------

  * Disambiguate (https://github.com/AstraZeneca-NGS/disambiguate) from AstraZeneca's NGS team

v1.0.0 Template Disambiguation

05 Jul 04:34
d199269

Command Line Usage

❯ java -jar cvbio.jar Disambiguate -i infile1.bam infile2.bam -p insilico/disambiguated

Benchmarks

Performance benchmarks, albeit crude, can be found in this repository's documentation.

Long Tool Description

Disambiguate
------------------------------------------------------------------------------------------------------------------------
Disambiguate reads that were mapped to multiple references.

Disambiguation of mapped reads is performed per-template and all information across primary, secondary, and
supplementary alignments is used as evidence. Alignment disambiguation is useful when analyzing sequencing data from
transduction, transfection, xenographic (including patient derived xenografts), and transgenic experiments. This tool
works by comparing various alignment scores between a template that has been mapped to many references in order to
determine which reference is the most likely source.

All templates which are positively assigned to a single source reference are written to a reference-specific output BAM
file. Any templates with ambiguous reference assignment are currently dropped.

Caveats
-------

  * No ambiguous BAM is currently written to the output prefix.
  * All input BAMs must have an Assembly Name defined in the first sequence of the sequence dictionary.
  * All input BAM files must be queryname grouped and synchronized on the read name.
  * Only BAMs produced from the Burrows-Wheeler Aligner (bwa) and STAR are currently supported.
  * Only BAMs produced from the same aligner are currently supported.
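
The requirement that inputs be queryname grouped and synchronized on the read name can be illustrated with a lockstep iteration. In this Python sketch records are plain `(name, payload)` tuples rather than real BAM records, and the helper names `templates` and `synchronized` are hypothetical.

```python
from itertools import groupby, zip_longest


def templates(records):
    """Group consecutive (name, payload) records that share a read name."""
    for name, group in groupby(records, key=lambda record: record[0]):
        yield name, [payload for _, payload in group]


def synchronized(*streams):
    """Walk several name-grouped streams in lockstep, one template at a time."""
    grouped_streams = (templates(stream) for stream in streams)
    for grouped in zip_longest(*grouped_streams, fillvalue=(None, [])):
        names = {name for name, _ in grouped}
        if len(names) != 1:  # differing names, or one stream ended early
            raise ValueError(f"Inputs are not synchronized on read name: {names}")
        yield names.pop(), [payloads for _, payloads in grouped]


stream_a = [("r1", "a1"), ("r1", "a2"), ("r2", "a3")]
stream_b = [("r1", "b1"), ("r2", "b2"), ("r2", "b3")]
paired = list(synchronized(stream_a, stream_b))
```

Because each template is compared across all inputs at once, a missing or out-of-order read name in any input is detected as soon as the streams diverge.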

Glossary
--------

  * MAPQ: A metric that tells you how confident you can be that a read comes from a reported mapping position.
  * AS: A metric that tells you how similar the read is to the reference sequence.
  * NM: A metric that measures the edit distance to the reference sequence (mismatches plus inserted and deleted bases).

Features for a Future Release
-----------------------------

  * Override the assembly names (output BAM prefixes)
  * Support 'tophat' or 'hisat2' alignments.
  * Check whether mixed aligners have been used and raise exception.

Prior Art
---------

  * Disambiguate (https://github.com/AstraZeneca-NGS/disambiguate) from AstraZeneca's NGS team

v0.0.4

15 May 10:00
6f0ecac
Pre-release

Changelog

  • The StarAlignPipeline is now hardened to merge alignments post-mapping using an unmapped input BAM.