Releases · clintval/cvbio

09 Jan 18:26

clintval

3.0.0

ecbce12

3.0.0 Latest

Changelog

Breaking: Renamed UpdateDataContigNames to UpdateContigNames

Developers

Correct IDEA configuration files are now created by Mill so we can run test coverage in IntelliJ (#83)
Checksum-calculating input streams (#75) and implicits (#80) for creating checksum hashes as a side effect of consuming byte streams
Add a command line tool API (#76) along with a Conda abstraction (#82). Although this API is public, it may undergo breaking changes more readily than other APIs given its experimental design.
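
The idea behind a checksum-calculating input stream can be sketched in a few lines. The example below is Python rather than cvbio's Scala, and the names `ChecksumReader` and `copy_with_checksum` are hypothetical: they illustrate the pattern of hashing bytes as a side effect of reading them and are not the API added in #75 or #80.

```python
import hashlib
import io


class ChecksumReader:
    """Wrap a binary stream and hash every chunk that is read through it."""

    def __init__(self, stream, algorithm="md5"):
        self._stream = stream
        self._digest = hashlib.new(algorithm)

    def read(self, size=-1):
        chunk = self._stream.read(size)
        # Side effect: fold the bytes into the running hash before returning them.
        self._digest.update(chunk)
        return chunk

    def hexdigest(self):
        """Return the checksum of all bytes consumed so far."""
        return self._digest.hexdigest()


def copy_with_checksum(source, sink, algorithm="md5", chunk_size=8192):
    """Copy source to sink and return the checksum of the copied bytes."""
    reader = ChecksumReader(source, algorithm)
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            break
        sink.write(chunk)
    return reader.hexdigest()


if __name__ == "__main__":
    source = io.BytesIO(b"ACGT" * 1000)
    sink = io.BytesIO()
    print(copy_with_checksum(source, sink))
```

Because the hash is updated inside `read`, the caller never makes a second pass over the data to compute the checksum.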

22 Dec 17:02

clintval

2.1.0

04d2c3b

2.1.0

Changelog

IgvBoss can now initialize IGV from an installed macOS application (#69)
IgvBoss's CLI is rendered a little cleaner (#69)
IgvBoss will now time out after 30 seconds of failing to connect to a running IGV session (#69)
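
A connection timeout of this kind is usually a retry loop with a deadline. The following Python sketch is an assumption about the general shape of such a loop, not IgvBoss's implementation: `connect_with_timeout` and `connect_to_igv` are hypothetical names, the retry delay is arbitrary, and port 60151 is assumed to be the IGV batch port in use.

```python
import socket
import time


def connect_with_timeout(connect, timeout_seconds=30.0, retry_delay=0.5,
                         clock=time.monotonic, sleep=time.sleep):
    """Call `connect` until it succeeds or `timeout_seconds` have elapsed."""
    deadline = clock() + timeout_seconds
    while True:
        try:
            return connect()
        except OSError as error:
            # Give up once the deadline has passed, otherwise wait and retry.
            if clock() >= deadline:
                raise TimeoutError(
                    f"Could not connect within {timeout_seconds} seconds"
                ) from error
            sleep(retry_delay)


def connect_to_igv(host="127.0.0.1", port=60151, timeout_seconds=30.0):
    """Open a socket to an IGV session, retrying until the timeout (port is an assumption)."""
    return connect_with_timeout(
        lambda: socket.create_connection((host, port), timeout=5.0),
        timeout_seconds=timeout_seconds,
    )
```

Passing the clock and sleep functions as parameters keeps the loop testable without waiting in real time.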

10 Dec 18:04

clintval

2.0.0

07c2e33

2.0.0

Changelog

Breaking: Renamed RelabelReferenceNames to UpdateDataContigNames

UpdateDataContigNames

Update contig names in delimited data using a name mapping table.

A collection of mapping tables is maintained at the following location:

https://github.com/dpryan79/ChromosomeMappings

Features

Optionally drop rows which have chromosome names not in the mapping file
Replace multiple fields in a row at once using the same mapping file
Directly write-out rows that start with arbitrary strings (default of #)
Parses any delimited data using any single character delimiter

Command Line Usage

Relabel the contig names in an Ensembl human gene annotation file.

❯ git clone https://github.com/dpryan79/ChromosomeMappings.git
❯ wget ftp://ftp.ensembl.org/pub/release-96/gtf/homo_sapiens/Homo_sapiens.GRCh38.96.gtf.gz

❯ cvbio UpdateDataContigNames \
    -i Homo_sapiens.GRCh38.96.gtf.gz \
    -o Homo_sapiens.GRCh38.96.ucsc-named.gtf.gz \
    -m ChromosomeMappings/GRCh38_ensembl2UCSC.txt \
    --comment-chars '#' \
    --columns 0 \
    --skip-missing false
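
The features above amount to a small row-rewriting loop. The Python sketch below illustrates that logic under stated assumptions and is not the tool's Scala source: the function names are hypothetical, the mapping file is assumed to hold two delimited columns (old name, new name), mapping rows with an empty second column are ignored, and a row with an unmapped name raises an error when `skip_missing` is false.

```python
def load_mapping(lines, delimiter="\t"):
    """Build an old-name -> new-name dictionary from two-column mapping lines."""
    mapping = {}
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        # Assumption: rows without a usable second column are ignored.
        if len(fields) >= 2 and fields[1]:
            mapping[fields[0]] = fields[1]
    return mapping


def update_contig_names(lines, mapping, columns=(0,), delimiter="\t",
                        comment_chars="#", skip_missing=True):
    """Yield rows with the contig names in `columns` replaced using `mapping`."""
    for line in lines:
        line = line.rstrip("\n")
        # Rows starting with the comment prefix are written out unchanged.
        if line.startswith(comment_chars):
            yield line
            continue
        fields = line.split(delimiter)
        missing = [fields[i] for i in columns if fields[i] not in mapping]
        if missing:
            if skip_missing:
                continue  # drop the row entirely
            # Assumption: without skipping, an unmapped name is an error.
            raise KeyError(f"No mapping found for contig name(s): {missing}")
        for i in columns:
            fields[i] = mapping[fields[i]]
        yield delimiter.join(fields)


if __name__ == "__main__":
    mapping = load_mapping(["1\tchr1\n", "MT\tchrM\n"])
    rows = ["#!genome-build GRCh38\n", "1\thavana\tgene\n", "MT\tinsdc\tgene\n"]
    for row in update_contig_names(rows, mapping):
        print(row)
```

Passing several indices in `columns` replaces multiple fields in a row using the same mapping.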

10 Dec 06:24

clintval

1.4.1

c01a862

1.4.1

Remove a random println() in RelabelReferenceNames (#63)

10 Dec 05:51

clintval

1.4.0

f20725a

1.4.0

Changelog

Tool to relabel reference sequence names in delimited data using a chromosome name mapping table

RelabelReferenceNames

Relabel reference sequence names in delimited data using a chromosome name mapping table.

A collection of mapping tables is maintained at the following location:

https://github.com/dpryan79/ChromosomeMappings

Features

Optionally drop rows which have chromosome names not in the mapping file
Replace multiple fields in a row at once using the same mapping file
Directly write-out rows that start with arbitrary strings (default of #)
Parses any delimited data using any single character delimiter

Command Line Usage

Relabel the chromosome names in a human gene annotation file.

❯ git clone https://github.com/dpryan79/ChromosomeMappings.git
❯ wget -qO- ftp://ftp.ensembl.org/pub/release-96/gtf/homo_sapiens/Homo_sapiens.GRCh38.96.gtf.gz \
    | gzip -dc > Homo_sapiens.GRCh38.96.gtf

❯ cvbio RelabelReferenceNames \
    -i Homo_sapiens.GRCh38.96.gtf \
    -o Homo_sapiens.GRCh38.96.ucsc-named.gtf \
    -m ChromosomeMappings/GRCh38_ensembl2UCSC.txt \
    --skip-prefixes '#' \
    --columns 0 \
    --drop false

05 Nov 05:23

clintval

1.3.0

8f4adfe

1.3.0

Changelog

New preference settings for IgvBoss including:
- Setting the downsampling status
- Setting minimum and maximum base quality thresholds for shading

❯ cvbio IgvBoss -h 2>&1 | tail -n4
--downsample[[=true|false]]   Downsample reads. [Optional].
--base-quality-minimum=Int    Minimum base quality to shade. [Optional].
--base-quality-maximum=Int    Maximum base quality to shade. [Optional].

04 Nov 08:18

clintval

1.2.0

f94f662

1.2.0 IgvBoss

Features

Will start IGV for you if it's not already running
Quick syntax to navigate IGV from the command line only
Easily re-load new files, travel to loci, and swap genomes
Shut IGV down with a single command: cvbio IgvBoss -x

Command Line Usage

❯ cvbio IgvBoss -g mm10.fa -i infile.bam targets.bed -l $(cut -f4 < targets.bed | head -n2)

Long Tool Description

IgvBoss
------------------------------------------------------------------------------------------------------------------------
Take control of your IGV session from end-to-end.

IGV Startup
-----------

There are three supported ways to initialize IGV:

  * Let this tool connect to an already-running IGV session
  * Supply an IGV JAR file path and let this tool run it
  * Let this tool find an 'igv' executable on the system PATH and run it

This tool will always attempt to connect to a running IGV application before attempting to start a new instance of IGV.
Provide a path to an IGV JAR file if no IGV applications are currently running. If no IGV JAR file path is set, and
there are no running instances of IGV, then this tool will attempt to find 'igv' on the system PATH and execute the
application.

You can shut down IGV on exit with the '--close-on-exit' option. This will work regardless of how this tool initially
connected to IGV and is handy for tearing down the application after your investigation is concluded.

Controlling IGV
---------------

If no inputs are provided, then no new sessions will be created. Loci, for now, will result in a split-window view.
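
IGV is controlled through plain-text batch commands sent over a socket, as described in the PortCommands reference listed under References and Prior Art. The Python sketch below shows one way the options might translate into such commands. It is an illustration under assumptions, not IgvBoss's implementation: the function names are hypothetical, the command ordering is a guess, and port 60151 is assumed to be the batch port in use.

```python
import socket


def build_commands(genome=None, inputs=(), loci=()):
    """Translate a genome, input files, and loci into IGV batch commands."""
    commands = []
    if inputs:
        # A new session is only created when there is something to load.
        commands.append("new")
    if genome is not None:
        commands.append(f"genome {genome}")
    commands.extend(f"load {path}" for path in inputs)
    if loci:
        # Several space-separated loci in one goto give a split-window view.
        commands.append("goto " + " ".join(loci))
    return commands


def send_commands(commands, host="127.0.0.1", port=60151, timeout=30.0):
    """Send each command to a running IGV session and collect its one-line reply."""
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as connection:
        stream = connection.makefile("rw", encoding="utf-8", newline="\n")
        for command in commands:
            stream.write(command + "\n")
            stream.flush()
            replies.append(stream.readline().strip())
    return replies


if __name__ == "__main__":
    for command in build_commands("mm10", ["infile.bam", "targets.bed"],
                                  ["chr1:100-200", "chr2:300-400"]):
        print(command)
```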

References and Prior Art
------------------------

  * https://software.broadinstitute.org/software/igv/PortCommands
  * https://github.com/stevekm/IGV-snapshot-automator

04 Aug 04:07

clintval

1.1.0

bc37a99

1.1.0 Featured Template Disambiguation

Features

Accepts SAM/BAM sources of any sort order.
Will disambiguate an arbitrary number of BAMs, all aligned to different references
Writes the ambiguous alignments to an ambiguous-alignment specific directory

Command Line Usage

❯ java -jar cvbio.jar Disambiguate -i infile1.bam infile2.bam -p insilico/disambiguated

Long Tool Description

Disambiguate
------------------------------------------------------------------------------------------------------------------------
Disambiguate reads that were mapped to multiple references.

Disambiguation of aligned reads is performed per-template and all information across primary, secondary, and
supplementary alignments is used as evidence. Alignment disambiguation is commonly used when analyzing sequencing data
from transduction, transfection, transgenic, or xenographic (including patient derived xenograft) experiments. This
tool works by comparing various alignment scores between a template that has been aligned to many references in order
to determine which reference is the most likely source.

All templates which are positively assigned to a single source reference are written to a reference-specific output BAM
file. Any templates with ambiguous reference assignment are written to an ambiguous input-specific output BAM file.
Only BAMs produced from the Burrows-Wheeler Aligner (bwa) or STAR are currently supported.

Input BAMs of arbitrary sort order are accepted; however, an internal sort to queryname will be performed unless the
BAM is already in queryname sort order. All output BAM files will be written in the same sort order as the input BAM
files. Although paired-end reads will give the most discriminatory power for disambiguation of short-read sequencing
data, this tool accepts paired, single-end (fragment), and mixed pairing input data.

Example
-------

To disambiguate templates that are aligned to human (A) and mouse (B):

  ❯ java -jar cvbio.jar Disambiguate -i sample.A.bam sample.B.bam -p sample/sample -n hg38 mm10

  ❯ tree sample/
    sample/
    ├── ambiguous-alignments/
    │  ├── sample.A.ambiguous.bai
    │  ├── sample.A.ambiguous.bam
    │  ├── sample.B.ambiguous.bai
    │  └── sample.B.ambiguous.bam
    ├── sample.hg38.bai
    ├── sample.hg38.bam
    ├── sample.mm10.bai
    └── sample.mm10.bam

Glossary
--------

  * MAPQ: A metric that tells you how confident you can be that a read comes from a reported mapping position.
  * AS: A metric that tells you how similar the read is to the reference sequence.
  * NM: A metric that measures the number of mismatches to the reference sequence (Hamming distance).
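
These notes do not spell out how the scores are combined, so the following Python sketch only illustrates the general idea of per-template comparison. The ranking order (highest AS, then lowest NM, then highest MAPQ) and the name `assign_reference` are assumptions made for the example, not the tool's algorithm.

```python
def assign_reference(template):
    """Pick the most likely source reference for one template.

    `template` maps a reference name to the list of alignments (primary,
    secondary, and supplementary) of that template against the reference.
    Each alignment is a dict with integer 'AS', 'NM', and 'MAPQ' values.
    Returns the winning reference name, or None when the result is ambiguous.
    """
    def evidence(alignments):
        # Assumed ranking: best alignment score, fewest mismatches, best MAPQ.
        return (
            max(a["AS"] for a in alignments),
            -min(a["NM"] for a in alignments),
            max(a["MAPQ"] for a in alignments),
        )

    ranked = sorted(
        ((evidence(alignments), name)
         for name, alignments in template.items() if alignments),
        reverse=True,
    )
    if not ranked:
        return None
    # A tie on all evidence between the top two references is ambiguous.
    if len(ranked) > 1 and ranked[0][0] == ranked[1][0]:
        return None
    return ranked[0][1]


if __name__ == "__main__":
    template = {
        "hg38": [{"AS": 100, "NM": 0, "MAPQ": 60}],
        "mm10": [{"AS": 80, "NM": 4, "MAPQ": 20}],
    }
    print(assign_reference(template))
```

In this sketch a template that returns None would go to the ambiguous output, and any other result to the BAM for that reference.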

Prior Art
---------

  * Disambiguate (https://github.com/AstraZeneca-NGS/disambiguate) from AstraZeneca's NGS team

05 Jul 04:34

clintval

1.0.0

d199269

v1.0.0 Template Disambiguation

Command Line Usage

❯ java -jar cvbio.jar Disambiguate -i infile1.bam infile2.bam -p insilico/disambiguated

Benchmarks

Performance benchmarks (albeit crude) can be found in this repository's documentation.

Long Tool Description

Disambiguate
------------------------------------------------------------------------------------------------------------------------
Disambiguate reads that were mapped to multiple references.

Disambiguation of mapped reads is performed per-template and all information across primary, secondary, and
supplementary alignments is used as evidence. Alignment disambiguation is useful when analyzing sequencing data from
transduction, transfection, xenographic (including patient derived xenografts), and transgenic experiments. This tool
works by comparing various alignment scores between a template that has been mapped to many references in order to
determine which reference is the most likely source.

All templates which are positively assigned to a single source reference are written to a reference-specific output BAM
file. Any templates with ambiguous reference assignment are currently dropped.

Caveats
-------

  * No ambiguous BAM is currently written to the output prefix.
  * All input BAMs must have an Assembly Name defined in the first sequence of the sequence dictionary.
  * All input BAM files must be queryname grouped and synchronized on the read name.
  * Only BAMs produced from the Burrows-Wheeler Aligner (bwa) and STAR are currently supported.
  * Only BAMs produced from the same aligner are currently supported.

Glossary
--------

  * MAPQ: A metric that tells you how confident you can be that a read comes from a reported mapping position.
  * AS: A metric that tells you how similar the read is to the reference sequence.
  * NM: A metric that measures the number of mismatches to the reference sequence (Hamming distance).

Features for a Future Release
-----------------------------

  * Override the assembly names (output BAM prefixes)
  * Support 'tophat' or 'hisat2' alignments.
  * Check whether mixed aligners have been used and raise exception.

Prior Art
---------

  * Disambiguate (https://github.com/AstraZeneca-NGS/disambiguate) from AstraZeneca's NGS team

15 May 10:00

clintval

0.0.4

6f0ecac

v0.0.4 Pre-release

Pre-release

Changelog

The StarAlignPipeline is now hardened to merge alignments post-mapping using an unmapped input BAM.
