ClinEff
Clinical Variant Annotations Software
ClinEff is a professional version of the SnpEff and SnpSift packages, suitable for production use in clinical labs. ClinEff combines the flexibility of multiple SnpEff/SnpSift commands with the simplicity of running one program that performs all the annotations at once (i.e. in a single pass). It is highly customizable and can be tailored to specific pipeline needs in clinical production environments.
Why ClinEff?
ClinEff is a professional version of the well-known SnpEff + SnpSift annotation suite.
ClinEff is designed for clinical sequencing environments
Simple, fast, robust and reliable variant annotation, prioritization and reporting for whole genome sequencing, exome sequencing or gene panels.
- Based on the leading variant annotation packages SnpEff & SnpSift.
- Optimized for clinical labs, helps to deploy standardized workflows.
- Effect prediction, prioritization & classification.
- Cancer somatic vs. germline effects.
- Standards: VCF, Sequence Ontology, HGVS, 'ANN' VCF fields.
- Multiple sample support.
- Annotations are applied based upon your pre-configured settings.
- Supports many state of the art databases.
Why a professional license?
A professional license facilitates customization and compliance
ClinEff's professional license simplifies deployments in clinical environments.
- Reproducibility for CLIA and CAP certified analysis. Versioned databases.
- Long Term Support for software and database: 2 years (optionally more)
- Support for regulatory compliance
- Customized data input / output formats and connectors to third party systems
- Prioritized bug fixes
- Prioritized feature development
- Bug-fixes for Long Term Support versions
- Customized genome references and databases (e.g. targeted sequencing)
- Integration with open databases
- Integration with private or proprietary databases
- Customizable clinical report and formats
- Privacy: Tickets, issues, pipeline-specific analysis and feature requests are completely private.
Download
ClinEff needs at least one of the genomic databases and a license
ClinEff Version | Databases v37 | Databases v38 |
---|---|---|
ClinEff v1.0h | hg19 / GRCh37 | hg38 / GRCh38 |
ClinEff v1.0g | | |
ClinEff v1.0f | | |
ClinEff v1.0e | | |
ClinEff v1.0d | | |
ClinEff v1.0c | | |
ClinEff v1.0 | | |
License
License type | Price |
---|---|
Academic / Non-Profit Research | Free: Request license |
Clinical / Professional | $99.00 / Month + $1 per sample processed (taxes not included) |
If you experience any technical difficulties trying to purchase a license, please contact us.
Documentation
The following sections describe some technical details about ClinEff
Requirements
- Java 1.8 or higher.
- Memory: At least 6GB of RAM, 8GB or more are highly recommended.
- Operating system: It runs on any operating system that can run Java. Unix-based operating systems (such as Linux or OS X) are highly recommended.
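The Java requirement above can be checked before installing. The following is a minimal sketch, not part of ClinEff: the helper name `java_major` is hypothetical, and the script assumes that `java -version` prints a quoted version string such as "1.8.0_292" or "11.0.2", which is common but not guaranteed for every Java distribution.

```shell
#!/usr/bin/env bash
# Sketch: check that the installed Java is version 1.8 (i.e. 8) or higher.

# Hypothetical helper: extract the major version from a version string.
#   1.8.0_292 -> 8     11.0.2 -> 11     17 -> 17
java_major() {
    local v="$1"
    case "$v" in
        1.*) v="${v#1.}" ;;    # legacy scheme: drop the leading "1."
    esac
    echo "${v%%[._-]*}"        # keep everything before the first '.', '_' or '-'
}

if command -v java >/dev/null 2>&1; then
    # 'java -version' writes to stderr, e.g.: openjdk version "1.8.0_292"
    version=$(java -version 2>&1 | awk -F '"' '/version/ {print $2; exit}')
    major=$(java_major "$version")
    if [ "${major:-0}" -ge 8 ]; then
        echo "Java $version: OK"
    else
        echo "Java '$version': ClinEff needs Java 1.8 or higher"
    fi
else
    echo "java not found in PATH"
fi
```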
Installing ClinEff
Don't like to install programs?
We can do it for you.
Contact us for a Cloud instance or a custom install.
To install ClinEff, uncompress the downloaded files in your $HOME directory:

    $ cd                              # Go to your home dir
    $ tar -xzf clinEff_v*.tgz         # Uncompress the main program (note that the version number will change)
    $ cd clinEff                      # Go to the newly created 'clinEff' directory
    $ tar -xvf clinEff_db37_v*.tgz    # Uncompress the databases package (use clinEff_db38_v*.tgz if you want GRCh38)

    # You also need to install the license files
    $ cd path/to/your/license_files
    $ cp clineff.license* $HOME/clinEff/    # Copy your license to the ClinEff install directory
Running ClinEff
Don't like running programs?
We can do it for you.
Contact us for ClinEff as a service.
To run ClinEff, we need to pass some memory options to Java.
We recommend using at least 6GB of RAM (-Xmx6G).
So a basic command line is:

    java -Xmx6g -jar clinEff.jar

Running the command without any options or files will show a basic help screen:
    $ java -Xmx6g -jar clinEff.jar
    ClinEff version ClinEff 1.0 (build 2016-12-04 12:00), by Pablo Cingolani
    Usage: ClinEff [command] [options] genome [file.vcf]

    Available command line options:
        -c , -config <file>    : Specify config file. Default: clinEff.config
        -db <file.vcf>         : Add annotations from database 'file.vcf'
        -d , -debug            : Debug mode (very verbose)
        -h , -help             : Show this help and exit
        -l , -license <file>   : Path to license files
        -v , -verbose          : Verbose mode
        -version               : Show version number and exit
        -w , -workflow <file>  : Workflow file. Default: workflow.config

Some command line options can be specified using either a short or a long format (e.g. -c or -config).
Example run
    $ java -Xmx8G -jar clinEff.jar -v GRCh37.75 sample_1KG.vcf.gz > sample_1KG.ann.vcf
    ClinEff version ClinEff 1.0 (build 2016-12-10 18:22), by Pablo Cingolani
    00:00:00 Reading config file: clinEff.config
    00:00:00 Reading workflow file: workflow.config
    00:00:00 Adding annotation database (VCF): 'db/GRCh37/clinVar/clinVar.20161129.vcf'
    00:00:00 Adding annotation database (VCF): 'db/GRCh37/dbSnp/dbSnp_v149.20161122.vcf.gz'
    00:00:00 Adding annotation module : 'gwasCatalog'
    00:00:00 Adding annotation module : 'dbNsfp'
    00:00:00 Adding filter 'AF_1KG': ((exists dbNSFP_1000Gp1_AF) && (dbNSFP_1000Gp1_AF >= 0.05))
    00:00:00 Adding filter 'AF_ESP': ((exists dbNSFP_ESP6500_AA_AF) && (dbNSFP_ESP6500_AA_AF >= 0.05)) || ((exists dbNSFP_ESP6500_EA_AF ) && (dbNSFP_ESP6500_EA_AF >= 0.05))
    00:00:00 Adding filter 'AF_EXAC': ((exists dbNSFP_ExAC_AF) && (dbNSFP_ExAC_AF >= 0.05))
    00:00:00 Adding SnpEff annotations
    00:00:00 Adding annotation module : 'com.clineff.report.ReportLof'
    00:00:00 Adding annotation module : 'com.clineff.report.ReportHighImpact'
    00:00:00 Adding annotation module : 'com.clineff.report.ReportClinical'
    00:00:00 License file 'clinEff.license' OK
    00:00:00 Reading database for genome version 'GRCh37.75' from file 'data/GRCh37.75/snpEffectPredictor.bin' (this might take a while)
    00:00:22 done
    ...
    00:01:55 Logging
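To annotate several samples with the same settings, the example command can be wrapped in a loop. This is a minimal sketch under stated assumptions: clinEff.jar is in the current directory, input files end in .vcf or .vcf.gz, and the helper name `ann_name` is hypothetical (it is not part of ClinEff).

```shell
#!/usr/bin/env bash
# Sketch: annotate every VCF given on the command line with the same genome.
# Assumptions: clinEff.jar sits in the current directory and the inputs
# are named *.vcf or *.vcf.gz.

GENOME="GRCh37.75"    # any database / genome name that is installed

# Hypothetical helper: derive the output name, e.g. sample.vcf.gz -> sample.ann.vcf
ann_name() {
    local base
    base=$(basename "$1")
    base="${base%.gz}"             # drop an optional .gz suffix
    echo "${base%.vcf}.ann.vcf"    # replace .vcf with .ann.vcf
}

for vcf in "$@"; do
    out=$(ann_name "$vcf")
    echo "Annotating $vcf -> $out" >&2
    java -Xmx8G -jar clinEff.jar -v "$GENOME" "$vcf" > "$out"
done
```

Saved under a name of your choice (say, annotate_all.sh), it could be invoked as `bash annotate_all.sh *.vcf.gz`. Output files are written to the current directory.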
Databases
Need help customizing your database?
We can do it for you.
Contact us for a database customization service.
ClinEff provides human genome databases for clinical applications.
Note that database names are sometimes called 'genome' in the command line arguments.
There are several databases provided:
Database / Genome name | Genome version | Information source |
---|---|---|
hg19 | hg19 / GRCh37 | UCSC RefSeq |
GRCh37.75 | hg19 / GRCh37 | ENSEMBL |
hg19kg | hg19 / GRCh37 | UCSC KnownGenes |
GRCh37.p13.RefSeq | hg19 / GRCh37 | NCBI RefSeq |
hg38 | hg38 / GRCh38 | UCSC RefSeq |
GRCh38.86 | hg38 / GRCh38 | ENSEMBL |
hg38kg | hg38 / GRCh38 | UCSC KnownGenes |
GRCh38.p7.RefSeq | hg38 / GRCh38 | NCBI RefSeq |
Workflows
Need help customizing your workflow?
We can do it for you.
Contact us for a workflow customization service.
In a nutshell, ClinEff takes an input VCF file and applies a series of 'annotation modules'.
Conceptually, each annotation module is similar to using a single SnpEff / SnpSift command line.
Instead of applying several independent tools, ClinEff optimizes the workflow by applying them in
one step and defining them in a single 'workflow' file (as opposed to running several command lines
glued together in a shell script).
This improves efficiency, repeatability and clinical compliance.
ClinEff is customized using workflow definition files.
The default workflow file is workflow.config in ClinEff's install directory, but an
alternative file can be defined using the -w command line option.
Workflow definition files define which annotation steps are used in ClinEff's annotation process.
These annotation steps are known as 'annotation modules' or simply 'modules'.
In the following paragraphs, we define the components of a workflow file.
Annotation modules:
This section defines the annotation modules applied.
It is a comma separated list of modules, which can be given either as module names or as Java class names.
Functional annotations ('ANN') are always included, so there is no need to name them in this section.
In this example, we only use the gwasCatalog and dbNsfp annotation modules:

    modules.annotation : gwasCatalog, dbNsfp

Available modules
Module name | Corresponding SnpEff / SnpSift command | Module annotations |
---|---|---|
ann | SnpEff eff / ann | Functional annotations, protein changes, putative impact and loss of function prediction |
annotate | SnpSift annotate | Annotation using a database file (e.g. custom VCF files) |
caseControl | SnpSift caseControl | Compare how many variants are in 'case' and in 'control' groups; calculate p-values. This is for VCFs having many samples (cohort analysis) |
dbNsfp | SnpSift dbNsfp | Annotate with multiple entries from dbNSFP. These annotations include Uniprot, Interpro, SIFT, Polyphen2, LRT, MutationTaster, FATHMM, MetaSVM, VEST3, PROVEAN, CADD, GERP++, phyloP46way, phastCons46way, SiPhy, 1000Gp1, ESP6500, ARIC5606, ExAC, COSMIC, etc. |
filter | SnpSift filter | Filter variants based on arbitrary expression |
filterChrPos | SnpSift filterChrPos | Filter variants by genomic coordinates (i.e. chr:pos) |
filterGt | SnpSift filterGt | Filter genotype using arbitrary expressions. |
geneSets | SnpSift geneSets | Annotate GeneSet using MSigDb gene sets (MSigDb includes: GO, KEGG, Reactome, BioCarta, etc.) |
gwasCatalog | SnpSift gwasCatalog | Annotate using GWAS catalog |
hwe | SnpSift hwe | Calculate Hardy-Weinberg parameters and perform a goodness of fit test |
intervals | SnpSift intervals | Filter variants by genomic intervals. |
intervalsIdx | SnpSift intervalsIndex | Filter variants by genomic intervals. Index-based method: used for large VCF files when only a few intervals need to be retrieved |
op | SnpSift vcfOperator | Create a new field based on operations from other fields (e.g. get the maximum of two fields). |
private | SnpSift private | Annotate if a variant is private to a family or group (multi-sample VCF files with family tree structure information). |
rmInfo | SnpSift rmInfo | Delete an INFO field |
rmRef | SnpSift removeReferenceGenotypes | Delete reference alleles |
varType | SnpSift varType | Annotate variant type (e.g. SNP, INS, DEL) |
Annotation modules arguments:
This section defines additional command line arguments applied to each annotation module.
The format is args.MODULE_NAME : arg1 arg2 ... argN.
For instance, if we want functional annotations only in canonical transcripts and no up/downstream annotations, we could add the following line to our workflow:

    args.ann : -canon -ud 0

Similarly, you can define parameters for other annotation modules (e.g. args.dbNsfp: ...).
Module-specific parameters:
Some annotation modules almost always require parameters.
Instead of using the generic args.MODULE: ... workflow directive, these can be configured using module-specific entries.
Entry name | Module | Configuration | Format |
---|---|---|---|
database.dbnsfp | dbNsfp | Path to dbNsfp database | Path to a valid dbNsfp 'database' file (the file must be bgzip compressed and tabix-indexed) |
database.gwascatalog | gwasCatalog | Path to database file | Path to a valid GWAS Catalog file |
dbnsfp.fields | dbNsfp | Defines dbNsfp fields to annotate | Comma separated list of fields to use for dbNSFP annotations (no spaces or tabs allowed in this list) |
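As an illustration of the module-specific entries, the sketch below writes a small workflow fragment to a file. Everything specific in it is an assumption made for the example: the output file name, both database paths, and the choice of dbNSFP column names. It also assumes that `dbnsfp.fields` takes the dbNSFP column names without the `dbNSFP_` prefix that appears in the annotated INFO fields.

```shell
#!/usr/bin/env bash
# Sketch: write a workflow fragment that configures the dbNsfp and gwasCatalog modules.
# All file names and paths below are hypothetical placeholders.
cat > workflow.dbnsfp.example.config <<'EOF'
# Enable the two modules configured below
modules.annotation : gwasCatalog, dbNsfp

# dbNsfp database: must be bgzip compressed and tabix-indexed
database.dbnsfp : db/GRCh37/dbNSFP/dbNSFP.txt.gz

# GWAS Catalog file
database.gwascatalog : db/GRCh37/gwasCatalog/gwascatalog.txt

# dbNSFP fields to annotate: comma separated, no spaces or tabs in the list
dbnsfp.fields : 1000Gp1_AF,ESP6500_AA_AF,ESP6500_EA_AF,ExAC_AF
EOF
```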
VCF database annotations:
Often we need to annotate using VCF databases.
A typical example is to use dbSnp and ClinVar.
This can be defined using the annotation.db.vcf section, e.g.:

    annotation.db.vcf : db/GRCh37/clinVar/clinVar.20161129.vcf \
                      , db/GRCh37/dbSnp/dbSnp_v147.20160601.vcf.gz

The database files must be either compressed using bgzip (and have a tabix index) or be in plain VCF (no compression).
Filters:
Filtering variants is a common step in many clinical processing environments.
A ClinEff workflow can define many filters and makes it easy to add and remove them.
Filters act by adding a 'filterName' to the FILTER VCF column.
Remember that in VCF jargon, a variant passes filters if the FILTER entry is empty ('.') or has a PASS tag.
This means that when a 'filterName' is added to the FILTER VCF column, you are actually excluding the variant in downstream analysis.
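The FILTER semantics described above can be checked on an annotated file with standard tools. This is a minimal sketch, assuming an uncompressed, tab-separated VCF; the function name `count_pass` and the demo variants are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: count variants that pass vs. fail filters (FILTER is column 7 of a VCF).
count_pass() {
    awk -F '\t' '
        /^#/ { next }                                 # skip header lines
        $7 == "." || $7 == "PASS" { pass++; next }    # empty or PASS: the variant passes
        { filtered++ }                                # any other tag: the variant is filtered out
        END { printf "pass=%d filtered=%d\n", pass, filtered }
    ' "$1"
}

# Tiny demo file (made-up variants): one '.', one 'PASS', one tagged by a filter
demo_vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$demo_vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$demo_vcf"
printf '1\t100\t.\tA\tG\t50\t.\t.\n' >> "$demo_vcf"
printf '1\t200\t.\tC\tT\t50\tPASS\t.\n' >> "$demo_vcf"
printf '1\t300\t.\tG\tA\t50\tAF_1KG\t.\n' >> "$demo_vcf"

count_pass "$demo_vcf"    # prints: pass=2 filtered=1
```

For a bgzip or gzip compressed file, the content can be decompressed to a temporary file first (e.g. with `gunzip -c`).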
Filters are defined using filter.FILTER_NAME followed by an arbitrary expression (a valid SnpSift filter expression).
When the expression is satisfied, the FILTER_NAME tag is added to the FILTER VCF column.
Here is a filter definition example:
    # This filter has a filterName 'AF_1KG'
    # The filter is 'true' if the allele frequency from the 1000 Genomes Project is at least 0.05 (i.e. 5%)
    # Note that if the filter is true, it will add a tag 'AF_1KG' to the FILTER VCF column
    filter.AF_1KG : ((exists dbNSFP_1000Gp1_AF) && (dbNSFP_1000Gp1_AF >= 0.05))

We can have many filter definitions (each should use a different 'filterName'). In order to enable a filter for a workflow, we need to add it to the filters list.
Note that filters that are not added to the filters list are ignored.
So, to activate the filter in our previous example, we need to add the following line:

    filters : AF_1KG

The filters entry can be a comma separated list of 'filterNames'.
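Because a filter that is defined but missing from the filters list is silently ignored, it can be useful to list such filters before a run. The following is a rough sketch, not a ClinEff feature: the function name `unused_filters` is hypothetical, and it assumes each entry is written on a single line in the `key : value` form shown above (entries continued across lines with a backslash are not handled).

```shell
#!/usr/bin/env bash
# Sketch: list filters that are defined (filter.NAME : ...) but not enabled in 'filters'.
unused_filters() {
    awk -F ':' '
        /^[ \t]*#/ { next }                           # skip comment lines
        /^[ \t]*filter\.[^:]+:/ {                     # filter.NAME : expression
            name = $1
            gsub(/^[ \t]*filter\.|[ \t]+$/, "", name) # keep only NAME
            defined[name] = 1
            next
        }
        /^[ \t]*filters[ \t]*:/ {                     # filters : A, B, C
            n = split($2, parts, ",")
            for (i = 1; i <= n; i++) {
                gsub(/[ \t]/, "", parts[i])
                enabled[parts[i]] = 1
            }
        }
        END { for (name in defined) if (!(name in enabled)) print name }
    ' "$1" | sort
}

# Demo workflow: two filters defined, only one enabled
demo_workflow=$(mktemp)
cat > "$demo_workflow" <<'EOF'
filter.AF_1KG : ((exists dbNSFP_1000Gp1_AF) && (dbNSFP_1000Gp1_AF >= 0.05))
filter.AF_EXAC : ((exists dbNSFP_ExAC_AF) && (dbNSFP_ExAC_AF >= 0.05))
filters : AF_1KG
EOF

unused_filters "$demo_workflow"    # prints: AF_EXAC
```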
Reporting modules
This section defines the reporting modules.
These are Java classes that can be customized for your reporting needs.
Contact us if you need to develop specific reports for your workflow.
The format is a comma separated list of report classes, e.g.:
    modules.report : com.clineff.report.ReportLof \
                   , com.clineff.report.ReportHighImpact \
                   , com.clineff.report.ReportClinical
Workflow annotation steps
ClinEff annotation workflows can be highly customized, but all workflows follow these steps:
- Functional annotations: Workflow entry args.ann.
- VCF database annotations: Workflow entry annotation.db.vcf.
- Annotation modules: Workflow entries modules.annotation and args.MODULE, and module-specific entries.
- Annotation filters: Workflow entries filters and filter.FILTER_NAME.
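Putting the steps together, the sketch below writes a small workflow file built only from entries shown in the sections above, then runs ClinEff with it through the -w option. The file names my_workflow.config and sample.vcf.gz are hypothetical, and whether the order of entries within the file matters is not stated here, so the ordering simply mirrors the list of steps.

```shell
#!/usr/bin/env bash
# Sketch: a workflow file covering the steps above, followed by a run that uses it.
# 'my_workflow.config' and 'sample.vcf.gz' are hypothetical names.
cat > my_workflow.config <<'EOF'
# 1. Functional annotations: canonical transcripts only, no up/downstream
args.ann : -canon -ud 0

# 2. VCF database annotations
annotation.db.vcf : db/GRCh37/clinVar/clinVar.20161129.vcf \
                  , db/GRCh37/dbSnp/dbSnp_v147.20160601.vcf.gz

# 3. Annotation modules
modules.annotation : gwasCatalog, dbNsfp

# 4. Filters: define, then enable
filter.AF_1KG : ((exists dbNSFP_1000Gp1_AF) && (dbNSFP_1000Gp1_AF >= 0.05))
filters : AF_1KG

# Reporting modules
modules.report : com.clineff.report.ReportLof \
               , com.clineff.report.ReportHighImpact \
               , com.clineff.report.ReportClinical
EOF

# Run only if ClinEff and the input file are present in the current directory
if [ -f clinEff.jar ] && [ -f sample.vcf.gz ]; then
    java -Xmx8G -jar clinEff.jar -v -w my_workflow.config GRCh37.75 sample.vcf.gz > sample.ann.vcf
fi
```

Keeping the workflow in its own file, rather than editing the default workflow.config, makes it easier to version the exact settings used for a given analysis.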