***********************************************
***OncoMX readme v1.0beta                   ***
***Beta release                             ***
***September 17, 2018                       ***
***********************************************

OVERVIEW

OncoMX is a knowledgebase of unified cancer genomics data from integrated mutation, expression, literature, and biomarker databases, accessible through a web portal.

DATA SOURCES AND FLOW

The core underlying knowledgebase of OncoMX is derived from the BioMuta and BioXpress integrated cancer mutation and expression databases. Normal expression data from Bgee and custom text mining software, DiMeX and DEXTER, augment the cancer data to improve functional interpretation of the reported variants and expression profiles. Where relevant, data are mapped to Disease Ontology and Uberon Anatomical Entity ontology terms to facilitate better integration. All data are wrapped into the OncoMX database and web portal and mapped to additional functional information from the NCI Early Detection Research Network (EDRN) and Reactome. In version 1.0, cancer mutation and expression data are taken from: CIViC, ClinVar, COSMIC, ICGC, IntOGen, and TCGA.

For more information regarding pipelines of contributing resources, please refer to the following:
- Bgee https://bgee.org/
- BioMuta https://hive.biochemistry.gwu.edu/biomuta; https://academic.oup.com/nar/article/46/D1/D1128/4372542
- BioXpress https://hive.biochemistry.gwu.edu/bioxpress; https://academic.oup.com/nar/article/46/D1/D1128/4372542
- DEXTER https://academic.oup.com/database/article/doi/10.1093/database/bay045/5025486
- DiMeX http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0152725
- EDRN https://edrn.nci.nih.gov/
- Reactome https://reactome.org/

HOW TO USE

OncoMX can be used to search by data source/type or can be explored through one of four primary user perspectives, accessed from the dashboard, as described below.
***********************************************
***********************************************
Search

From the landing page, click on one of the five data type buttons: Differential expression, Biomarkers, Pathways, Disease mutations, and Normal expression. (Please note - clicking Home here will redirect the user to the dashboard, whereas clicking Data sources will redirect the user to the original tables retrieved from each of the contributing resources.)

All OncoMX tables can be searched by a text string using the search box in the upper right corner. Searching for a string in this way will filter the rendered rows to display only those containing the queried string. Visible data (the first 20 rows, by default) can be copied to your clipboard, downloaded as a .csv, or printed using the buttons immediately above the table in the left corner. Hits can be navigated using the pagination tools below the table to the right.

***********************************************
Differential expression

This table is indexed by gene/protein/mRNA or miRNA accession/name. In addition to text searching, you can use the filters on the left-hand side of the table to filter rows by TCGA cancer type or by significance of differential expression. To access the gene detail view for a specific entry, click the hyperlinked gene symbol in the first column. Cross-references to numerous other resources can also be found in the table, including NCBI gene search, BioXpress entry, disease ontology, TCGA study, and Uberon ontology.
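The text search and the cancer type/significance filters described above can be approximated offline on a downloaded .csv. The following is a minimal Python sketch, not the portal's implementation. It assumes the export uses the column headers listed in this readme ("Gene/miRNA", "Adj. P-value", "TCGA Cancer"); the real header spelling in the file may differ. The sample rows are invented placeholders, and the case-insensitive matching is an assumption.

```python
import csv
import io

# Hypothetical export. Header names follow the column list in this readme,
# but the actual .csv headers may be spelled differently. Values are invented.
SAMPLE_CSV = """Gene/miRNA,Adj. P-value,Log2 F.C.,Significant,TCGA Cancer
GENE_A,0.001,2.5,Yes,LUAD
GENE_B,0.20,-0.3,No,LUAD
GENE_C,0.01,-1.8,Yes,BRCA
"""


def text_search(rows, query):
    """Keep rows in which any cell contains the query string.

    Case-insensitive matching is an assumption about the search box.
    """
    q = query.lower()
    return [r for r in rows if any(q in str(v).lower() for v in r.values())]


def significant_in(rows, tcga_cancer, alpha=0.05):
    """Keep rows for one TCGA cancer acronym with adjusted p-value below alpha."""
    return [
        r for r in rows
        if r["TCGA Cancer"] == tcga_cancer and float(r["Adj. P-value"]) < alpha
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
luad_hits = significant_in(rows, "LUAD")
search_hits = text_search(rows, "brca")
print([r["Gene/miRNA"] for r in luad_hits])
print([r["Gene/miRNA"] for r in search_hits])
```

To run this against a real download, replace the io.StringIO(SAMPLE_CSV) argument with an opened file handle and adjust the header names to match the file.
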
Columns for this table are as follows:

COLUMN HEADER             DESCRIPTION
Gene/miRNA                Official gene symbol approved by the HGNC, which is a short abbreviated form of the gene name, hyperlinked to the OncoMX detail view for this entry; if miRNA, this column will report the accession in the miRBase database
UniProtKB/SwissProt AC    Hyperlinked UniProtKB accession (accession assigned to the protein isoform chosen to be the canonical sequence in UniProtKB database) linking to relevant entry in BioXpress
P-value                   P-value associated with reported differential expression
Adj. P-value              P-value associated with reported differential expression adjusted for multiple testing
Cancer Type               Mapped Cancer Disease Ontology slim term and link to relevant disease ontology page
Log2 F.C.                 Log2 fold change of expression of the gene/miRNA in tumor as compared to adjacent normal
Patient Freq.             Proportion of patients whose individual observed trend of expression matches the cancer-wise trend
RefSeq AC                 RefSeq accession associated with the transcript
Significant               Reports whether the observed change in expression was determined to be significant (adjpval < 0.05)
TCGA Cancer               TCGA cancer acronym linked to the corresponding TCGA study page in NIH-NCI GDC Data Portal
Anatomical Entity ID      Uberon anatomical entity ID corresponding to the diseased tissue

***********************************************
Biomarkers

This table is indexed by gene symbol or panel name.
Columns for this table are as follows:

COLUMN HEADER             DESCRIPTION
Gene Symbol/Panel         Official gene symbol or panel name, hyperlinked to the OncoMX detail view for this entry
Type                      Type of biomarker (can be Epigenetic, Gene, Genomic, Protein, or Proteomic)
Associated Dataset        Dataset(s) associated with the biomarker as reported in EDRN database
Is Panel                  Denotes if entry is a biomarker or a panel
Phase                     Please see EDRN documentation
QA State                  Denotes whether biomarker has been curated, accepted, or is under review
Organ                     Organ in which biomarker is applicable (current options include Colon, Lung, Breast, and Ovary)
HGNC Symbol               Official gene symbol approved by the HGNC, which is a short abbreviated form of the gene name
Reference Resource        Hyperlinked references reporting biomarker activity
UniProtKB/SwissProt AC    Accession assigned to the protein isoform chosen to be the canonical sequence in UniProtKB database (not applicable to panels)

***********************************************
Pathways

This table is indexed by UniProtKB/SwissProt AC and reports events associated with a given protein, its evidence, and the pathway to which the event belongs. Columns for this table are as follows:

COLUMN HEADER             DESCRIPTION
UniProtKB/SwissProt AC    Accession assigned to the protein isoform chosen to be the canonical sequence in UniProtKB database, hyperlinked to UniProtKB
Gene Symbol               Official gene symbol approved by the HGNC, which is a short abbreviated form of the gene name, hyperlinked to the OncoMX detail view for this entry
Event                     Name of pathway event
Evidence Code             Evidence code for gene/protein participation in the event (can be IEA or TAS)
Reactome Pathway ID       Hyperlinked Reactome IDs for the corresponding pathway

***********************************************
Disease mutations

This table reports variants in cancer samples for each relevant gene in genomic and proteomic coordinates.
Columns for this table are as follows:

COLUMN HEADER             DESCRIPTION
Gene Symbol               Official gene symbol approved by the HGNC, which is a short abbreviated form of the gene name, hyperlinked to the OncoMX detail view for this entry
UniProtKB/SwissProt AC    Hyperlinked UniProtKB accession (accession assigned to the protein isoform chosen to be the canonical sequence in UniProtKB database) linking to relevant entry in BioMuta
RefSeq AC                 RefSeq accession associated with the canonical transcript
Cancer Type               Type of cancer associated with reported variant
Functional Impact         Denotes whether a reported variant is associated with a functional loss or gain of acetylation, phosphorylation, glycosylation, or other functional annotation from UniProtKB, or prediction from PolyPhen or NetNGlyc2.0
Genome Position           Genomic position of the variant
Nuc. Position             Position of variant in nucleic acid sequence
Ref. Nuc.                 Reference or wild-type nucleotide base
Var. Nuc.                 Nucleotide base resulting from variation
AA Position               Position of variation in protein sequence
Ref. AA                   Reference or wild-type amino acid residue
Var. AA                   Amino acid residue resulting from variation
Polyphen2                 If applicable, lists the predicted effect of the variant reported by PolyPhen-2 (benign, possibly damaging, or probably damaging)
PMID                      If available, PMID(s) of manually curated or semi-automatically mined (using DiMeX) publication(s) associated with the reported variation
Source                    Data source of reported variation (can be CIViC, ClinVar, COSMIC, ICGC, or TCGA)
Status                    Status of study from which variation was obtained (LG for large-scale, SM for small-scale)
Anatomical Entity ID      Uberon anatomical entity ID corresponding to the diseased tissue

***********************************************
Normal expression

This table reports the status of RNA-seq derived expression in normal samples.
Columns for this table are as follows:

COLUMN HEADER             DESCRIPTION
Gene Symbol               Official gene symbol approved by the HGNC, which is a short abbreviated form of the gene name, hyperlinked to the OncoMX detail view for this entry
UniProtKB/SwissProt AC    Accession assigned to the protein isoform chosen to be the canonical sequence in UniProtKB database
Ensembl Gene ID           Hyperlinked Ensembl gene ID linking to relevant entry in Bgee
Anatomical Entity Name    Uberon anatomical entity name corresponding to sample tissue
Developmental Stage Name  Uberon developmental stage name corresponding to sample tissue
Expression Call           Indicates presence or absence of expression
Expression Rank           The lower the rank, the higher the expression level
Call Quality              Quality associated with call (can be high quality or poor quality)

***********************************************
***********************************************
Detail View

From each of the above tables, a user can click on the corresponding link in the Gene Symbol column to go to the gene-centric detail view. From this page, the user can see results from each of the various perspectives filtered for a specific gene.

EDRN

This tab summarizes biomarker details from EDRN. Fields are as follows:

FIELD LABEL               DESCRIPTION
EDRN Title                Name given to biomarker in EDRN
Organ                     Organ in which biomarker is applicable (current options include Colon, Lung, Breast, and Ovary)
Phase                     If available, reports the designation of the biomarker as one of five phases (Phase 1, Preclinical Exploratory; Phase 2, Clinical Assay and Validation; Phase 3, Retrospective Longitudinal; Phase 4, Prospective Screening; Phase 5, Cancer Control)
QA                        Denotes whether biomarker has been curated, accepted, or is under review
Aliases                   Other names/descriptions of the biomarker
Description               States the purpose and scope of the biomarker

***********************************************
BioMuta

Please see Disease mutations table description above.
***********************************************
BioXpress

Please see Differential expression table description above.

***********************************************
Bgee

Please see Normal expression table description above.

***********************************************
Reactome

Please see Pathways table description above.

***********************************************
***********************************************
Dashboard

An interactive dashboard can be accessed by scrolling down below the landing page. Four sections of interactive content can be accessed here: Perspectives, Statistics at a Glance, Data Sources, and News.

***********************************************
Perspectives

These views are currently under development. For now, clicking the links will redirect you to the relevant tables described in the Search section above. Biomarkers will redirect to the Biomarkers table, Evolutionary Context to the Normal expression table, Literature Mining to three tables (described below) with information about literature mining in cancer, and Biomarkers within Pathways to the Pathways table. Please note that detailed views are being built for each of these perspectives. Literature mining will eventually have its own interface independent of secondary sources using the data.

----------------------------------------------
Lung Cancer DiMeX

This table displays hits for literature-mined mentions of mutation in cancer using a customized application of DiMeX.
Columns for this table are as follows:

COLUMN HEADER             DESCRIPTION
PMID                      PMID(s) of mined (using DiMeX) publication(s) containing variant information
UniProtKB/SwissProt AC    Accession assigned to the protein isoform chosen to be the canonical sequence in UniProtKB database
Gene Symbol               Official gene symbol approved by the HGNC, which is a short abbreviated form of the gene name
Entrez ID                 Unique, stable, and tracked integer identifier
Gene Mention              Specific form/spelling of gene mentioned in the retrieved publication
DOID                      Mapped Cancer Disease Ontology slim ID
DOID Name                 Mapped Cancer Disease Ontology slim term
Disease Mention           Specific form/spelling of disease mentioned in the retrieved publication
Mutation Mention          Specific form of substitution mutation mentioned in the retrieved publication
Mutation Type             Specifies whether the mutation is mentioned in terms of amino acid or nucleotide substitutions and coordinates
Abstract Extraction       Section of abstract from which the relevant sentence was extracted
Sentence Number           Number of sentence extracted
Sentence Text             Text of sentence extracted
Extraction Method         Denotes whether relationship in sentence was an association or other
Patient/Control Numbers   If available, reports the number of patients and/or controls in the study
Is Meta-Analysis          Denotes if the publication was a meta-analysis
Is Review                 Denotes if the publication was a review

-----------------------------------------------
Lung Cancer DEXTER

This table displays hits for literature-mined mentions of expression in cancer using a customized application of DEXTER.
Columns for this table are as follows:

COLUMN HEADER             DESCRIPTION
UniProtKB/SwissProt AC    Accession assigned to the protein isoform chosen to be the canonical sequence in UniProtKB database
Entrez ID                 Unique, stable, and tracked integer identifier
Gene Mention              Specific form/spelling of gene mentioned in the retrieved publication
PMID                      PMID(s) of mined (using DEXTER) publication(s) containing expression information
DOID                      Mapped Cancer Disease Ontology slim ID
DOID Name                 Mapped Cancer Disease Ontology slim term
Disease Mention           Specific form/spelling of disease mentioned in the retrieved publication
Disease Extracted From    Section of the publication from which the disease mention was extracted
Expression Level          Reports whether the expression change was reported to go up or down in disease
Sentence Type             Type of sentence; strength of assertion is strongest in TypeA
Sample 1                  Denotes the first (or only) group compared in the extracted sentence
Sample 2                  If the extracted sentence contains a comparison between two groups, denotes the second group compared in the extracted sentence
Is Same Patient           Reports whether a comparison is performed between two groups of samples from a single patient
Sentence Text             Text of sentence extracted

------------------------------------------------
miRNA DEXTER

This table displays hits for literature-mined mentions of miRNA expression in all cancers using a customized application of DEXTER.
Columns for this table are as follows:

COLUMN HEADER             DESCRIPTION
miRNA Number              Number/ID of miRNA mentioned in publication
miRNA Mention             Specific form/spelling of miRNA mentioned in the retrieved publication
PMID                      PMID(s) of mined (using DEXTER) publication(s) containing expression information
DOID                      Mapped Cancer Disease Ontology slim ID
DOID Name                 Mapped Cancer Disease Ontology slim term
Disease Mention           Specific form/spelling of disease mentioned in the retrieved publication
Disease Extracted From    Section of the publication from which the disease mention was extracted
Expression Level          Reports whether the expression change was reported to go up or down in disease
Sentence Type             Type of sentence; strength of assertion is strongest in TypeA
Sample 1                  Denotes the first (or only) group compared in the extracted sentence
Sample 2                  If the extracted sentence contains a comparison between two groups, denotes the second group compared in the extracted sentence
Is Same Patient           Reports whether a comparison is performed between two groups of samples from a single patient
Sentence Text             Text of sentence extracted

***********************************************
Statistics at a Glance

Clicking on any of the topics listed here or interacting with the Circos plot will toggle the charts displayed below. Statistics can be viewed as a series of charts across multiple resources (Total Cancer Terms and Proteins) or for each primary contributing resource (Biomarkers from EDRN, BioMuta, and BioXpress). Additional summary views will be added in subsequent releases.

***********************************************
Data Sources

Clicking any of the five contributing sources will take you to the original table retrieved from that source. All tables can be interacted with and downloaded as described above for the Search section.

***********************************************
News

This content will be updated as relevant news is available.
***********************************************
***********************************************
Quick links

Below the dashboard is a set of quick links to both internal and external resources. Here you can find links to help documentation, contact, and licensing information.

***********************************************
***********************************************

The remainder of the page changes dynamically based on options selected above.

README UPDATED: October 2, 2018

***********************************************
***OncoMX readme v1.0beta                   ***
***Beta release                             ***
***September 17, 2018                       ***
***********************************************