Sequence types and features ontology (SO)
A structured controlled vocabulary for sequence annotation, for the exchange of annotation data and for the description of sequence objects in databases.
[
A sequence variant located within 2KB 3’ of a gene.
A sequence variant located within 2KB 5’ of a gene.
A sequence variant that causes the extension of 3’ UTR, with regard to the reference sequence.
A UTR variant of exonic sequence of the 3’ UTR. Requested by visze github tracker ID 346.
A UTR variant of intronic sequence of the 3’ UTR. Requested by visze github tracker ID 346.
A sequence variant that causes the reduction of the 3’ UTR with regard to the reference sequence.
A UTR variant of the 3’ UTR. EBI term 3prime UTR variations - In 3prime UTR.
A sequence variant that changes the resulting polypeptide structure.
A cytosine methylated at the 4 nitrogen.
A modified DNA cytosine base feature, modified by a carboxy group at the 5 carbon.
A modified DNA cytosine base feature, modified by a formyl group at the 5 carbon.
A modified DNA cytosine base feature, modified by a hydroxymethyl group at the 5 carbon.
A cytosine methylated at the 5 carbon.
A sequence variant that causes the extension of 5’ UTR, with regard to the reference sequence.
A UTR variant of exonic sequence of the 5’ UTR. Requested by visze github tracker ID 346.
A UTR variant of intronic sequence of the 5’ UTR. Requested by visze github tracker ID 346.
A 5’ UTR variant where a premature start codon is gained.
A 5’ UTR variant where a premature start codon is lost.
A 5’ UTR variant where a premature start codon is introduced, moved or lost. Requested by Andy Menzies at the Sanger. This isn’t necessarily a protein coding change. A premature start codon can affect the production of a mature protein product by providing a competing translation start point. Some genes balance their expression this way, e.g. THPO requires the presence of a premature start to limit expression; its loss leads to familial thrombocythemia.
A sequence variant that causes the reduction of the 5’ UTR with regard to the reference sequence.
A UTR variant of the 5’ UTR. EBI term: 5prime UTR variations - In 5prime UTR (untranslated region).
A sequence variant located within a half KB of the end of a gene.
A sequence variant located within 5 KB of the end of a gene. EBI term Downstream variations - Within 5 kb downstream of the 3prime end of a transcript.
A sequence variant located within 5KB 5’ of a gene. EBI term Upstream variations - Within 5 kb upstream of the 5prime end of a transcript.
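The distance-based definitions above (within 2 KB or 5 KB on the 5’ side; within a half KB, 2 KB or 5 KB on the 3’ side) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function name `flank_category`, the 1-based inclusive coordinates, the strand convention, and the returned label strings, which are not SO term names.

```python
# Hypothetical helper: report the tightest flanking window that contains a
# variant lying outside a gene. Coordinates are assumed 1-based and inclusive.
UPSTREAM_WINDOWS = (2000, 5000)         # bp on the 5' side, from the definitions above
DOWNSTREAM_WINDOWS = (500, 2000, 5000)  # bp on the 3' side, from the definitions above


def flank_category(pos, gene_start, gene_end, strand):
    """Return a descriptive label, or None if the variant is in the gene or too far away."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if gene_start <= pos <= gene_end:
        return None  # inside the gene, not a flanking variant
    if pos < gene_start:
        distance = gene_start - pos
        side = "upstream" if strand == "+" else "downstream"
    else:
        distance = pos - gene_end
        side = "downstream" if strand == "+" else "upstream"
    windows = UPSTREAM_WINDOWS if side == "upstream" else DOWNSTREAM_WINDOWS
    for window in windows:  # windows are sorted, so the first hit is the tightest
        if distance <= window:
            return f"within {window} bp {side}"
    return None


print(flank_category(2300, 1000, 2000, "+"))
print(flank_category(2300, 1000, 2000, "-"))
```

On the minus strand the same genomic position switches sides, which is why the strand is a required argument.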
A DNA adenine base modified at the 8 carbon, often the product of DNA damage.
A DNA guanine base modified at the 8 carbon, often the product of DNA damage.
An A box within an RNA polymerase III type 1 promoter. The A box can be found in the promoters of type 1 and type 2 (pol III) so sub-typing here allows the part of relationship of the subtypes to remain true.
An A box within an RNA polymerase III type 2 promoter. The A box can be found in the promoters of type 1 and type 2 (pol III) so sub-typing here allows the part of relationship of the subtypes to remain true.
A region forming a motif, composed of adenines, where the minor groove edges are inserted into the minor groove of another helix.
A transversion from adenine to cytidine.
A transition of an adenine to a guanine.
A transversion from adenine to thymine.
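The three adenine substitutions above follow the usual rule: a change within the purines (A, G) or within the pyrimidines (C, T) is a transition, and a change between the two classes is a transversion. A minimal sketch, assuming single DNA bases as input; the function name `classify_substitution` is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Classify a single-base DNA substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("ref and alt must each be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical, so this is not a substitution")
    # Same chemical class on both sides means a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


for alt in "CGT":
    print("A ->", alt, classify_substitution("A", alt))
```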
A conserved 17-bp sequence (5’-ATCA(C/A)AACCCTAACCCT-3’) commonly present upstream of the start site of histone transcription units functioning as a transcription factor binding site.
A transcript that has been processed “incorrectly”, for example by the failure of splicing of one or more exons.
A region of DNA that is depleted of nucleosomes and accessible to DNA-binding proteins including transcription factors and nucleases. Added as part of GREEKC terms. See GitHub Issues #531 & #534.
A promoter element with consensus sequence CCAGCC, bound by the fungal transcription factor Ace2.
Active peptides are proteins which are biologically active, released from a precursor molecule. Hormones, neuropeptides and antimicrobial peptides are active peptides. They are typically short (<40 amino acids) in length.
An adaptive island is a genomic island that provides an adaptive advantage to the host. The iron-uptake ability of many pathogens is conveyed by adaptive islands. Ulrich Dobrindt, Bianca Hochhut, Ute Hentschel & Jorg Hacker. Genomic islands in pathogenic and environmental microorganisms. Nature Reviews Microbiology 2, 414-424 (2004); doi:10.1038/nrmicro884.
A non-polar, hydrophobic amino acid encoded by the codons GCN (GCT, GCC, GCA and GCG). A place holder for a cross product with chebi.
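The shorthand GCN uses the IUPAC ambiguity code N for "any base", which is why it stands for four codons. A minimal sketch of that expansion follows; the function name `expand_codon` is hypothetical, and the table is the standard IUPAC nucleotide code for DNA.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_codon(codon):
    """Expand a degenerate codon such as 'GCN' into the concrete codons it covers."""
    choices = [IUPAC[base] for base in codon.upper()]
    return ["".join(combo) for combo in product(*choices)]


print(expand_codon("GCN"))
```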
A primary transcript encoding alanyl tRNA.
A tRNA sequence that has an alanine anticodon, and a 3’ alanine binding region.
An allele is one of a set of coexisting sequence variants of a gene.
A physical quality which inheres to the allele by virtue of the number of instances of the allele within a population. This is the relative frequency of the allele at a given locus in a population. Requested by HL7 clinical genomics group.
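As a worked illustration of "relative frequency of the allele at a given locus", the sketch below counts observed allele copies and divides by the total. It assumes the input is a flat list with one entry per allele copy sampled at a single locus; the function name `allele_frequencies` is hypothetical.

```python
from collections import Counter


def allele_frequencies(observed_alleles):
    """Map each allele to its relative frequency among the observed copies."""
    counts = Counter(observed_alleles)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no alleles observed")
    return {allele: count / total for allele, count in counts.items()}


# Two diploid individuals, genotypes A/A and A/a, give four allele copies.
print(allele_frequencies(["A", "A", "A", "a"]))
```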
Allelic exclusion is a process occurring in diploid organisms, where a gene is inactivated and not expressed in that cell. Examples are x-inactivation and immunoglobulin formation.
A gene that is allelically_excluded.
A polyploid where the multiple chromosome set was derived from a different organism.
A motif of five consecutive residues and two H-bonds in which: there is an H-bond between CO of residue(i) and NH of residue(i+4); there is an H-bond between CO of residue(i) and NH of residue(i+3); and the phi angles of residues (i+1), (i+2) and (i+3) are negative.
An attribute of alteration of one or more chromosomes.
Description of sequence variants produced by alternative splicing, alternative promoter usage, alternative initiation and ribosomal frameshifting. Discrete.
[alternately_spliced_gene_encodeing_one_transcript]
[alternately_spliced_gene_encoding_greater_than_one_transcript]
An attribute describing a situation where a gene may encode for more than 1 transcript.
[alternatively_spliced_gene_encoding_greater_than_1_polypeptide_coding_regions_overlapping]
A transcript that is alternatively spliced.
[alternatively_spliced_transcript_encoding_greater_than_1_polypeptide_different_start_codon_different_stop_codon_coding_regions_non_overlapping; alternatively_spliced_transcript_encoding_greater_than_1_polypeptide_different_start_codon_different_stop_codon_coding_regions_non-overlapping]
A deletion of an Alu mobile element with respect to a reference.
An insertion of sequence from the Alu family of mobile elements.
An ambisense_RNA_virus is a ss_RNA_viral_sequence that is the sequence of a single stranded RNA virus with both messenger and anti messenger polarity.
A sequence variant within a CDS resulting in the loss of an amino acid from the resulting polypeptide.
A sequence variant within a CDS resulting in the gain of an amino acid to the resulting polypeptide.
A sequence variant of a codon resulting in the substitution of one amino acid for another in the resulting polypeptide.
An origin_of_replication that is used for the amplification of a chromosomal nucleic acid sequence.
Part of an edited transcript only. [anchor_binding_site; transcript_region; anchor binding site]
A region of a guide_RNA that base-pairs to a target mRNA.
A non-palindromic sequence found in the promoters of genes whose expression is regulated in response to androgen.
A chromosome structural variation whereby either a chromosome exists in addition to the normal chromosome complement or is lacking. Examples are Nullo-4, Haplo-4 and triplo-4 in Drosophila.
The status of a whole genome sequence, where annotation and verification of coding regions have occurred.
A non-coding RNA transcript, derived from the transcription of the telomere. These transcripts are antisense of ARRET transcripts. Telomeric transcription has been documented in mammals, birds, fish, plants and yeast. Requested by Antonia Lock, October 2012.
A sequence of three nucleotide bases in tRNA which recognizes a codon in mRNA.
A sequence of seven nucleotide bases in tRNA which contains the anticodon. It has the sequence 5’-pyrimidine-purine-anticodon-modified purine-any base-3’.
A peptide region which is hydrogen bonded to another region of peptide running in the opposite direction (one running N-terminal to C-terminal and one running C-terminal to N-terminal). Hydrogen bonding occurs between every other C=O from one strand to every other N-H on the adjacent strand. In this case, if two atoms C-alpha (i) and C-alpha (j) are adjacent in two hydrogen-bonded beta strands, then they form two mutual backbone hydrogen bonds to each other’s flanking peptide groups; this is known as a close pair of hydrogen bonds. The peptide backbone dihedral angles (phi, psi) are about (-140 degrees, 135 degrees) in antiparallel sheets. Range.
Non-coding RNA transcribed from the opposite DNA strand compared with other transcripts and overlapping in part with sense RNA. Relationship is_a SO:0000644 antisense_RNA added 23 April 2021. See GitHub Issue #443.
The reverse complement of the primary transcript.
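A reverse complement, as used in the definition above, is obtained by complementing each base and reading the result in the opposite direction. A minimal sketch for an RNA string, assuming only the unambiguous bases A, C, G and U; the function name `reverse_complement_rna` is hypothetical.

```python
# Translation table for the Watson-Crick complement of unambiguous RNA bases.
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(sequence):
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    sequence = sequence.upper()
    if set(sequence) - set("ACGU"):
        raise ValueError("sequence may contain only A, C, G and U")
    return sequence.translate(_COMPLEMENT)[::-1]


print(reverse_complement_rna("AUGC"))  # GCAU
```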
A promoter element with consensus sequence TGACTCA, bound by AP-1 and related transcription factors.
A chromosome originating in an apicoplast.
A gene from apicoplast sequence.
DNA belonging to the genome of an apicoplast, a non-photosynthetic plastid.
DNA or RNA molecules that have been selected from random pools based on their ability to bind other molecules.
An intron characteristic of Archaeal tRNA and rRNA genes, where intron transcript generates a bulge-helix-bulge motif that is recognised by a splicing endoribonuclease. Intron characteristic of tRNA genes; splices by an endonuclease-ligase mediated mechanism.
Archaeosine is a modified 7-deazaguanosine.
A positively charged, hydrophilic amino acid encoded by the codons CGN (CGT, CGC, CGA and CGG), AGA and AGG. A place holder for a cross product with chebi.
A primary transcript encoding arginyl tRNA (SO:0000255).
A tRNA sequence that has an arginine anticodon, and a 3’ arginine binding region.
A non-coding RNA transcript, derived from the transcription of the telomere. These transcripts consist of C rich repeats. Telomeric transcription has been documented in mammals, birds, fish, plants and yeast. Requested by Antonia Lock, October 2012.
A non-coding RNA transcript, complementary to subtelomeric tract of TERRA transcript but devoid of the repeats. Telomeric transcription has been documented in mammals, birds, fish, plants and yeast. Requested by Antonia Lock, October 2012.
A sequence that can autonomously replicate, as a plasmid, when transformed into a bacterial host.
The ACS is an 11-bp sequence of the form 5’-WTTTAYRTTTW-3’ which is at the core of every yeast ARS, and is necessary but not sufficient for recognition and binding by the origin recognition complex (ORC). Functional ARSs require an ACS, as well as other cis elements in the 5’ (C domain) and 3’ (B domain) flanking sequences of the ACS.
A polar, hydrophilic amino acid encoded by the codons AAT and AAC. A place holder for a cross product with chebi.
A primary transcript encoding asparaginyl tRNA (SO:0000256).
A tRNA sequence that has an asparagine anticodon, and a 3’ asparagine binding region.
A negatively charged, hydrophilic amino acid encoded by the codons GAT and GAC. A place holder for a cross product with chebi.
A primary transcript encoding aspartyl tRNA (SO:0000257).
A tRNA sequence that has an aspartic acid anticodon, and a 3’ aspartic acid binding region.
A primer containing an SNV at the 3’ end for accurate genotyping.
A region of sequence where the final nucleotide assignment differs from the original assembly due to an improvement that replaces a mistake.
[assortment_derived_aneuploid; assortment-derived_aneuploid]
A multi-chromosome aberration generated by reassortment of other aberration components; presumed to have a deficiency or a duplication.
[assortment_derived_deficiency; assortment-derived_deficiency]
A multi-chromosome deficiency aberration generated by reassortment of other aberration components.
[assortment_derived_deficiency_plus_duplication]
A multi-chromosome aberration generated by reassortment of other aberration components; presumed to have a deficiency and a duplication.
[assortment_derived_duplication]
A multi-chromosome duplication aberration generated by reassortment of other aberration components.
A chromosome variation derived from an event during meiosis.
A motif of five consecutive residues and two H-bonds in which: Residue(i) is Aspartate or Asparagine (Asx), side-chain O of residue(i) is H-bonded to the main-chain NH of residue(i+2) or (i+3), main-chain CO of residue(i) is H-bonded to the main-chain NH of residue(i+3) or (i+4).
A motif of three consecutive residues and one H-bond in which: residue(i) is Aspartate or Asparagine (Asx), the side-chain O of residue(i) is H-bonded to the main-chain NH of residue(i+2).
Left handed type I (dihedral angles): Residue(i): -140 degrees < chi(1) - 120 degrees < -20 degrees, -90 degrees < psi + 120 degrees < +40 degrees. Residue(i+1): -140 degrees < phi < -20 degrees, -90 degrees < psi < +40 degrees.
Left handed type II (dihedral angles): Residue(i): -140 degrees < chi(1) - 120 degrees < -20 degrees, +80 degrees < psi + 120 degrees < +180 degrees. Residue(i+1): +20 degrees < phi < +140 degrees, -40 degrees < psi < +90 degrees.
Right handed type I (dihedral angles): Residue(i): -140 degrees < chi(1) - 120 degrees < -20 degrees, -90 degrees < psi + 120 degrees < +40 degrees. Residue(i+1): -140 degrees < phi < -20 degrees, -90 degrees < psi < +40 degrees.
Right handed type II (dihedral angles): Residue(i): -140 degrees < chi(1) - 120 degrees < -20 degrees, +80 degrees < psi + 120 degrees < +180 degrees. Residue(i+1): +20 degrees < phi < +140 degrees, -40 degrees < psi < +90 degrees.
An internal RNA loop where one of the strands includes more bases than the corresponding region on the other strand.
An integration/excision site of a bacterial chromosome at which a recombinase acts to insert foreign DNA containing a cognate integration/excision site.
An attC site is a sequence required for the integration of a DNA of an integron.
An attachment site located on a conjugative transposon and used for site-specific integration of a conjugative transposon.
A sequence segment located within the five prime end of an mRNA that causes premature termination of translation.
A region within an integron, adjacent to an integrase, at which site specific recombination involving an attC_site takes place.
A region that results from recombination between attP_site and attB_site, composed of the 5’ portion of attB_site and the 3’ portion of attP_site.
An integration/excision site of a phage chromosome at which a recombinase acts to insert the phage DNA at a cognate integration/excision site on a bacterial chromosome.
A region that results from recombination between attP_site and attB_site, composed of the 5’ portion of attP_site and the 3’ portion of attB_site.
A uORF beginning with the canonical start codon AUG.
A self spliced intron.
The gene product is involved in its own transcriptional regulation.
An autosynaptic chromosome is the aneuploid product of recombination between a pericentric inversion and a cytologically wild-type chromosome.
A variably distant linear promoter region recognized by TFIIIC, with consensus sequence AGGTTCCAnnCC. Binds TFIIIC.
Bacterial Artificial Chromosome, a cloning vector that can be propagated as mini-chromosomes in a bacterial host. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.
[BAC_clone]
A region of DNA that has been inserted into the bacterial genome using a bacterial artificial chromosome. Requested by Andy Schroder - Flybase Harvard, Nov 2006.
A region of sequence from the end of a BAC clone that may provide a highly specific marker. Requested by Keith Boroevich December, 2006.
A contig of BAC reads. Requested by Bayer Cropscience December, 2011.
A DNA sequence to which bacterial RNA polymerase binds, to begin transcription. Former parent RNA_polymerase_promoter SO:0001203 was merged with promoter SO:0000167 in Aug 2020 as part of GREEKC.
A region which is part of a bacterial RNA polymerase promoter. This is a manufactured term to allow the parts of bacterial_RNApol_promoter to have an is_a path back to the root.
A DNA sequence to which bacterial RNA polymerase sigma 70 binds, to begin transcription.
A bacterial promoter with sigma ecf factor binding dependency. This is a type of bacterial promoter that requires a sigma ECF factor to bind to identified -10 and -35 sequence regions in order to mediate binding of the RNA polymerase to the promoter region as part of transcription initiation. Requested by Kevin Clancy - Invitrogen - May 2012.
A DNA sequence to which bacterial RNA polymerase sigma 54 binds, to begin transcription.
A terminator signal for bacterial transcription. Moved to transcriptional_cis_regulatory_region (SO:0001055) from gene_group_regulatory_region (SO:0000752) on 11 Feb 2021 when SO:0000752 was merged into SO:0001055. See GitHub Issue #529.
A region of sequence where the final nucleotide assignment is different from that given by the base caller due to an improvement that replaces a mistake.
Two bases paired opposite each other by hydrogen bonds creating a secondary structure.
A variant that does not affect the function of the gene or cause disease.
A motif of three residues within a beta-sheet in which the main chains of two consecutive residues are H-bonded to that of the third, and in which the dihedral angles are as follows: Residue(i): -140 degrees < phi(l) < -20 degrees, -90 degrees < psi(l) < 40 degrees. Residue(i+1): -180 degrees < phi < -25 degrees or +120 degrees < phi < +180 degrees, +40 degrees < psi < +180 degrees or -180 degrees < psi < -120 degrees.
A motif of three residues within a beta-sheet consisting of two H-bonds. Beta bulge loops often occur at the loop ends of beta-hairpins.
A motif of three residues within a beta-sheet consisting of two H-bonds in which: the main-chain NH of residue(i) is H-bonded to the main-chain CO of residue(i+4), the main-chain CO of residue i is H-bonded to the main-chain NH of residue(i+3), these loops have an RL nest at residues i+2 and i+3.
A motif of three residues within a beta-sheet consisting of two H-bonds in which: the main-chain NH of residue(i) is H-bonded to the main-chain CO of residue(i+5), the main-chain CO of residue i is H-bonded to the main-chain NH of residue(i+4), these loops have an RL nest at residues i+3 and i+4.
A motif of four consecutive residues that may contain one H-bond, which, if present, is between the main-chain CO of the first residue and the main-chain NH of the fourth. It is characterized by the dihedral angles of the second and third residues, which are the basis for sub-categorization.
Left handed type I: A motif of four consecutive residues that may contain one H-bond, which, if present, is between the main-chain CO of the first residue and the main-chain NH of the fourth. It is characterized by the dihedral angles: Residue(i+1): -140 degrees < phi < -20 degrees, -90 degrees < psi < +40 degrees. Residue(i+2): -140 degrees < phi < -20 degrees, -90 degrees < psi < +40 degrees.
Left handed type II: A motif of four consecutive residues that may contain one H-bond, which, if present, is between the main-chain CO of the first residue and the main-chain NH of the fourth. It is characterized by the dihedral angles: Residue(i+1): -140 degrees < phi < -20 degrees, +80 degrees < psi < +180 degrees. Residue(i+2): +20 degrees < phi < +140 degrees, -40 degrees < psi < +90 degrees.
Right handed type I: A motif of four consecutive residues that may contain one H-bond, which, if present, is between the main-chain CO of the first residue and the main-chain NH of the fourth. It is characterized by the dihedral angles: Residue(i+1): -140 degrees < phi < -20 degrees, -90 degrees < psi < +40 degrees. Residue(i+2): -140 degrees < phi < -20 degrees, -90 degrees < psi < +40 degrees.
Right handed type II: A motif of four consecutive residues that may contain one H-bond, which, if present, is between the main-chain CO of the first residue and the main-chain NH of the fourth. It is characterized by the dihedral angles: Residue(i+1): -140 degrees < phi < -20 degrees, +80 degrees < psi < +180 degrees. Residue(i+2): +20 degrees < phi < +140 degrees, -40 degrees < psi < +90 degrees.
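The dihedral-angle ranges given for the right handed type I and type II beta turns can be applied as a simple range check. This is a sketch under stated assumptions: angles are in degrees, the bounds are treated as strict inequalities exactly as written above, and the function name `classify_right_handed_beta_turn` is hypothetical. It does not test for the optional H-bond.

```python
# (low, high) bounds in degrees for phi(i+1), psi(i+1), phi(i+2), psi(i+2),
# copied from the right handed type I and type II definitions above.
RIGHT_HANDED_RANGES = {
    "type I": ((-140, -20), (-90, 40), (-140, -20), (-90, 40)),
    "type II": ((-140, -20), (80, 180), (20, 140), (-40, 90)),
}


def classify_right_handed_beta_turn(phi1, psi1, phi2, psi2):
    """Return 'type I', 'type II', or None for angles at residues i+1 and i+2."""
    angles = (phi1, psi1, phi2, psi2)
    for name, ranges in RIGHT_HANDED_RANGES.items():
        if all(low < angle < high for angle, (low, high) in zip(angles, ranges)):
            return name
    return None


print(classify_right_handed_beta_turn(-60, -30, -90, 0))
print(classify_right_handed_beta_turn(-60, 120, 80, 0))
```

The two types cannot both match, because their psi ranges at residue i+1 do not overlap.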
A motif of four consecutive peptide residues that may contain one H-bond, which, if present, is between the main-chain CO of the first residue and the main-chain NH of the fourth and is characterized by the dihedral angles: Residue(i+1): phi ~ -60 degrees, psi ~ -30 degrees. Residue(i+2): phi ~ -120 degrees, psi ~ 120 degrees.
A motif of four consecutive peptide residues of type VIa or type VIb and where the i+2 residue is cis-proline.
A motif of four consecutive peptide residues, of which the i+2 residue is proline, and that may contain one H-bond, which, if present, is between the main-chain CO of the first residue and the main-chain NH of the fourth and is characterized by the dihedral angles: Residue(i+1): phi ~ -60 degrees, psi ~ 120 degrees. Residue(i+2): phi ~ -90 degrees, psi ~ 0 degrees.
A type VIa beta turn with the following phi and psi angles on amino acid residues 2 and 3: phi-2 = -60 degrees, psi-2 = 120 degrees, phi-3 = -90 degrees, psi-3 = 0 degrees.
A type VIa beta turn with the following phi and psi angles on amino acid residues 2 and 3: phi-2 = -120 degrees, psi-2 = 120 degrees, phi-3 = -60 degrees, psi-3 = 0 degrees.
A motif of four consecutive peptide residues, of which the i+2 residue is proline, and that may contain one H-bond, which, if present, is between the main-chain CO of the first residue and the main-chain NH of the fourth and is characterized by the dihedral angles: Residue(i+1): phi ~ -120 degrees, psi ~ 120 degrees. Residue(i+2): phi ~ -60 degrees, psi ~ 0 degrees.
A sequence variant whereby two genes on alternate strands have become joined. Requested by SNPEFF team. Feb 2016.
A promoter that can allow for transcription in both directions. Definition updated in Aug 2020 by Dave Sant.
A biological_region of sequence that, in the molecule, interacts selectively and non-covalently with other molecules. A region on the surface of a molecule that may interact with another molecule. When applied to polypeptides: Amino acids involved in binding or interactions. It can also apply to an amino acid bond which is represented by the positions of the two flanking amino acids. See GO:0005488 : binding.
A region of a peptide that is involved in a biochemical function. Range.
A region defined by its disposition to be involved in a biological process.
A region which is intended for use in an experiment.
An interchromosomal mutation whereby the (large) region between the first two breaks listed is lost, and the two flanking segments (one of them centric) are joined as a translocation to the free ends resulting from the third break.
A chromosomal inversion caused by three breaks in the same chromosome; both central segments are inverted in place (i.e., they are not transposed).
A reading_frame that is interrupted by one or more stop codons; usually identified through inter-genomic sequence comparisons. Term requested by Rama from SGD.
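An interrupted reading frame can be detected by walking the sequence codon by codon and reporting any stop codon before the last one. The sketch below assumes a DNA string already in frame, the standard genetic code stop codons (TAA, TAG, TGA), and that the final codon is allowed to be a stop; the function name `internal_stop_positions` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet


def internal_stop_positions(sequence):
    """Return 0-based offsets of in-frame stop codons other than the final codon."""
    sequence = sequence.upper()
    usable = len(sequence) - len(sequence) % 3  # ignore a trailing partial codon
    codons = [sequence[i:i + 3] for i in range(0, usable, 3)]
    return [index * 3 for index, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


print(internal_stop_positions("ATGTAAGGGTGA"))  # [3]
print(internal_stop_positions("ATGGGGTGA"))     # []
```

A non-empty result marks the frame as blocked in the sense of the definition above.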
A restriction enzyme cleavage site where both strands are cut at the same position.
A restriction enzyme recognition site that, when cleaved, results in no overhangs.
An attribute describing a sequence that is bound by another molecule. Formerly called transcript_by_bound_factor.
An attribute describing a sequence that is bound by a nucleic acid.
An attribute describing a sequence that is bound by a protein.
Boundary elements are DNA motifs that prevent heterochromatin from spreading into neighboring euchromatic regions. Requested by Antonia Lock. Insulator is included as a related synonym since this is used to refer to insulator in the literature (NCBI:cf).
A pyrimidine rich sequence near the 3’ end of an intron to which the 5’ end becomes covalently bound during nuclear splicing. The resulting structure resembles a lariat.
A core RNA polymerase II promoter element with consensus (G/A)T(T/G/A)(T/A)(G/T)(T/G)(T/G).
A sequence element characteristic of some RNA polymerase II promoters, located immediately upstream of some TATA box elements at -37 to -32 with respect to the TSS (+1). Consensus sequence is (G|C)(G|C)(G|A)CGCC. Binds TFIIB.
A cis-acting element found in the 3’ UTR of some mRNA which is bound by the Drosophila Bruno protein and its homologs. Not to be confused with BRE_motif (SO:0000016), which binds transcription factor II B.
An RNA polymerase III type 1 promoter with consensus sequence CAnnCCn.
Genomic DNA of immunoglobulin/T-cell receptor gene including more than one C-gene.
Most box C/D snoRNAs also contain long (>10 nt) sequences complementary to rRNA. Boxes C and D, as well as boxes C’ and D’, are usually located in close proximity, and form a structure known as the box C/D motif. This motif is important for snoRNA stability, processing, nucleolar targeting and function. A small number of box C/D snoRNAs are involved in rRNA processing; most, however, are known or predicted to serve as guide RNAs in ribose methylation of rRNA. Targeting involves direct base pairing of the snoRNA at the rRNA site to be modified and selection of a rRNA nucleotide a fixed distance from box D or D'.
snoRNA that is associated with guiding methylation of nucleotides. It contains two short conserved sequence motifs: C (RUGAUGA) near the 5-prime end and D (CUGA) near the 3-prime end.
A primary transcript encoding a small nucleolar RNA of the box C/D family.
Genomic DNA of immunoglobulin/T-cell receptor gene including C-region (and introns if present) with 5’ UTR (SO:0000204) and 3’ UTR (SO:0000205).
The constant region of an immunoglobulin polypeptide sequence.
The more polar, carboxy-terminal region of the signal peptide (approx 3-7 aa).
A transversion from cytidine to adenine.
A transversion of a cytidine to a guanine.
A transition of a cytidine to a thymine.
The transition of cytidine to thymine occurring at a pCpG site as a consequence of the spontaneous deamination of 5’-methylcytidine.
A kind of transcription_initiation_cluster defined by the clustering of CAGE tags on a sequence region.
A CAGE tag is a sequence tag that corresponds to 5’ ends of mRNA at cap sites, produced by cap analysis gene expression and used to identify transcriptional start sites.
A gene suspected of being involved in the expression of a trait. Requested by Bayer Cropscience December, 2011.
The canonical 5’ splice site has the sequence “GT”.
The major class of splice site with dinucleotides GT and AG for donor and acceptor sites, respectively.
The canonical 3’ splice site has the sequence “AG”.
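The GT and AG dinucleotides described above sit at the first two and last two bases of the intron. A minimal sketch of that check, assuming the intron is supplied as a sense-strand DNA string running 5’ to 3’; the function name `has_canonical_splice_sites` is hypothetical.

```python
def has_canonical_splice_sites(intron):
    """True if the intron starts with the donor GT and ends with the acceptor AG."""
    intron = intron.upper()
    if len(intron) < 4:
        return False  # too short to hold both dinucleotides without overlap
    return intron.startswith("GT") and intron.endswith("AG")


print(has_canonical_splice_sites("GTAAGTCCCTTTCAG"))  # True
print(has_canonical_splice_sites("GCAAGTCCCTTTCAG"))  # False
```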
A structure consisting of a 7-methylguanosine in 5’-5’ triphosphate linkage with the first nucleotide of an mRNA. It is added post-transcriptionally, and is not encoded in the DNA.
An attribute describing when a sequence, usually an mRNA, is capped by the addition of a modified guanine nucleotide at the 5’ end.
An mRNA that is capped.
A primary transcript that is capped.
A promoter element bound by the MADS family of transcription factors with consensus 5’-(C/T)TA(T/A)4TA(G/A)-3’. Requested by Antonia Lock.
A gene that is a member of a gene cassette, which is a mobile genetic element.
A cassette pseudogene is a kind of gene in an inactive form which may recombine at a telomeric locus to form a functional copy. Requested by the Trypanosome community.
Amino acid involved in the activity of an enzyme. Discrete.
A motif of 4 consecutive residues with dihedral angles as follows: res i: phi -90 bounds -120 to -60, res i: psi -10 bounds -50 to 30, res i+1: phi -90 bounds -120 to -60, res i+1: psi -10 bounds -50 to 30, res i+2: phi -75 bounds -100 to -50, res i+2: psi 140 bounds 110 to 170. The extra restriction of the length of the O to O distance is similar, that it be less than 5 Angstrom. In this case these two Oxygen atoms are the main chain carbonyl oxygen atoms of residues i-1 and i+2.
A motif of 3 consecutive residues with dihedral angles as follows: res i: phi -90 bounds -120 to -60, res i: psi -10 bounds -50 to 30, res i+1: phi -75 bounds -100 to -50, res i+1: psi 140 bounds 110 to 170. An extra restriction of the length of the O to O distance would be useful, that it be less than 5 Angstrom. More precisely these two oxygens are the main chain carbonyl oxygen atoms of residues i-1 and i+1.
A motif of 4 consecutive residues with dihedral angles as follows: res i: phi -90 bounds -120 to -60, res i: psi -10 bounds -50 to 30, res i+1: phi -90 bounds -120 to -60, res i+1: psi -10 bounds -50 to 30, res i+2: phi -75 bounds -100 to -50, res i+2: psi 140 bounds 110 to 170. The extra restriction of the length of the O to O distance is similar, that it be less than 5 Angstrom. In this case these two Oxygen atoms are the main chain carbonyl oxygen atoms of residues i-1 and i+2.
A motif of 3 consecutive residues with dihedral angles as follows: res i: phi -90 bounds -120 to -60, res i: psi -10 bounds -50 to 30, res i+1: phi -75 bounds -100 to -50, res i+1: psi 140 bounds 110 to 170. An extra restriction of the length of the O to O distance would be useful, that it be less than 5 Angstrom. More precisely these two oxygens are the main chain carbonyl oxygen atoms of residues i-1 and i+1.
Base sequence at the 3’ end of a tRNA. The 3’-hydroxyl group on the terminal adenosine is the attachment point for the amino acid.
A promoter element with consensus sequence CCAAT, bound by a protein complex that represses transcription in response to low iron levels.
Complementary DNA; A piece of DNA copied from an mRNA and spliced into a vector for propagation in a suitable host. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.
A match against cDNA sequence.
An RNA polymerase II promoter element found in the promoters of genes regulated by calcineurin. The consensus sequence is GNGGCKCA.
A contiguous sequence which begins with, and includes, a start codon and ends with, and includes, a stop codon.
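The CDS definition above (begins with a start codon, ends with a stop codon, both included) can be checked mechanically. The sketch below adds assumptions that the definition itself does not state: the start codon is ATG, the stop codons are those of the standard genetic code, and the length must be a multiple of three. The function name `looks_like_complete_cds` is hypothetical.

```python
START_CODONS = {"ATG"}               # assumption: canonical start only
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def looks_like_complete_cds(sequence):
    """True if the DNA string begins with a start codon and ends with a stop codon."""
    sequence = sequence.upper()
    if len(sequence) < 6 or len(sequence) % 3 != 0:
        return False  # needs at least a start and a stop codon, in frame
    return sequence[:3] in START_CODONS and sequence[-3:] in STOP_CODONS


print(looks_like_complete_cds("ATGAAATAA"))  # True
print(looks_like_complete_cds("ATGAAA"))     # False
```

Organisms that use alternative start codons or genetic codes would need different sets in the two constants.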
A sequence variant extending the CDS, that causes elongation of the resulting polypeptide sequence. Added as per request by Edward Wallace GitHub issue #480 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/480)
A sequence variant extending the CDS at the 5’ end, that causes elongation of the resulting polypeptide sequence at the N terminus. Added as per request by Edward Wallace GitHub issue #480 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/480)
A portion of a CDS that is not the complete CDS.
A CDS with the evidence status of being independently known.
A CDS that is predicted.
A region of a CDS.
A CDS that is supported by domain similarity.
A CDS that is supported by similarity to EST or cDNA data.
A CDS that is supported by proteomics data.
A CDS that is supported by sequence similarity data.
A sequence variant extending the CDS at the 3’ end, that causes elongation of the resulting polypeptide sequence at the C terminus. Added as per request by Edward Wallace GitHub issue #480 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/480)
The central, hydrophobic region of the signal peptide (approx 7-15 aa).
A region of chromosome where the spindle fibers attach during mitosis and meiosis.
A centromere DNA Element I (CDEI) is a conserved region, part of the centromere, consisting of a consensus region composed of 8-11 bp which enables binding by the centromere binding factor 1 (Cbf1p). This term was requested 2009-10-16 by Michel Dumontier, tracker id 2880699.
A centromere DNA Element II (CDEII) is a conserved region, part of the centromere, consisting of a consensus region that is AT-rich and ~75-100 bp in length. This term was requested 2009-10-16 by Michel Dumontier, tracker id 2880699.
A centromere DNA Element III (CDEIII) is a conserved region, part of the centromere, consisting of a 25-bp consensus region which enables binding by the centromere DNA binding factor 3 (CBF3) complex. This term was requested 2009-10-16 by Michel Dumontier, tracker id 2880699.
A repeat region found within the modular centromere.
A cDNA clone invalidated because it is chimeric.
A region of sequence identified by ChIP-seq technology to contain a protein binding site.
A chromosome originating in a chloroplast.
DNA belonging to the genome of a chloroplast, a photosynthetic plastid. This term is used by MO.
A sequencer read of a chloroplast DNA sample. Requested by Bayer Cropscience, October, 2012.
DNA belonging to the genome of a chloroplast, a green plastid for photosynthesis.
A chromosome originating in a chromoplast.
A gene from chromoplast_sequence.
DNA belonging to the genome of a chromoplast, a colored plastid for synthesis and storage of pigments.
An incomplete chromosome.
Regions of the chromosome that are important for regulating binding of chromosomes to the nuclear matrix.
Regions of the chromosome that are important for structural elements.
A chromosome structure variant whereby a region of a chromosome has been transferred to another position. Among interchromosomal rearrangements, the term transposition is reserved for that class in which the telomeres of the chromosomes involved are coupled (that is to say, form the two ends of a single DNA molecule) as in wild-type.
An attribute of a change in the structure or number of chromosomes.
When a genome contains an abnormal number of chromosomes.
A region of the chromosome between the centromere and the telomere. Human chromosomes have two arms, the p arm (short) and the q arm (long) which are separated from each other by the centromere.
A cytologically distinguishable feature of a chromosome, often made visible by staining, and usually alternating light and dark. “Band” is a term of convenience in order to hierarchically organize morphologically defined chromosome features: chromosome > arm > region > band > sub-band.
A sequence within the micronuclear DNA of ciliates at which chromosome breakage and telomere addition occurs during nuclear differentiation.
A chromosomal region that may sustain a double-strand break, resulting in a recombination event.
A chromosome that occurred by the division of a larger chromosome.
A kind of chromosome variation where the chromosome complement is not an exact multiple of the haploid number.
A region of a chromosome. This is a manufactured term that serves the purpose of allowing the parts of a chromosome to have an is_a path to the root.
An alteration of the genome that leads to a change in the structure or number of one or more chromosomes.
A deviation in chromosome structure or number.
A quality of a nucleotide polymer that has no terminal nucleotide residues. Attributes added to describe the different kinds of replicon. SO workshop, September 2006.
Structural unit composed of a self-replicating, double-stranded, circular DNA molecule.
Structural unit composed of a self-replicating, double-stranded, circular RNA molecule.
Structural unit composed of a self-replicating, single-stranded, circular DNA molecule.
Structural unit composed of a self-replicating, single-stranded, circular RNA molecule.
A genome region where chromosome pairing occurs preferentially during homologous chromosome pairing in early meiotic prophase of Meiosis I. Comment: An example of this is the Sme2 locus in fission yeast S. pombe, where it is coincident with a ribonuclear complex termed the “Mei2 dot”. This term was requested by Val Wood, PomBase.
A structural region in an RNA molecule which promotes ribosomal frameshifting of cis coding sequence. Moved from transcription_regulatory_region (SO:0001679) to transcriptional_cis_regulatory_region (SO:0001055) by Dave Sant on Feb 11, 2021 when transcription_regulatory_region was merged into transcriptional_cis_regulatory_region to be consistent with GO and reduce redundancy as part of the GREEKC consortium. See GitHub Issue #527.
A regulatory region where transcription factor binding sites are clustered to regulate various aspects of transcription activities. (CRMs can be located a few kb to hundreds of kb upstream of the core promoter, in the coding sequence, within introns, or in the untranslated region (UTR) sequences, and even on a different chromosome). A single gene can be regulated by multiple CRMs to give precise control of its spatial and temporal expression. CRMs function as nodes in large, intertwined regulatory networks. CRM DNA accessibility is subject to regulation by dbTFs and transcription co-TFs. Requested by Stephen Grossmann Dec 2004. Changed relationship from has_part SO:0000235 TF_binding site to TF_binding_site is part_of SO:0000727 CRM in response to requests from GREEKC initiative in Aug 2020. Removed 3’ from definition because 5’ UTRs are included as well, notified by Colin Logie of GREEKC. Nov 9 2020. DS Updated name from ‘CRM’ to ‘cis_regulatory_module’ on 08 Feb 2021. See GitHub Issue #526. DS Added final sentence to definition as part of GREEKC Feb 16, 2021. See GitHub Issue #534.
Intronic 2 bp region bordering exon. A splice_site that adjacent_to exon and overlaps intron.
Small non-coding RNA (55-65 nt long) containing highly conserved 5’ and 3’ ends (16 and 8 nt, respectively) that are predicted to come together to form a stem structure. Identified in the social amoeba Dictyostelium discoideum and localized in the cytoplasm. Requested by Karen Pilcher - Dictybase. song-Term Tracker-1574577.
Small non-coding RNA (59-60 nt long) containing 5’ and 3’ ends that are predicted to come together to form a stem structure. Identified in the social amoeba Dictyostelium discoideum and localized in the cytoplasm.
The C-terminal residues of a polypeptide which are exchanged for a GPI-anchor.
The initiator methionine that has been cleaved from a mature polypeptide sequence.
The cleaved_peptide_region is the region of a peptide sequence that is cleaved during maturation. Range.
Part of the primary transcript that is clipped off during processing.
[clone_attribute]
A read from an end of the clone sequence.
The region of sequence that has been inserted and is being propagated by the clone.
The end of the clone insert.
The start of the clone insert.
[clone_insert_start]
[cloned]
[cloned_cDNA]
A clone insert made from cDNA.
[cloned_genomic]
A clone insert made from genomic DNA.
The region of sequence that has been inserted and is being propagated by the clone. Added in response to Lynn Crosby. A clone insert may be composed of many cloned regions.
Coding region of sequence similarity by descent from a common ancestor.
The last base to be translated into protein. It does not include the stop codon.
An exon whereby at least one base is part of a codon (here, ‘codon’ is inclusive of the stop_codon).
The region of an exon that encodes for protein sequence. An exon containing either a start or stop codon will be partially coding and partially non coding.
A sequence variant that changes the coding sequence.
The first base to be translated into protein.
A transcript variant occurring within an intron of a coding transcript.
A transcript variant of a protein coding gene.
A protein coding transcript containing a retained intron. Term added as part of collaboration with Gencode, adding biotypes used in annotation.
An attribute of a coding genomic variant.
A set of (usually) three nucleotide bases in a DNA or RNA sequence, which together code for a unique amino acid or the termination of translation and are contained within the CDS.
An attribute describing the alteration of codon meaning.
A MGE region consisting of two fused plasmids resulting from a replicative transposition event.
When a variant from the genomic sequence is commonly found in the general population.
A secondary structure variant that compensates for the change made by a previous variant.
A sequence variant that changes the resulting polypeptide structure.
A variant that changes the translational product with respect to the reference.
A contiguous cluster of translocations, usually the result of a single catastrophic event such as chromothripsis or chromoanasynthesis.
A structural sequence alteration or rearrangement encompassing one or more genome fragments, with 4 or more breakpoints.
When no simple or well defined DNA mutation event describes the observed DNA change, the keyword “complex” should be used. Usually there are multiple equally plausible explanations for the change.
A transcript variant with a complex INDEL: an insertion or deletion that spans an exon/intron border or a coding sequence/UTR border. EBI term: Complex InDel - Insertion or deletion that spans an exon/intron border or a coding sequence/UTR border.
Polypeptide region that is rich in a particular amino acid or homopolymeric and greater than three residues in length. Range.
A chromosome structure variant where a monocentric element is caused by the fusion of two chromosome arms.
One arm of a compound chromosome. FLAG - this term should probably be a part_of rather than an is_a.
[computed_feature]
. similar to:<sequence_id>
A sequence variant in the CDS region that causes a conformational change in the resulting polypeptide sequence.
A region of a polypeptide, involved in the transition from one conformational state to another. MM Young, K Kirshenbaum, KA Dill & S Highsmith. Predicting conformational switches in proteins. Protein Science, 1999, 8, 1752-64. K. Kirshenbaum, M.M. Young and S. Highsmith. Predicting Allosteric Switches in Myosins. Protein Science 8(9):1806-1815. 1999.
A transposon that encodes function required for conjugation.
A sequence produced from an alignment algorithm that uses multiple sequences as input. Term added Dec 06 to comply with mapping to MGED terms. It should be used to generate consensus regions. The specific cross product terms they require are consensus_region and consensus_mRNA.
A consensus AFLP fragment is an AFLP sequence produced from any alignment algorithm which uses assembled multiple AFLP sequences as input. Requested by Bayer Cropscience September, 2013.
Genomic DNA sequence produced from some base calling or alignment algorithm which uses aligned or assembled multiple gDNA sequences as input. Requested by Bayer Cropscience November, 2012.
An mRNA sequence produced from an alignment algorithm that uses multiple sequences as input. Do not obsolete without considering MGED mapping.
A region that has a known consensus sequence. Do not obsolete without considering MGED mapping.
A sequence variant of a codon causing the substitution of a similar amino acid for another in the resulting polypeptide.
An inframe decrease in cds length that deletes one or more entire codons from the coding sequence but does not change any remaining codons.
An inframe increase in cds length that inserts one or more codons into the coding sequence between existing codons.
A sequence variant whereby at least one base of a codon is changed resulting in a codon that encodes for a different but similar amino acid. These variants may or may not be deleterious.
A region that is similar or identical across more than one species.
A sequence variant located in a conserved intergenic region, between genes. Requested by Uma Paila (UVA) for snpEff.
A transcript variant occurring within a conserved region of an intron. Requested by Uma Paila (UVA) for snpEff.
Region of sequence similarity by descent from a common ancestor.
A promoter that allows for continual transcription of gene.
A collection of contigs. See tracker ID: 2138359.
A DNA sequencer read which is part of a contig.
A sequence variant where copies of a feature (CNV) are either increased or decreased.
A sequence variant where copies of a feature are decreased relative to the reference.
A sequence alteration whereby the copy number of a given region is greater than the reference sequence.
A sequence variant where copies of a feature are increased relative to the reference.
A sequence alteration whereby the copy number of a given region is less than the reference sequence.
A variation that increases or decreases the copy number of a given region.
An element that only exists within the promoter region of a eukaryotic gene.
An element that always exists within the promoter region of a gene. When multiple transcripts exist for a gene, the separate transcripts may have separate core_promoter_elements. Added by Dave to be consistent with other ontologies updated with GREEKC initiative.
A cloning vector that is a hybrid of a lambda phage and a plasmid, and that can be propagated as a plasmid or packaged as a phage, since it retains the lambda cos sites. Paper: Evans GA et al. High efficiency vectors for cosmid microcloning and genomic analysis. Gene 1989; 79(1):9-20. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.
[cosmid_clone]
Binding involving a covalent bond.
Merged definition. Target definition: A promoter element with consensus sequence TGACGTCA; bound by the ATF/CREB family of transcription factors. Source definition: A promoter element that contains a core sequence TGACGT, bound by a protein complex that regulates transcription of genes encoding PKA pathway components. New synonym Atf1/Pcr1 recognition motif added in response to Antonia Lock GitHub Issue Request #437, PMID:15716492
Clustered Palindromic Repeats interspersed with bacteriophage derived spacer sequences.
A nucleotide match against a sequence from another organism.
Posttranslationally formed amino acid bonds.
A feature_attribute describing a feature that is not manifest under normal conditions.
A gene that is not transcribed under normal conditions and is not critical to normal cellular functioning.
A remnant of an integrated prophage in the host genome or an “island” in the host genome that includes phage-like genes. This is not cryptic in the same sense as a cryptic gene or cryptic splice site.
A sequence variant whereby a new splice site is created due to the activation of a new acceptor.
A sequence variant whereby a new splice site is created due to the activation of a new donor.
A splice site that is in part of the transcript not normally spliced. They occur via mutation or transcriptional error.
A sequence variant causing a new (functional) splice site.
A maxicircle gene so extensively edited that it cannot be matched to its edited mRNA sequence.
A promoter element with consensus sequence GTGRGAA, bound by CSL (CBF1/RBP-JK/Suppressor of Hairless/LAG-1) transcription factors.
An enterobacterial RNA that binds the CsrA protein. The CsrB RNAs contain a conserved motif CAGGXXG that is found in up to 18 copies and has been suggested to bind CsrA. The Csr regulatory system has a strong negative regulatory effect on glycogen biosynthesis, gluconeogenesis and glycogen catabolism and a positive regulatory effect on glycolysis. In other bacteria such as Erwinia carotovora the RsmA protein has been shown to regulate the production of virulence determinants, such as extracellular enzymes. RsmA binds to RsmB regulatory RNA which is also a member of this family.
A gene from chloroplast sequence.
A transcription factor binding site with consensus sequence CCGCGNGGNGGCAG, bound by CCCTC-binding factor.
A non-canonical start codon of sequence CTG.
A promoter element bound by copper ion-sensing transcription factors such as S. cerevisiae Mac1p or S. pombe Cuf1; the consensus sequence is HTHNNGCTGD (more specifically TTTGCKCR in budding yeast).
A chromosome originating in a cyanelle.
A gene from cyanelle sequence.
DNA belonging to the genome of a cyanelle, a photosynthetic plastid found in algae.
A chromosomal translocation whereby three breaks occurred in three different chromosomes. The centric segment resulting from the first break listed is joined to the acentric segment resulting from the second, rather than the third.
A polar amino acid encoded by the codons TGT and TGC. A place holder for a cross product with chebi.
A primary transcript encoding cysteinyl tRNA (SO:0000258).
A tRNA sequence that has a cysteine anticodon, and a 3’ cysteine binding region.
Polypeptide region that is localized inside the cytoplasm.
Cytosolic 16S rRNA is an RNA component of the small subunit of cytosolic ribosomes in prokaryotes. Renamed to cytosolic_16S_rRNA from rRNA_16S on 10 June 2021 as per restructuring of rRNA child terms. Updated definition to be consistent with format of other rRNA definitions. Request from EBI. See GitHub Issue #493.
Cytosolic 23S rRNA is an RNA component of the large subunit of cytosolic ribosomes in prokaryotes. Renamed from rRNA_23S to cytosolic_23S_rRNA on 27 May 2021 with the restructuring of rRNA child terms. Updated definition to be consistent with format of other rRNA definitions. Requested by EBI. See GitHub Issue #493.
Cytosolic 5S rRNA is an RNA component of the large subunit of cytosolic ribosomes in both prokaryotes and eukaryotes. Renamed from rRNA_5S to cytosolic_5S_rRNA on 27 May 2021 with the restructuring of rRNA child terms. Updated definition to be consistent with format of other rRNA definitions. Requested by EBI. See GitHub Issue #493.
Cytosolic LSU rRNA is an RNA component of the large subunit of cytosolic ribosomes. Renamed to cytosolic_LSU_rRNA from large_subunit_rRNA on 10 June 2021 as per restructuring of rRNA child terms. Updated definition to be consistent with format of other rRNA definitions. Request from EBI. See GitHub Issue #493.
A gene that codes for cytosolic LSU rRNA. Adjusted hierarchy and names of rRNA_gene terms at the request of Steven Marygold GitHub Issue #513 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/513).
Cytosolic rRNA is an RNA component of the small or large subunits of cytosolic ribosomes. Added as a request from EBI. See GitHub Issue #493
A gene which codes for 16S_rRNA, which functions as the small subunit of the ribosome in prokaryotes. Added as per request by Antonia Lock GitHub issue #472 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/472). Adjusted hierarchy and names of rRNA_gene terms at the request of Steven Marygold GitHub Issue #513 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/513).
A gene which codes for 18S_rRNA, which functions as the small subunit of the ribosome in eukaryotes. Added as per request by Antonia Lock GitHub issue #472 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/472). Adjusted hierarchy and names of rRNA_gene terms at the request of Steven Marygold GitHub Issue #513 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/513).
A gene which codes for 23S_rRNA, which functions as a component of the large subunit of the ribosome in prokaryotes. Added as per request by Antonia Lock GitHub issue #472 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/472). Adjusted hierarchy and names of rRNA_gene terms at the request of Steven Marygold GitHub Issue #513 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/513).
A gene which codes for 25S_rRNA, which functions as a component of the large subunit of the ribosome in some eukaryotes. Added as per request by Antonia Lock GitHub issue #472 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/472). Adjusted hierarchy and names of rRNA_gene terms at the request of Steven Marygold GitHub Issue #513 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/513).
A gene which codes for 28S_rRNA, which functions as a component of the large subunit of the ribosome in eukaryotes. Added as per request by Antonia Lock GitHub issue #472 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/472). Adjusted hierarchy and names of rRNA_gene terms at the request of Steven Marygold GitHub Issue #513 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/513).
A gene which codes for 5_8S_rRNA (5.8S rRNA), which functions as a component of the large subunit of the ribosome in eukaryotes. Added as per request by Antonia Lock GitHub issue #472 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/472). Adjusted hierarchy and names of rRNA_gene terms at the request of Steven Marygold GitHub Issue #513 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/513).
A gene which codes for 5S_rRNA, which is a portion of the large subunit of the ribosome in both eukaryotes and prokaryotes. Added as per request by Antonia Lock GitHub issue #472 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/472). Adjusted hierarchy and names of rRNA_gene terms at the request of Steven Marygold GitHub Issue #513 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/513).
A gene that codes for cytosolic rRNA. Adjusted hierarchy and names of rRNA_gene terms at the request of Steven Marygold GitHub Issue #513 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/513).
Cytosolic SSU rRNA is an RNA component of the small subunit of cytosolic ribosomes. Renamed to cytosolic_SSU_rRNA from small_subunit_rRNA on 10 June 2021 as per restructuring of rRNA child terms. Updated definition to be consistent with format of other rRNA definitions. Request from EBI. See GitHub Issue #493.
A gene that codes for cytosolic SSU rRNA. Adjusted hierarchy and names of rRNA_gene terms at the request of Steven Marygold GitHub Issue #513 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/513).
Genomic DNA of immunoglobulin/T-cell receptor gene in germline configuration including more than one D-gene.
Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one D-gene, one DJ-gene and one C-gene.
Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one D-gene and one DJ-gene.
Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one D-gene, one DJ-gene, one J-gene and one C-gene.
Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one D-gene, one DJ-gene, and one J-gene.
Recombination signal including D-heptamer, D-spacer and D-nonamer in 5’ of D-region of a D-gene or D-sequence.
Germline genomic DNA including D-region with 5’ UTR and 3’ UTR, also designated as D-segment.
Genomic DNA of immunoglobulin/T-cell receptor gene in germline configuration including at least one D-gene, one J-gene and one C-gene.
Genomic DNA of immunoglobulin/T-cell receptor gene in germline configuration including at least one D-gene and one J-gene.
A genetic marker discovered using Diversity Arrays Technology (DArT).
The sequence referred to by an entry in a databank such as GenBank or SwissProt.
A primer with one or more mismatches to the DNA template corresponding to a position within a restriction enzyme recognition site.
A discontinuous core element of RNA polymerase II transcribed genes, situated downstream of the TSS. It is composed of three sub elements: SI, SII and SIII.
A sub element of the DCE core promoter element, with consensus sequence CTTC.
A sub element of the DCE core promoter element with consensus sequence CTGT.
A sub element of the DCE core promoter element with consensus sequence AGC.
A conserved polypeptide motif that mediates protein-protein interaction and defines adaptor proteins for DDB1/cullin 4 ubiquitin ligases. Note: PMID:18794354 describes the DDB box, and has lots of alignments, but doesn’t actually come out with a consensus sequence.
A variant arising in the offspring that is not found in either of the parents.
A non-functional descendant of an exon. Does not have to be part of a pseudogene.
A transcript processing variant whereby polyadenylation of the encoded transcript is decreased with respect to the reference. Term requested by M. Dumontier, June 1 2011.
A sequence variant that decreases the level of mature, spliced and processed RNA with respect to a reference sequence.
A sequence variant that decreases transcript stability with respect to a reference sequence.
A sequence variant that decreases the rate of transcription with respect to a reference sequence.
A sequence variant which decreases the translational product level with respect to a reference sequence.
An island that contains genes for integration/excision and the gene and site for the initiation of intercellular transfer by conjugation. It can be complemented for transfer by a conjugative transposon.
An interchromosomal transposition in which one of the four broken ends loses a segment before re-joining.
An intrachromosomal transposition in which one of the four broken ends loses a segment before re-joining.
A chromosomal deletion whereby three breaks occur in the same chromosome; one central region is lost, and the other is inverted.
A chromosomal deletion whereby a translocation occurs in which one of the four broken ends loses a segment before re-joining.
To remove a subsection of sequence.
An edit to delete a uridine. The insertion and deletion of uridine (U) residues, usually within coding regions of mRNA transcripts of cryptogenes in the mitochondrial genome of kinetoplastid protozoa.
The point at which one or more contiguous nucleotides were excised.
The point within a chromosome where a deletion begins or ends.
The space between two bases in a sequence which marks the position where a deletion has occurred.
A sequence alteration which includes an insertion and a deletion, affecting 2 or more bases. The term name was changed from indel to delins on 2/24/2019 to align with the HGVS nomenclature term for a deletion-insertion. Indel was causing confusion in the annotation community (github issue 445). The HGVS nomenclature definition of deletion-insertion (delins) is a sequence change where, compared to a reference sequence, one or more nucleotides are replaced by one or more other nucleotides and which is not a substitution, inversion or conversion. Indels can have a different number of bases than the corresponding reference sequence.
An oligonucleotide sequence that was designed by an experimenter that may or may not correspond with any natural sequence.
A conserved polypeptide motif that can be recognized by both Fizzy/Cdc20- and FZR/Cdh1-activated anaphase-promoting complex/cyclosome (APC/C) and targets a protein for ubiquitination and subsequent degradation by the APC/C. The consensus sequence is RXXLXXXXN.
An autosynaptic chromosome carrying the two right (D = dextro) telomeres. Corrected spelling from dexstrosynaptic_chromosome to dextrosynaptic_chromosome on April 14, 2020 in response to GitHub request #447
A repeat region which is part of the regional centromere outer repeat region. For the S. pombe project - requested by Val Wood.
A repeat region which is part of the regional centromere outer repeat region. For the S. pombe project - requested by Val Wood.
Non-base-paired sequence of nucleotide bases in tRNA. It contains several dihydrouracil residues.
An attribute describing a sequence that contains the code for two gene products.
An mRNA that has the quality dicistronic.
A primary transcript that has the quality dicistronic.
A transcript that is dicistronic.
A site at which replicated bacterial circular chromosomes are decatenated by site specific resolvase.
A modified RNA base in which the 5,6-dihydrouracil is bound to the ribose ring.
A region of a repeating dinucleotide sequence (two bases).
A diplotype is a pair of haplotypes from a given individual. It is a genotype where the phase is known.
A quality of an insertion where the insert is not in a cytologically inverted orientation.
A tandem duplication where the individual regions are in the same orientation.
The attribute of whether the sequence is the same direction as a feature (forward) or the opposite direction as a feature (reverse).
A reading frame that could encode a full-length protein but which contains obvious mid-sequence disablements (frameshifts or premature stop codons).
A variant that has been found to be associated with disease.
A variant that has been found to cause disease.
An inframe decrease in cds length that deletes bases from the coding sequence starting within an existing codon.
An inframe increase in cds length that inserts one or more codons into the coding sequence within an existing codon.
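The distinction between conservative and disruptive inframe deletions comes down to where the deletion starts relative to codon boundaries. The sketch below is an illustration under stated assumptions: the function name `classify_cds_deletion`, the 0-based CDS offset convention, and the `"not_inframe"` label are hypothetical and not part of the ontology. It classifies purely by position, so it does not detect the rare case where a deletion starting within a codon happens to leave the remaining codons unchanged.

```python
def classify_cds_deletion(offset, length):
    """Classify a deletion by its position within a CDS.

    offset: 0-based position in the CDS of the first deleted base
            (assumed convention; offset 0 is the first base of the start codon).
    length: number of deleted bases.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    if length % 3 != 0:
        # The reading frame is not preserved, so neither inframe term applies.
        return "not_inframe"
    if offset % 3 == 0:
        # Whole codons are removed and the remaining codons are untouched.
        return "conservative_inframe_deletion"
    # The deletion starts inside an existing codon.
    return "disruptive_inframe_deletion"
```

For example, deleting bases 3 to 5 of a CDS removes exactly the second codon, whereas deleting bases 4 to 6 starts inside the second codon.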
A duplication of the distal region of a chromosome. This term is used by Complete Genomics in the structural variant analysis files.
A regulatory promoter element that is distal from the TSS.
A recoding signal that is found many hundreds of nucleotides 3’ of a redefined stop codon.
The covalent bond between sulfur atoms that binds two peptide chains or different parts of one peptide chain and is a structural determinant in many protein molecules. 2 discrete & joined.
Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one DJ-gene and one C-gene.
Genomic DNA of immunoglobulin/T-cell receptor gene in partially rearranged genomic DNA including D-J-region with 5’ UTR and 3’ UTR, also designated as D-J-segment.
Genomic DNA in rearranged configuration including at least one D-J-GENE, one J-GENE and one C-GENE.
Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one DJ-gene, and one J-gene.
A promoter motif with consensus sequence CARCCCT.
A sequence element characteristic of some RNA polymerase II promoters, usually located between -60 and -45 relative to the TSS. Consensus sequence is MKSYGGCARCGSYSS. Tends to co-occur with DMv3 (SO:0001160). Tends to not occur with DPE motif (SO:0000015) or MTE (SO:0001162).
A sequence element characteristic of some RNA polymerase II promoters, usually located between -30 and +15 relative to the TSS. Consensus sequence is KNNCAKCNCTRNY. Tends to co-occur with DMv2 (SO:0001161). Tends to not occur with DPE motif (SO:0000015) or MTE (SO:0001162).
A sequence element characteristic of some RNA polymerase II promoters, located immediately upstream of some TATA box elements with respect to the TSS (+1). Consensus sequence is YGGTCACACTR. Marked spatial preference within core promoter; tend to occur near the TSS, although not as tightly as INR (SO:0000014).
A sequence element characteristic of some RNA polymerase II promoters, usually located between -50 and -10 relative to the TSS. Consensus sequence is KTYRGTATWTTT. Tends to co-occur with DMv4 (SO:0001157). Tends to not occur with DPE motif (SO:0000015) or MTE (SO:0001162).
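The consensus sequences quoted in these promoter motif definitions are written with IUPAC nucleotide ambiguity codes (for example K for G or T, and W for A or T). As a rough sketch of how such a consensus could be searched for in a sequence, the code below expands each code into a regular-expression character class. The helper names `consensus_to_regex` and `find_motif` are hypothetical, only the given strand is scanned, and the positional constraints relative to the TSS mentioned in the definitions are not checked.

```python
import re

# Standard IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regular-expression string."""
    parts = []
    for code in consensus.upper():
        bases = IUPAC[code]
        # Single bases are emitted literally, ambiguity codes as character classes.
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)

def find_motif(consensus, sequence):
    """Return the 0-based start offsets of all matches, including overlapping ones."""
    # A lookahead lets the scan report matches that overlap each other.
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [match.start() for match in pattern.finditer(sequence.upper())]
```

Calling `find_motif("KTYRGTATWTTT", some_sequence)` would list candidate positions on the forward strand; scanning the reverse complement as well is left out to keep the sketch short.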
An attribute describing a sequence consisting of nucleobases bound to a repeating unit made of a 2-deoxy-D-ribose ring connected to a phosphate backbone.
DNA molecules that have been selected from random pools based on their ability to bind other molecules.
A binding site that, in the molecule, interacts selectively and non-covalently with DNA.
Structural unit composed of a self-replicating DNA molecule.
A double-stranded DNA used to control macromolecular structure and function.
[DNA_invertase_target_sequence]
This has been obsoleted as it represents a process. replaced_by: GO:0006260. [DNA replication mode; DNA_replication_mode]
A folded DNA sequence.
A transposon where the mechanism of transposition is via a DNA intermediate.
DNA region representing open chromatin structure that is hypersensitive to digestion by DNase I.
A DNA sequence with catalytic activity. Added by request from Colin Batchelor.
A variant where the mutated gene product adversely affects the other (wild type) gene product. Requested by Deanna Church.
When a nucleotide polymer has two strands that are reverse-complement to one another and pair together. Attributes added to describe the different kinds of replicon. SO workshop, September 2006.
DNA synthesized from RNA by reverse transcriptase that has been copied by PCR to make it double stranded.
Structural unit composed of a self-replicating, double-stranded DNA molecule.
Structural unit composed of a self-replicating, double-stranded RNA molecule.
A sequence variant located 3’ of a gene. Different groups annotate up and downstream to different lengths. The subtypes are specific and are backed up with cross references.
A feature variant, where the alteration occurs downstream of the transcript termination site. Requested by Graham Ritchie, EBI/Sanger.
A sequence element characteristic of some RNA polymerase II promoters; positioned from +28 to +32 with respect to the TSS (+1). Experimental results suggest that the DPE acts in conjunction with the INR_motif to provide a binding site for TFIID in the absence of a TATA box to mediate transcription of TATA-less promoters. Consensus sequence is (A|G)G(A|T)(C|T)(G|A|C). Binds TAF6, TAF9.
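The DPE definition above combines a fixed position (+28 to +32, with the TSS at +1) and a five-base consensus. The following is a minimal sketch of how that check might look, assuming the caller supplies a DNA string and the 0-based index of the TSS within it; `has_dpe` is a hypothetical helper name.

```python
import re

# (A|G)G(A|T)(C|T)(G|A|C) written as regex character classes.
DPE_PATTERN = re.compile(r"[AG]G[AT][CT][ACG]")


def has_dpe(sequence, tss_index):
    """Check for a DPE consensus match at +28..+32 relative to the TSS.

    tss_index is the 0-based offset of the +1 base, so position +k
    sits at offset tss_index + k - 1.
    """
    start = tss_index + 27          # offset of position +28
    window = sequence.upper()[start:start + 5]
    # A window shorter than five bases means the sequence ends too early.
    return len(window) == 5 and DPE_PATTERN.fullmatch(window) is not None
```

For example, `has_dpe("A" * 27 + "AGACG", 0)` evaluates to `True`, because the bases at +28 to +32 are AGACG.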
A promoter motif with consensus sequence CGGACGT.
A promoter element with consensus sequence CGWGGWNGMM, bound by transcription factors related to RecA and found in promoters of genes expressed following several types of DNA damage or inhibition of DNA synthesis.
A sequence element characteristic of some RNA polymerase II promoters, usually located between -10 and -60 relative to the TSS. Consensus sequence is WATCGATW. This consensus sequence was identified computationally using the MEME algorithm within core promoter sequences from -60 to +40, with an E value of 1.7e-183. Tends to co-occur with Motif 7. Tends to not occur with DPE motif (SO:0000015) or motif 10.
A ds_DNA_viral_sequence is a viral_sequence that is the sequence of a virus that exists as double stranded DNA.
A double stranded oligonucleotide. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.
A ds_RNA_viral_sequence is a viral_sequence that is the sequence of a virus that exists as double stranded RNA.
The determinant of selective removal (DSR) motif consists of repeats of U(U/C)AAAC. The motif targets meiotic transcripts for removal during mitosis via the exosome. Requested by Antonia Locke (Pombe).
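The DSR repeat unit U(U/C)AAAC from the definition above can be located with a simple pattern. This is a minimal sketch, assuming an RNA string written with U rather than T; `find_dsr_hexamers` is a hypothetical name, and how many repeats make a functional DSR is not stated in the definition, so the sketch only reports positions.

```python
import re

# U(U/C)AAAC: the second base may be U or C.
DSR_HEXAMER = re.compile(r"U[UC]AAAC")


def find_dsr_hexamers(rna):
    """Return 0-based start offsets of each U(U/C)AAAC hexamer."""
    return [m.start() for m in DSR_HEXAMER.finditer(rna.upper())]


# Illustrative input only: a made-up RNA with three hexamers.
positions = find_dsr_hexamers("GGUUAAACCCUCAAACUUAAAC")
```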
DsrA RNA regulates both transcription, by overcoming transcriptional silencing by the nucleoid-associated H-NS protein, and translation, by promoting efficient translation of the stress sigma factor, RpoS. These two activities of DsrA can be separated by mutation: the first of three stem-loops of the 85 nucleotide RNA is necessary for RpoS translation but not for anti-H-NS action, while the second stem-loop is essential for antisilencing and less critical for RpoS translation. The third stem-loop, which behaves as a transcription terminator, can be substituted by the trp transcription terminator without loss of either DsrA function. The sequence of the first stem-loop of DsrA is complementary with the upstream leader portion of RpoS messenger RNA, suggesting that pairing of DsrA with the RpoS message might be important for translational regulation.
A pseudogene that arose via gene duplication. Generally duplicated pseudogenes have the same structure as the original gene, including intron-exon structure and some regulatory sequence.
One or more nucleotides are added between two adjacent nucleotides in the sequence; the inserted sequence derives from, or is identical in sequence to, nucleotides adjacent to insertion point.
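The duplication definition above depends on comparing the inserted bases with the reference bases flanking the insertion point. As a minimal sketch under that reading, assuming the insertion point is given as the 0-based offset of the base that follows the insertion, the check could be written as follows; `is_adjacent_duplication` is a hypothetical name, and the sketch only tests exact identity, not derivation.

```python
def is_adjacent_duplication(reference, insert_at, inserted):
    """Return True if the inserted bases equal the bases immediately
    before or immediately after the insertion point in the reference.

    insert_at is the 0-based offset of the reference base that follows
    the insertion, so the insertion lies between insert_at - 1 and insert_at.
    """
    n = len(inserted)
    if n == 0:
        return False
    # Guard against a negative slice start when the insertion is near the 5' end.
    matches_upstream = insert_at >= n and reference[insert_at - n:insert_at] == inserted
    matches_downstream = reference[insert_at:insert_at + n] == inserted
    return matches_upstream or matches_downstream
```

For example, inserting GT after the fourth base of ACGTAC repeats the GT immediately upstream, so `is_adjacent_duplication("ACGTAC", 4, "GT")` evaluates to `True`.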
An attribute of a duplication, which is an insertion which derives from, or is identical in sequence to, nucleotides present at a known location in the genome.
A read produced by the dye terminator method of sequencing.
A sequence element characteristic of some RNA polymerase II promoters, usually located between -60 and +1 relative to the TSS. Consensus sequence is AWCAGCTGWT. Tends to co-occur with DMv2 (SO:0001161). Tends to not occur with DPE motif (SO:0000015).
An origin of replication that initiates early in S phase.
[edit_operation; edit operation]
An attribute describing a sequence that is modified by editing.
[edited_by_A_to_I_substitution]
[transcript_edited_by_C-insertion_and_dinucleotide_insertion; edited_by_C_insertion_and_dinucleotide_insertion]
[edited_by_C_to_U_substitution]
[edited_by_G_addition]
A CDS that is edited.
An mRNA that is edited.
A transcript that is edited.
A transcript that has been edited by A to I substitution.
A locatable feature on a transcript that is edited.
Edited mRNA sequence mediated by a single guide RNA (SO:0000602).
Edited mRNA sequence mediated by two or more overlapping guide RNAs (SO:0000602).
A transcript processing variant whereby the process of editing is disrupted with respect to the reference.
[eight_cutter_restriction_site; eight-cutter_restriction_site; 8-cutter_restriction_site]
A sequence variant within the CDS that causes in-frame elongation of the resulting polypeptide sequence at the C terminus.
A sequence variant within the CDS that causes in-frame elongation of the resulting polypeptide sequence at the N terminus.
A sequence variant within the CDS that causes out-of-frame elongation of the resulting polypeptide sequence at the C terminus.
A sequence variant within the CDS that causes out-of-frame elongation of the resulting polypeptide sequence at the N terminus.
An elongation of a polypeptide sequence deriving from a sequence variant extending the CDS.
An elongation of a polypeptide sequence at the C terminus deriving from a sequence variant extending the CDS.
An elongation of a polypeptide sequence at the N terminus deriving from a sequence variant extending the CDS.
A gene that is alternatively spliced, but encodes only one polypeptide.
A gene that has multiple possible transcription start sites.
A gene that encodes more than one transcript.
A gene that is alternatively spliced, and encodes more than one polypeptide, that have overlapping peptide sequences, but use different stop codons.
A gene that is alternatively spliced, and encodes more than one polypeptide, that do not have overlapping peptide sequences.
A gene that is alternatively spliced, and encodes more than one polypeptide.
A gene that is alternatively spliced, and encodes more than one polypeptide, that have overlapping peptide sequences.
A gene that is alternatively spliced, and encodes more than one polypeptide, that have overlapping peptide sequences, but use different start codons.
A gene that is alternatively spliced, and encodes more than one polypeptide, that have overlapping peptide sequences, but use different start and stop codons.
[end_overlapping_gene]
A proviral gene with origin endogenous retrovirus.
Endogenous DNA sequences that are likely to have arisen from retroviruses.
Endogenous retrovirus (ERV) retrotransposons are abundant in the genomes of jawed vertebrates. Human ERVs (HERVs) are classified based on their homologies to animal retroviruses. Class I families are similar in sequence to mammalian Gammaretroviruses (type C) and Epsilonretroviruses (Type E). Class II families show homology to mammalian Betaretroviruses (Type B) and Deltaretroviruses (Type D). F-Class III families are similar to foamy viruses. Added as per GitHub Issue Request #488 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/488)
An intron that is spliced via endonucleolytic cleavage and ligation rather than transesterification.
A polypeptide region that targets a polypeptide to the endosome.
An attribute to describe a region that was modified in vitro.
[engineered_DNA]
An episome that is engineered. Requested by Lynn Crosby Jan 2006.
A gene that is engineered and foreign.
A region that is engineered and foreign.
A repetitive element that is engineered and foreign.
A transposable_element that is engineered and foreign.
A transposable_element that is engineered and foreign.
A fusion gene that is engineered.
A gene that is engineered.
A clone insert that is engineered.
A plasmid that is engineered.
A region that is engineered.
A rescue region that is engineered.
A tag that is engineered.
TE that has been modified by manipulations in vitro.
[enhanceosome]
A cis-acting sequence that increases the utilization of (some) eukaryotic promoters, and can function in either orientation and in any location (upstream or downstream) relative to the promoter. An enhancer may participate in an enhanceosome (GO:0034206), a protein-DNA complex formed by the association of a distinct set of general and specific transcription factors with a region of enhancer DNA. The cooperative assembly of an enhanceosome confers specificity of transcriptional regulation. This comment is a placeholder should we start to make cross products with GO.
[enhancer_attribute]
A binding site that, in the enhancer region of a nucleotide molecule, interacts selectively and non-covalently with polypeptide residues.
An enhancer bound by a factor.
An enhancer trap construct is a type of engineered plasmid which is designed to integrate into a genome and express a reporter when the expression from a basic minimal promoter is enhanced by genomic enhancer elements. Enhancer traps contain promoter elements and are not usually mutagenic.
A short ncRNA that is transcribed from an enhancer. May have a regulatory function.
An attribute describing the sequence of a transcript that has catalytic activity with or without an associated ribonucleoprotein. Do not use this for feature annotation. Use enzymatic_RNA (SO:0000372) instead.
An RNA sequence that has catalytic activity with or without an associated ribonucleoprotein. This was moved to be a child of transcript (SO:0000673) because some enzymatic RNA regions are part of primary transcripts and some are part of processed transcripts.
A gene that encodes an enzymatic RNA.
This attribute describes a gene where heritable changes other than those in the DNA sequence occur. These changes include: modification to the DNA (such as DNA methylation, the covalent modification of cytosine), and post-translational modification of histones.
A gene that is epigenetically modified.
A biological DNA region implicated in epigenomic changes caused by mechanisms other than changes in the underlying DNA sequence. This includes nucleosomal histone post-translational modifications, nucleosome depletion to render DNA accessible, and post-replicational base modifications such as cytosine modification. Moved from is_a biological_region (SO:0001411) to is_a regulatory_region (SO:0005836) on 11 Feb 2021. GREEKC members pointed out that this would be a more appropriate location. See GitHub Issue #530. 11 Feb 2021 updated definition along with addition of epigenomically_modified_region (SO:0002332). Epigenetically modified region is now not inherited while epigenomically modified region is not annotated as inherited. See GitHub Issue #532 and issue #534.
A plasmid that may integrate with a chromosome.
Epoxyqueuosine is a modified 7-deazaguanosine.
A C-terminal tetrapeptide motif that mediates retention of a protein in (or retrieval to) the endoplasmic reticulum. In mammals the sequence is KDEL, and in fungi HDEL or DDEL.
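The ER retention definition above names three C-terminal tetrapeptides (KDEL, HDEL, DDEL). The following is a minimal sketch of a check for them, assuming a protein sequence in one-letter amino acid codes; `er_retention_motif` is a hypothetical name, and the sketch does not distinguish mammalian from fungal proteins.

```python
# Tetrapeptides listed in the definition: KDEL (mammals), HDEL or DDEL (fungi).
ER_RETENTION_SIGNALS = ("KDEL", "HDEL", "DDEL")


def er_retention_motif(protein):
    """Return the C-terminal retention tetrapeptide, or None if absent."""
    tail = protein.upper()[-4:]
    # The motif must sit at the very C terminus, so only the last four
    # residues are compared.
    return tail if tail in ER_RETENTION_SIGNALS else None
```

For example, `er_retention_motif("MKTAYIAKDEL")` returns `"KDEL"`, while a sequence with KDEL in the middle returns `None`.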
A tag produced from a single sequencing read from a cDNA clone or PCR product; typically a few hundred base pairs long. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.
A match against an EST sequence.