Annotation Checklists

Checklists for some commonly submitted types of targeted sequence have been defined. These are more convenient than manually preparing a flat file and should be used where applicable.

Please note that none of the information here is relevant to submission of annotated genome assemblies. For information on this, please see our page How To Submit Genome Assemblies.

There are several categories of checklist:

Frequently Used Checklists

Name

Checklist ID

Definition

rRNA Gene

ERT000002

For ribosomal RNA genes from prokaryotic, nuclear or
organellar DNA. All rRNAs are considered partial.

Single CDS genomic DNA

ERT000029

For complete or partial coding sequence (CDS) derived
from genomic DNA. This checklist will not accept
segmented genes (i.e., genes with intron regions), so it
should be used for prokaryotic or organellar genes, or
for submitting a single exon.

Single CDS mRNA

ERT000006

For complete or partial single coding sequence (CDS)
derived from mRNA (via cDNA).

MHC gene 1 exon

ERT000030

For partial MHC class I or II antigens containing
one exon ONLY.

MHC gene 2 exons

ERT000036

For partial MHC class I or II antigens containing two
exons ONLY. An intron feature should only be used when
the intron region has actually been sequenced. If the
intron has not been sequenced, or has been only partially
sequenced, please fill the non-sequenced gap with 100 Ns.
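As a rough illustration of the 100-N rule for two-exon MHC submissions, the Python sketch below assembles the submitted sequence from its sequenced parts. The function `join_mhc_exons` and its parameters are hypothetical and are not part of any ENA tool. The sketch assumes that the exons and any partially sequenced intron ends are supplied as plain strings in 5' to 3' orientation.

```python
# Hypothetical helper (not an ENA tool): builds a two-exon MHC sequence in
# which the non-sequenced intron region is replaced by a run of 100 Ns.
GAP_LENGTH = 100  # gap length stated in the MHC gene 2 exons checklist


def join_mhc_exons(exon1: str, exon2: str,
                   intron_5p: str = "", intron_3p: str = "") -> tuple[str, int, int]:
    """Join two exons with a 100-N gap standing in for the unsequenced intron.

    intron_5p / intron_3p are any sequenced intron ends next to exon1 and
    exon2 (leave them empty if the intron was not sequenced at all).
    Returns the upper-cased sequence and the 1-based, inclusive start and
    end coordinates of the N run.
    """
    left = exon1 + intron_5p             # everything before the gap
    gap_start = len(left) + 1            # first N, 1-based
    gap_end = len(left) + GAP_LENGTH     # last N, 1-based inclusive
    joined = left + "N" * GAP_LENGTH + intron_3p + exon2
    return joined.upper(), gap_start, gap_end


# Example: both intron ends were partially sequenced.
seq, start, end = join_mhc_exons("ATGGCC", "TTGTGA", intron_5p="GTAAG", intron_3p="CAG")
print(len(seq), start, end)  # prints "120 12 111"
```

Returning the gap coordinates as well as the sequence is a design choice of this sketch. It keeps the location of the N run at hand in case you need to describe it elsewhere in the submission.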

ncRNA

ERT000042

For non-coding RNA (ncRNA) transcripts or single-exon
genes of prokaryotic or eukaryotic origin, with the
exception of ribosomal RNA (rRNA) and transfer RNA
(tRNA).

Satellite DNA

ERT000039

For submission of satellites, microsatellites and
minisatellites: complete or partial single polymorphic
loci in nuclear or organellar DNA that consist of short
sequences repeated in tandem arrays.

Mobile Element

ERT000056

For the submission of a single complete or partial
mobile element. This checklist captures the mobile
element feature but does not allow for granular
annotation of component parts, such as coding regions,
repeat regions and miscellaneous features within the
mobile element itself. If precise annotation or
translation is required, please use an alternative
submission route.

Gene Promoter

ERT000054

For submission of uni- or bi-directional gene promoter
regions. Please note that the CDS is not annotated; if
you wish to indicate the start of the coding region(s),
please leave a comment with the coordinates of the
start site(s).

Marker Sequence Checklists

Name

Checklist ID

Definition

COI gene

ERT000020

For mitochondrial cytochrome c oxidase subunit 1 genes.

ITS rDNA

ERT000009

For the ITS rDNA region. This checklist allows generic
annotation of the ITS components (18S rRNA, ITS1,
5.8S rRNA, ITS2 and 28S rRNA). For annotation of the
rRNA component only, please use the rRNA gene checklist.

trnK-matK locus

ERT000032

For complete or partial matK gene within the
chloroplast trnK gene intron.

Phylogenetic Marker

ERT000038

For the submission of the following markers: actin
(act), tubulin (tuba or tubb), calmodulin (CaM), RNA
polymerase II large subunits (RPB1 and RPB2),
translation elongation factor 1-alpha (tef1a),
glyceraldehyde 3-phosphate dehydrogenase (GAPDH) and
histone 3 (H3) where the intron/exon boundaries are
not known.

Multi-Locus Marker

ERT000058

For the submission of multi-locus markers (e.g., tRNA
+ CDS + rRNA) from in vivo genomic DNA. This checklist
provides a simple submission process for organellar or
nuclear regions containing multiple genes, for example
a region containing coding genes, rRNA genes and tRNA
genes. Please note that individual feature annotation
is not possible with this checklist.

D-Loop

ERT000034

For mitochondrial D-loop (control region) sequences.
All D-loops are considered partial.

Intergenic Spacer (IGS)

ERT000035

For intergenic spacer (IGS) sequences between
neighbouring genes (e.g. psbA-trnH IGS, 16S-23S rRNA
IGS). Inclusion of the flanking genes is allowed.

Gene intron

ERT000037

For complete or partial single gene intron.

External Transcribed Spacer (ETS)

ERT000053

For submission of External Transcribed Spacer (ETS)
regions of the eukaryotic rDNA transcript; a region
often used to study intrageneric relationships.

16S-23S Intergenic Spacer Region

ERT000050

For submission of the 16S-23S rRNA intergenic spacer
region: the transcribed spacer between the 16S rRNA
and 23S rRNA genes of rRNA operons, found in
prokaryotes and organelles.

Virus-Specific Checklists

Name

Checklist ID

Definition

Single Viral
CDS

ERT000028

For complete or partial single coding sequence (CDS)
from a viral gene. Please do not use for peptides
processed from polyproteins or for proviral sequences,
as these are annotated differently.

Viral Polyprotein

ERT000051

For complete or partial viral polyprotein genes where
the mature peptide boundaries remain undefined. This
template is not suitable for proviral sequences. If
the sequences contain ribosomal frameshifts, please
contact us.

ssRNA(-) Viral copy RNA

ERT000052

For complete or partial viral copy RNA (cRNA)
sequences, complementary to ssRNA(-) virus genomes.
Only one CDS can be added; further CDS information
should be provided in the curator comments section.

Viral Untranslated Region (UTR)

ERT000060

For complete or partial untranslated region (UTR) or
nontranslated region (NTR) found at the termini of
viral genomes. Please do not use this checklist for
submitting virus genomes or viral coding genes.

Alphasatellite sub-viral particle

ERT000057

For submission of circular single-stranded DNA
alphasatellite sequences associated with Begomovirus,
Babuvirus and Nanovirus.

Betasatellite sub-viral particle

ERT000047

For submission of circular single-stranded DNA
betasatellite sequences associated with the Begomovirus genus.

Plant Viroid

ERT000031

For complete circular ssRNA plant viroid sequences.
Please do not use for other circular viruses.

Large-Scale Data Checklists

Name

Checklist ID

Definition

Expressed
Sequence
Tag (EST)

ERT000003

For submission of Sanger-sequenced Expressed Sequence
Tags (ESTs). ESTs are short transcript sequences,
typically ~500-800 bp long, and are usually of low
quality because they result from single-pass reads. No
feature annotation is recorded on ESTs.

Sequence Tagged Site (STS)

ERT000055

For submission of Sequence Tagged Sites (STS). An STS
is a relatively short sequence (200 to 500 bp) that can
be specifically amplified by PCR and detected in the
presence of all other genomic sequences, and whose
location in the genome has been mapped.

Genome Survey Sequence (GSS)

ERT000024

For submission of Genome Survey Sequences (GSS). These
are short DNA sequences which include: random single-pass
genome survey sequences, single-pass reads from
cosmid/BAC/YAC ends (which may be chromosome-specific),
exon-trapped genomic sequences, Alu PCR sequences and
transposon-tagged sequences.