Annotation Checklists

Checklists for some commonly submitted types of targeted sequence have been defined. These are more convenient than manually preparing a flat file and should be used where applicable.

Please note that none of the information here is relevant to submission of annotated genome assemblies. For information on this, please see our page How To Submit Genome Assemblies.

There are several categories of checklist:

Frequently Used Checklists

Name Checklist ID Definition
rRNA Gene ERT000002
For ribosomal RNA genes from prokaryotic, nuclear or
organellar DNA. All rRNAs are considered partial.
Single CDS
genomic DNA
For complete or partial coding sequence (CDS) derived
from genomic DNA. This checklist will not accept
segmented genes (i.e., with intron regions), so it
should be used for prokaryotic or organellar genes, or
for submitting a single exon.
Single CDS mRNA ERT000006
For complete or partial single coding sequence (CDS)
derived from mRNA (via cDNA).

MHC gene 1 exon ERT000030
For partial MHC class I or II antigens containing
one exon ONLY.
MHC gene 2 exons ERT000036
For partial MHC class I or II antigens containing two
exons ONLY. An intron feature should only be used when
the intron region has actually been sequenced. If the
intron has not been sequenced, or only partially
sequenced, please fill the non-sequenced gap with 100 Ns.
ncRNA ERT000042
For non-coding RNA (ncRNA) transcripts or single-exon
genes of prokaryotic or eukaryotic origin, with the
exception of ribosomal RNA (rRNA) and transfer RNA (tRNA).
Satellite DNA ERT000039
For submission of satellites, microsatellites and
minisatellites: a complete or partial single polymorphic
locus, present in nuclear or organellar DNA, that
consists of short sequences repeated in tandem arrays.
Mobile Element ERT000056
For the submission of a single complete or partial
mobile element. This checklist captures the mobile
element feature but does not allow for granular
annotation of component parts, such as coding regions,
repeat regions and miscellaneous features within the
mobile element itself. If precise annotation or
translation is required, please use an alternative
submission route.
Gene Promoter ERT000054
For submission of uni- or bi-directional gene promoter
regions. Please note that CDS is not annotated; if you
wish to include the start of the coding region(s),
please leave a comment with the coordinates of the
start site(s).
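
The MHC gene 2 exons entry above asks for a non-sequenced intron to be filled with 100 Ns. A minimal sketch of how such a sequence might be assembled before submission is shown here. The function name, constant and example exon strings are hypothetical and are not part of any submission tool. The sketch covers only the case where the whole intron is unsequenced.

```python
# Hypothetical helper: join two exon sequences with a run of Ns standing in
# for an intron that was not sequenced. The gap length of 100 comes from the
# MHC gene 2 exons checklist text; everything else is an illustrative assumption.
GAP_LENGTH = 100


def join_exons_with_gap(exon1: str, exon2: str, gap_length: int = GAP_LENGTH) -> str:
    """Return exon1, a gap of Ns, then exon2, as one uppercase sequence."""
    return exon1.upper() + "N" * gap_length + exon2.upper()


# Placeholder sequences, not real MHC data.
exon_a = "atggcc"
exon_b = "ggttaa"
sequence = join_exons_with_gap(exon_a, exon_b)

# Coordinates (1-based, inclusive) of the gap within the joined sequence.
gap_start = len(exon_a) + 1
gap_end = len(exon_a) + GAP_LENGTH
print(f"length={len(sequence)} gap={gap_start}..{gap_end}")
```

If the intron has been partially sequenced, the same idea applies, with the Ns placed between the sequenced portions rather than directly between the two exons.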

Marker Sequence Checklists

Name Checklist ID Definition
COI gene ERT000020
For mitochondrial cytochrome oxidase subunit 1 genes.
ITS rDNA ERT000009
For ITS rDNA region. This checklist allows generic
annotation of the ITS components (18S rRNA, ITS1,
5.8S rRNA, ITS2 and 28S rRNA). For annotation of the
rRNA component only, please use the rRNA gene checklist.
trnK-matK locus ERT000032
For complete or partial matK gene within the
chloroplast trnK gene intron.
For the submission of the following markers: actin
(act), tubulin (tuba or tubb), calmodulin (CaM), RNA
polymerase II large subunits (RPB1 and RPB2),
translation elongation factor 1-alpha (tef1a),
glyceraldehyde 3-phosphate dehydrogenase (GAPDH) and
histone 3 (H3) where the intron/exon boundaries are
not known.
For the submission of multi-locus markers (e.g., tRNA
+ CDS + rRNA) from in vivo genomic DNA. This checklist
provides a simple submission process for organellar or
nuclear regions containing multiple genes, for example
a region containing coding genes, rRNA genes and tRNA
genes. Please note that individual feature annotation
is not possible with this checklist.
D-Loop ERT000034
For mitochondrial D-loop (control region) sequences.
All D-loops are considered partial.
Spacer, IGS
For intergenic spacer (IGS) sequences between
neighbouring genes (e.g. psbA-trnH IGS, 16S-23S rRNA
IGS). Inclusion of the flanking genes is allowed.
Gene intron ERT000037
For complete or partial single gene intron.
Spacer (ETS)
For submission of External Transcribed Spacer (ETS)
regions of the eukaryotic rDNA transcript; a region
often used to study intrageneric relationships.
Spacer Region
For submission of the 16S-23S rRNA intergenic spacer
region: the transcribed spacer between the 16S rRNA
and 23S rRNA genes of rRNA operons, found in
prokaryotes and organelles.

Virus-Specific Checklists

Name Checklist ID Definition
Single Viral
For complete or partial single coding sequence (CDS)
from a viral gene. Please do not use for peptides
processed from polyproteins or proviral sequences,
as these are all annotated differently.
For complete or partial viral polyprotein genes where
the mature peptide boundaries remain undefined. This
template is not suitable for proviral sequences. If
the sequences contain ribosomal frameshifts, please
contact us.
ssRNA(-) Viral
copy RNA
For complete or partial viral copy RNA (cRNA)
sequences, complementary to ssRNA(-) virus genomes.
Only one CDS can be added; further CDS information
should be provided in the curator comments section.
Region (UTR)
For complete or partial untranslated region (UTR) or
nontranslated region (NTR) found at the termini of
viral genomes. Please do not use this checklist for
submitting virus genomes or viral coding genes.
For submission of circular single stranded DNA
alphasatellite sequences associated with Begomovirus,
Babuvirus and Nanovirus.
For submission of circular single stranded DNA
betasatellite sequences of the Begomovirus genus.

Plant Viroid ERT000031
For complete circular ssRNA plant viroid sequences.
Please do not use for other circular viruses.

Large-Scale Data Checklists

Name Checklist ID Definition
Tag (EST)
For submission of Sanger-sequenced Expressed Sequence
Tags (ESTs). ESTs are short transcripts, ~500-800 bp
long, and are usually of low quality because they
result from single-pass reads only. No feature
annotation is recorded on ESTs.
Site (STS)
For submission of Sequence Tagged Sites (STS). An STS
is a relatively short sequence (200 to 500 bp) that
can be specifically amplified by PCR and detected in
the presence of all other genomic sequences, and whose
location in the genome is mapped.
Genome Survey
Sequence (GSS)
For submission of Genome Survey Sequences (GSS). These
are short DNA sequences which include: random
single-pass genome survey sequences, single-pass reads
from cosmid/BAC/YAC ends (may be chromosome specific),
exon-trapped genomic sequences, Alu PCR sequences and
transposon-tagged sequences.