Sequence Data Classes

ENA holds several data classes of nucleotide sequence within the Sequence domain. These classes span the three tiers of ENA (reads, assembly and annotation), but the Sequence domain is separate from the standard Reads and Assembly domains listed in the General Guide. Sequence records often represent specific regions of genetic interest rather than the whole genomic material of an organism.

Sequence records can be specific coding/non-coding regions derived from an annotated submission, submissions of individual targeted sequences of interest, or high-level assembly sequences such as scaffolds or chromosomes.

Sequence records are all available in EMBL (TEXT) or FASTA format.
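
As a minimal sketch of how a record could be retrieved in either format, the snippet below builds a download URL for an accession. It assumes the ENA Browser API exposes `embl` and `fasta` endpoints under `https://www.ebi.ac.uk/ena/browser/api/`; the helper names `record_url` and `fetch_record` are hypothetical, and the endpoint layout should be checked against the current API documentation before relying on it.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the ENA Browser API; verify against the current API docs.
ENA_BROWSER_API = "https://www.ebi.ac.uk/ena/browser/api"

# The two formats named in the text above.
SUPPORTED_FORMATS = ("embl", "fasta")


def record_url(accession: str, fmt: str = "fasta") -> str:
    """Build the (assumed) download URL for one sequence record."""
    fmt = fmt.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"format must be one of {SUPPORTED_FORMATS}, got {fmt!r}")
    return f"{ENA_BROWSER_API}/{fmt}/{quote(accession.strip())}"


def fetch_record(accession: str, fmt: str = "fasta", timeout: float = 30.0) -> str:
    """Download one record as text. Requires network access."""
    with urlopen(record_url(accession, fmt), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `fetch_record("AJ005668", "embl")` would request the EMBL flat-file form of the STD example record, assuming the endpoint behaves as described.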

==========  ===========================================================  ========
Data Class  Definition                                                   Example
==========  ===========================================================  ========
EST         A record representing raw expressed sequence tag sequence    AL022645
            data (no qualities) and sample/library information
GSS         A record representing genome survey sequence; single pass,   AG007113
            single direction sequence
STS         A record representing a sequence tagged site                 AL022542
CON         A record representing high level (scaffold upwards)          DS830848
            assembly information, constructed sequence and optional
            annotation
TSA         A Transcriptome Shotgun Assembly record                      EZ000160
HTC         A record representing high throughput assembled              AL122108
            transcriptomic sequence and optional annotation
HTG         A record representing high throughput assembled genomic      AC011759
            sequence and optional annotation
STD         A record representing standard targeted annotated assembled  AJ005668
            sequence or derived annotation
PAT         A record representing a sequence associated with a patent    A77200
            process
WGS         A record representing an annotated region (coding or         KXS48886
            non-coding) of a Whole Genome Shotgun contig level assembly
==========  ===========================================================  ========
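
To illustrate where the data class appears in practice, the sketch below reads it from the `ID` line of an EMBL-format record. It assumes the semicolon-separated `ID` line layout (accession; sequence version; topology; molecule type; data class; taxonomic division; length). The sample line and the names `DATA_CLASSES`, `data_class_of` and `describe` are illustrative and not part of any ENA tooling.

```python
# Short labels for the data classes listed in this section.
DATA_CLASSES = {
    "EST": "Expressed sequence tag",
    "GSS": "Genome survey sequence",
    "STS": "Sequence tagged site",
    "CON": "High-level assembly information (scaffold upwards)",
    "TSA": "Transcriptome Shotgun Assembly",
    "HTC": "High throughput assembled transcriptomic sequence",
    "HTG": "High throughput assembled genomic sequence",
    "STD": "Standard targeted annotated assembled sequence",
    "PAT": "Patent sequence",
    "WGS": "Annotated region of a Whole Genome Shotgun contig",
}


def data_class_of(id_line: str) -> str:
    """Return the data class code from an EMBL flat-file ID line.

    Assumes the seven-field layout described in the lead-in.
    """
    if not id_line.startswith("ID"):
        raise ValueError("not an ID line")
    # Drop the 'ID' tag and the trailing full stop, then split the fields.
    body = id_line[2:].strip().rstrip(".")
    fields = [field.strip() for field in body.split(";")]
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, found {len(fields)}")
    return fields[4]


def describe(code: str) -> str:
    """Look up a label, falling back for codes not listed in this section."""
    return DATA_CLASSES.get(code, "Unknown data class")


# Illustrative ID line, not taken from this section.
sample = "ID   X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP."
code = data_class_of(sample)
print(code, "-", describe(code))
```

The lookup falls back to a generic label rather than raising an error, since records may carry data class codes that this section does not list.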