Sequence Data Classes

ENA holds several data classes of nucleotide sequence within the Sequence domain, spanning the three tiers of ENA: reads, assembly and annotation. The Sequence domain is separate from the standard Reads and Assembly domains listed in the General Guide. Sequence records often represent specific regions of genetic interest rather than the whole genomic material of an organism.

Sequence records can be specific coding/non-coding regions derived from an annotated submission, submissions of individual targeted sequences of interest, or high-level assembly sequences such as scaffolds or chromosomes.

All sequence records are available in EMBL flat file (text) or FASTA format.
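As a minimal sketch of how a record might be retrieved in either format, the Python below builds a download URL for an accession. The base URL and the `/embl/` and `/fasta/` path pattern are assumptions based on the ENA Browser API and should be checked against the current ENA documentation. The names `record_url` and `fetch_record` are hypothetical helpers introduced only for this illustration.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the ENA Browser API; verify against current ENA docs.
ENA_BROWSER_API = "https://www.ebi.ac.uk/ena/browser/api"

# The two formats this page says sequence records are available in.
SUPPORTED_FORMATS = ("embl", "fasta")


def record_url(accession: str, fmt: str = "embl") -> str:
    """Build the (assumed) retrieval URL for one sequence record."""
    fmt = fmt.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    # Percent-encode the accession so it is safe to place in a URL path.
    return f"{ENA_BROWSER_API}/{fmt}/{quote(accession.strip(), safe='')}"


def fetch_record(accession: str, fmt: str = "embl") -> str:
    """Download one record as text. Requires network access."""
    with urlopen(record_url(accession, fmt), timeout=30) as response:
        return response.read().decode("utf-8")
```

For example, `record_url("AJ005668", "fasta")` produces a FASTA URL for the STD example record listed in the table below, and `fetch_record("AJ005668")` would attempt to download the same record as an EMBL flat file.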

Data Class	Definition	Example
EST	A record representing raw expressed sequence tag sequence data (no qualities) and sample/library information	AL022645
GSS	A record representing genome survey sequence; single pass, single direction sequence	AG007113
STS	A record representing a sequence tagged site	AL022542
CON	A record representing high-level (scaffold upwards) assembly information, constructed sequence and optional annotation	DS830848
TSA	A Transcriptome Shotgun Assembly record	EZ000160
HTC	A record representing high-throughput assembled transcriptomic sequence and optional annotation	AL122108
HTG	A record representing high-throughput assembled genomic sequence and optional annotation	AC011759
STD	A record representing standard targeted annotated assembled sequence or derived annotation	AJ005668
PAT	A record representing a sequence associated with a patent process	A77200
WGS	A record representing an annotated region (coding or non-coding) of a Whole Genome Shotgun contig-level assembly	KXS48886
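In an EMBL flat file, the data class appears as a three-letter code on the record's ID line. The sketch below reads it, assuming the seven-field, semicolon-separated ID line layout (accession; sequence version; topology; molecule type; data class; taxonomic division; length). The sample ID line in the usage note is illustrative, and `parse_id_line` and `IdLine` are hypothetical names, not part of any ENA tool.

```python
from typing import NamedTuple

# Data class codes listed in the table above.
DATA_CLASSES = frozenset(
    {"EST", "GSS", "STS", "CON", "TSA", "HTC", "HTG", "STD", "PAT", "WGS"}
)


class IdLine(NamedTuple):
    accession: str
    data_class: str
    listed: bool  # True if the code appears in the table above


def parse_id_line(line: str) -> IdLine:
    """Extract the accession and data class from an EMBL ID line.

    Assumes seven semicolon-separated fields after the 'ID' line code.
    """
    if not line.startswith("ID"):
        raise ValueError("not an ID line")
    fields = [field.strip() for field in line[2:].split(";")]
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, found {len(fields)}")
    accession = fields[0]
    data_class = fields[4].upper()  # fifth field holds the data class
    return IdLine(accession, data_class, data_class in DATA_CLASSES)
```

Calling `parse_id_line("ID   X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP.")` returns a tuple whose `data_class` is `"STD"`. The `listed` flag is kept separate from parsing so that a record carrying a code not shown in the table is reported rather than rejected.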