Sequence Data Classes

The ENA holds several data classes of nucleotide sequence within the Sequence domain. Together they span the three tiers of ENA: reads, assembly and annotation. The Sequence domain is separate from the standard Reads and Assembly domains listed in the General Guide. Sequence records often represent specific regions of genetic interest rather than the whole genomic material of an organism.

Sequence records can be specific coding/non-coding regions derived from an annotated submission, submissions of individual targeted sequences of interest, or high-level assembly sequences such as scaffolds or chromosomes.

All Sequence records are available in both EMBL (TEXT) and FASTA format.
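As an illustration of retrieving a record in either format, the sketch below builds a download URL for a given accession. It assumes the ENA Browser API exposes records at `https://www.ebi.ac.uk/ena/browser/api/<format>/<accession>`; this base URL and the helper names `record_url` and `fetch_record` are not part of this guide and should be checked against the current API documentation before use.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: base URL of the ENA Browser API (not stated in this guide).
ENA_BROWSER_API = "https://www.ebi.ac.uk/ena/browser/api"

# The two formats this guide says Sequence records are available in.
SUPPORTED_FORMATS = ("embl", "fasta")


def record_url(accession: str, fmt: str = "fasta") -> str:
    """Build a retrieval URL for one Sequence record (hypothetical helper)."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    # Strip stray whitespace and percent-encode the accession for safety.
    return f"{ENA_BROWSER_API}/{fmt}/{quote(accession.strip(), safe='')}"


def fetch_record(accession: str, fmt: str = "fasta", timeout: float = 30.0) -> str:
    """Download the record as text. Requires network access."""
    with urlopen(record_url(accession, fmt), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `record_url("AL022645", "embl")` gives the URL for the EMBL flat-file view of that record, and `fetch_record("AL022645")` would download the FASTA text when a network connection is available.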

Data Class  Definition                                                 Example
----------  ---------------------------------------------------------  --------
EST         A record representing raw expressed sequence tag sequence  AL022645
            data (no qualities) and sample/library information

GSS         A record representing genome survey sequence;              AG007113
            single-pass, single-direction sequence

STS         A record representing a sequence tagged site               AL022542

CON         A record representing high-level (scaffold upwards)        DS830848
            assembly information, constructed sequence and optional
            annotation

TSA         A Transcriptome Shotgun Assembly record                    EZ000160

HTC         A record representing high-throughput assembled            AL122108
            transcriptomic sequence and optional annotation

HTG         A record representing high-throughput assembled genomic    AC011759
            sequence and optional annotation

STD         A record representing standard targeted annotated          AJ005668
            assembled sequence or derived annotation

PAT         A record representing a sequence associated with a patent  A77200
            process

WGS         A record representing an annotated region (coding or       KXS48886
            non-coding) of a Whole Genome Shotgun contig-level
            assembly
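The data class codes listed above can also be read programmatically from a record in EMBL format. The sketch below assumes the standard EMBL flat-file layout, in which the `ID` line carries seven semicolon-separated fields and the data class is the fifth. The names `DATA_CLASSES` and `data_class_from_id_line` are hypothetical, the descriptions are abbreviated from the table, and the sample `ID` line is illustrative rather than taken from one of the example records.

```python
# Data class codes from the table above, with shortened descriptions.
DATA_CLASSES = {
    "EST": "Expressed sequence tag",
    "GSS": "Genome survey sequence",
    "STS": "Sequence tagged site",
    "CON": "High-level assembly information (scaffold upwards)",
    "TSA": "Transcriptome Shotgun Assembly",
    "HTC": "High-throughput assembled transcriptomic sequence",
    "HTG": "High-throughput assembled genomic sequence",
    "STD": "Standard targeted annotated assembled sequence",
    "PAT": "Sequence associated with a patent process",
    "WGS": "Annotated region of a Whole Genome Shotgun assembly",
}


def data_class_from_id_line(line: str) -> str:
    """Return the data class code from an EMBL flat-file ID line.

    Assumed layout (seven fields after the 'ID' line code):
    accession; version; topology; molecule type; data class;
    taxonomic division; length.
    """
    if not line.startswith("ID"):
        raise ValueError("not an ID line")
    fields = [field.strip() for field in line[2:].split(";")]
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, found {len(fields)}")
    return fields[4]


# Illustrative ID line in the assumed layout.
sample = "ID   X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP."
code = data_class_from_id_line(sample)  # → "STD"
description = DATA_CLASSES.get(code, "unknown data class")
```

Looking the code up with `DATA_CLASSES.get` rather than indexing directly keeps the sketch tolerant of data classes that are not listed in this table.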