Sequence Data Classes
The ENA holds several data classes of nucleotide sequence within the Sequence domain. These classes fall under the three tiers of ENA: reads, assembly and annotation. The Sequence domain is separate from the standard Reads and Assembly domains listed in the General Guide. Sequence records often represent specific regions of genetic interest rather than the whole genomic material of an organism.
Sequence records can be specific coding/non-coding regions derived from an annotated submission, submissions of individual targeted sequences of interest, or high-level assembly sequences such as scaffolds or chromosomes.
All Sequence records are available in EMBL (TEXT) or FASTA format.
| Data Class | Definition | Example |
|---|---|---|
| EST | A record representing raw expressed sequence tag sequence data (no qualities) and sample/library information | |
| GSS | A record representing genome survey sequence; single pass, single direction sequence | |
| STS | A record representing a sequence tagged site | |
| CON | A record representing high level (scaffold upwards) assembly information, constructed sequence and optional annotation. | |
| TSA | A Transcriptome Shotgun Assembly record | |
| HTC | A record representing high throughput assembled transcriptomic sequence and optional annotation | |
| HTG | A record representing high throughput assembled genomic sequence and optional annotation | |
| STD | A record representing standard targeted annotated assembled sequence or derived annotation | |
| PAT | A record representing a sequence associated with a patent process | |
| WGS | A record representing an annotated region (coding or non-coding) of a Whole Genome Shotgun contig level assembly | |
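
As a rough illustration of how these codes appear in practice, the sketch below reads the data class from the `ID` line of an EMBL flat-file record and looks it up against the table above. It assumes the semicolon-separated `ID` line layout (accession; sequence version; topology; molecule type; data class; taxonomic division; length). The function names `data_class_from_id_line` and `describe_data_class`, and the shortened descriptions in `DATA_CLASSES`, are hypothetical and exist only for this example.

```python
# Short descriptions paraphrased from the data class table in this section.
DATA_CLASSES = {
    "EST": "expressed sequence tag",
    "GSS": "genome survey sequence",
    "STS": "sequence tagged site",
    "CON": "high level (scaffold upwards) assembly information",
    "TSA": "Transcriptome Shotgun Assembly",
    "HTC": "high throughput assembled transcriptomic sequence",
    "HTG": "high throughput assembled genomic sequence",
    "STD": "standard targeted annotated assembled sequence",
    "PAT": "sequence associated with a patent process",
    "WGS": "annotated region of a Whole Genome Shotgun contig level assembly",
}


def data_class_from_id_line(id_line: str) -> str:
    """Return the data class code from an EMBL flat-file ID line.

    Assumed layout (seven semicolon-separated fields):
    ID   <accession>; SV <n>; <topology>; <mol type>; <class>; <division>; <n> BP.
    """
    if not id_line.startswith("ID"):
        raise ValueError("expected a line starting with 'ID'")
    # Drop the 'ID' line code, then split the remainder into its fields.
    fields = [field.strip() for field in id_line[2:].split(";")]
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, found {len(fields)}")
    return fields[4]


def describe_data_class(id_line: str) -> str:
    """Return 'CODE: description', or flag codes not listed in the table."""
    code = data_class_from_id_line(id_line)
    return f"{code}: {DATA_CLASSES.get(code, 'not listed in the table above')}"


if __name__ == "__main__":
    # Illustrative ID line; substitute the first line of any EMBL-format record.
    example = "ID   X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP."
    print(describe_data_class(example))
```

Only the `ID` line is inspected, so the rest of the record does not need to be parsed. A FASTA download has no `ID` line and does not carry the data class in this form, so this approach applies to the EMBL (TEXT) format only.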