ENA FTP Structure
The root of the ENA FTP server is at:
This server contains short pieces of assembled/annotated sequence dating back
to the early 80s, as well as larger scale data types including genome
assemblies, MAG/SAG assemblies, and transcriptome sequence assemblies.
This server does not contain other data types such as raw reads, which can be
found in the SRA FTP server.
It is expected that users will find relevant data through the various search
interfaces available from ENA, rather than directly interacting with and
navigating the FTP server.
This page therefore does not provide an exhaustive explanation of the server’s
structure, but serves to briefly describe what you will find in some of its
most important subdirectories.
For some, but not all, categories, suppressed data is kept separate from public data.
Most data is available in EMBL flat file format,
with some also available as FASTA.
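As a rough illustration of how the two formats differ, the sketch below counts the entries in a file of either type. It assumes only the standard conventions of each format: an EMBL flat file entry begins with an "ID" line, and a FASTA record begins with a ">" header line. The function names count_entries and count_entries_in_gzip are hypothetical helpers written for this example and are not part of any ENA tool.

```python
import gzip


def count_entries(lines, fmt):
    """Count sequence entries in an iterable of text lines.

    fmt is "embl" (each entry starts with an "ID" line) or
    "fasta" (each record starts with a ">" header line).
    """
    if fmt == "embl":
        prefix = "ID "
    elif fmt == "fasta":
        prefix = ">"
    else:
        raise ValueError(f"unknown format: {fmt!r}")
    # One matching line marks the start of one entry.
    return sum(1 for line in lines if line.startswith(prefix))


def count_entries_in_gzip(path, fmt):
    """Open a gzipped file in text mode and count its entries."""
    with gzip.open(path, "rt", encoding="ascii", errors="replace") as handle:
        return count_entries(handle, fmt)
```

Because count_entries accepts any iterable of lines, it can be tried on a small list of strings before being pointed at a downloaded file through count_entries_in_gzip.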
Information on the INSDC feature table format, with versions dating back to 1995.
An XML file containing cumulative information on all genome collection accessions.
Plaintext reports on the content of genome assemblies, organised by genome collection accession (GCA).
Whole genome shotgun (WGS) contig-level assemblies. For each sequencing project, this contains the master record in EMBL flat file format,
a gzipped file with all the sequences in flat file format (e.g. AAAA02.dat.gz), and the same sequences in FASTA format (e.g. AAAA02.fasta.gz).
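The naming pattern in the examples above can be sketched as a small helper. It derives only the two file names from a set identifier such as AAAA02. It does not attempt to build the directory path, which this page does not describe. The function name wgs_set_file_names is hypothetical, and the alphanumeric check is an assumption made for this example rather than an ENA rule.

```python
def wgs_set_file_names(set_id):
    """Return the expected file names for a WGS set such as 'AAAA02'.

    Only the names are derived; where the files sit on the server
    is not covered here.
    """
    set_id = set_id.strip().upper()
    # Assumption for this sketch: set identifiers are plain alphanumerics.
    if not set_id.isalnum():
        raise ValueError(f"unexpected set identifier: {set_id!r}")
    return {
        "flatfile": f"{set_id}.dat.gz",   # all sequences, EMBL flat file format
        "fasta": f"{set_id}.fasta.gz",    # the same sequences, FASTA format
    }
```

For example, wgs_set_file_names("AAAA02") gives the two names quoted above, AAAA02.dat.gz and AAAA02.fasta.gz.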
Transcriptome shotgun assembly records. Similar to WGS files.
Targeted locus study sequences. Similar to WGS & TSA files.
Generalised sequence category, organised by sequence type and taxonomic division.
Protein-coding sequences, organised by sequence type and taxonomic division.
Non-protein-coding sequences, organised by sequence type and taxonomic division. Similar file structure to Coding.
Ribosomal RNA sequences, organised by sequence type and taxonomic division. Similar file structure to Coding.