ENA FTP Structure
The root of the ENA FTP server is at:
This server contains short pieces of assembled/annotated sequence dating back
to the early 1980s, as well as larger-scale data types including genome
assemblies, MAG/SAG assemblies, and transcriptome sequence assemblies.
This server does not contain other data types such as raw reads, which can be
found on the SRA FTP server.
Users are expected to find relevant data through the various search
interfaces available from ENA, rather than by navigating the FTP server
directly.
This page therefore does not provide an exhaustive explanation of the server’s
structure, but serves to briefly describe what you will find in some of its
most important subdirectories.
For some categories, but not all, suppressed data is kept separate from public data.
Most data is available in EMBL flat file format,
with some also available as FASTA.
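As a rough illustration of working with the flat file format, the following Python sketch splits a multi-record EMBL flat file into individual records and reads the accession from each ID line. It relies on two properties of the format: each record ends with a `//` line, and the ID line lists the accession as its first semicolon-separated field. The function names `iter_embl_records` and `primary_accession` are hypothetical helpers written for this example and are not part of any ENA tool.

```python
from typing import Iterable, Iterator, List


def iter_embl_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield each record of a multi-record EMBL flat file as a list of lines."""
    record: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("//"):
            # "//" terminates a record in the EMBL flat file format.
            if record:
                yield record
            record = []
        elif line.strip():
            record.append(line)
    if record:
        # Tolerate a final record that lacks its terminator.
        yield record


def primary_accession(record: List[str]) -> str:
    """Return the accession from the record's ID line, or "" if there is none."""
    for line in record:
        if line.startswith("ID"):
            # ID line fields are separated by ";" and the first is the accession.
            return line[2:].split(";")[0].strip()
    return ""
```

Any iterable of text lines can be passed in, so an open file handle works as well as a list. For production use, an established parser such as the one in Biopython is likely a better choice than this sketch.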
Information on the INSDC feature table format, with versions dating back to 1995
An XML file containing cumulative information on all genome collection accessions
Plaintext reports on the content of genome assemblies, organised by genome collection accession (GCA)
Whole genome shotgun contig-level assembly, one multi-record flatfile and one multi-FASTA file per sequencing project
Transcriptome shotgun assembly records, one multi-record flatfile and one multi-FASTA file per sequencing project
Generalised sequence category, organised by sequence type and taxon class
Protein-coding sequences, organised by sequence type and accession number
Non-protein-coding sequences, organised by sequence type and accession number
Ribosomal RNA sequences, organised by sequence type and accession number
Targeted locus study sequences, one multi-record flatfile and one multi-FASTA file per sequencing project
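Several of the categories above provide one multi-FASTA file per sequencing project. A minimal Python sketch for iterating over the records in such a file is shown here. The function name `iter_fasta` is a hypothetical helper for this example, and the sketch assumes plain FASTA text in which every record starts with a `>` header line.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def iter_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a multi-FASTA file."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                # A new header closes the previous record.
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines are concatenated without the line breaks.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

The function accepts any iterable of text lines. If a downloaded file is gzip-compressed, opening it with `gzip.open(path, "rt")` from the Python standard library gives a suitable line iterable, where `path` stands for wherever you saved the file.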