How to Download Data Files

Giving users the ability to download submitted data for further analysis is a key part of ENA’s mission. Files are therefore made available through a public FTP server. This guide explains how that server is structured and how to download read and analysis files.

FTP Structure

The root address of the FTP server containing all read and analysis data is:

ftp://ftp.sra.ebi.ac.uk/vol1/

Meanwhile, assembled and annotated sequence data can be found at:

ftp://ftp.ebi.ac.uk/pub/databases/ena/

Any file you download from ENA will come from one of these two FTP servers. The content and structure of each server are described in detail on separate pages of this documentation.

Downloading Files

ENA provides several ways to access the data it hosts, suiting a range of use cases and levels of computational skill. They are described below in order of the computational skill they require, from lowest to highest.

Note

Most directories contain a ‘.md5’ file. You can calculate the MD5 value for a file you have downloaded and compare it with the relevant .md5 file to confirm it has been transferred in full.
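
As a minimal sketch of this check, the shell function below compares the MD5 of a downloaded file with an expected value. The function name `verify_md5` is hypothetical, and the sketch assumes the GNU `md5sum` command is installed (macOS ships a differently named `md5` tool). Because the layout of the .md5 files is not described here, the expected hash is passed in as a plain argument rather than parsed from a file.

```shell
# Hypothetical helper: succeed only if FILE's MD5 matches EXPECTED.
# Assumes the GNU coreutils `md5sum` command is on PATH.
verify_md5() {
    file="$1"
    expected="$2"
    # Reading from stdin makes md5sum print "<hash>  -"; keep only the hash.
    actual=$(md5sum < "$file" | cut -d' ' -f1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (expected $expected, got $actual)" >&2
        return 1
    fi
}
```

A call might look like `verify_md5 ERR164407.fastq.gz <hash>`, where `<hash>` is the value copied from the relevant .md5 file.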

Using ENA Browser

The ENA Browser is our website, where you can find information about ENA and access all of our public data. Visit us here:

https://www.ebi.ac.uk/ena/browser/home

You can go to any accession by entering it into the ‘Enter accession’ box at the link above. If, for example, you see an ENA accession referenced in a paper, you can view the data for yourself in this way. Once there, you can download any associated files by clicking the relevant links. For more information on how to explore a record in ENA, see our guide on How to Explore an ENA Project.

Using ENA FTP Downloader

The ENA FTP File Downloader is an application you can download from GitHub. Given an accession, this program will present a list of associated files you can download. Alternatively, you can provide a query from our Advanced Search API or Portal API to perform a bulk download of all files for a given set of criteria. Learn more about these APIs from our guide on How to Access ENA Programmatically.

Using Globus

Globus provides a more user-friendly, feature-rich directory interface for interacting with the FTP server. Files can be downloaded through the Globus ‘Shared EMBL-EBI public endpoint’ endpoint, from the ‘/gridftp/ena’ subfolder.

[Screenshot: downloading files through Globus]

Using enaBrowserTools

enaBrowserTools is a set of Python-based utilities. These simple-to-run scripts support accession-based data download commands, with options for building more complex commands. Read more about them in the enaBrowserTools Guide.

Using wget

wget is a simple command-line tool that is widely available on Linux and macOS systems. A file can be downloaded with wget simply by specifying its location:

$ wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR164/ERR164407/ERR164407.fastq.gz
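
When several runs need to be fetched, it can help to derive the directory from the accession instead of typing each path. The sketch below is only an illustration: the function name `fastq_dir_url` is hypothetical, and the directory layout is inferred from the two example paths in this guide (ERR164/ERR164407 and ERR327/009/ERR3278169) rather than taken from a formal specification. Accessions of any other length are rejected instead of guessed, and file names within the directory still have to be supplied by hand.

```shell
# Hypothetical helper: print the FTP directory URL for a run accession.
# The layout is inferred from the example paths in this guide, so
# accession lengths not covered by those examples are rejected.
fastq_dir_url() {
    acc="$1"
    base="ftp://ftp.sra.ebi.ac.uk/vol1/fastq"
    prefix=$(printf '%s' "$acc" | cut -c1-6)   # e.g. ERR164
    case "${#acc}" in
        9)  # e.g. ERR164407 -> ERR164/ERR164407
            printf '%s/%s/%s\n' "$base" "$prefix" "$acc" ;;
        10) # e.g. ERR3278169 -> ERR327/009/ERR3278169
            last=$(printf '%s' "$acc" | cut -c10)
            printf '%s/%s/00%s/%s\n' "$base" "$prefix" "$last" "$acc" ;;
        *)  echo "unsupported accession length: $acc" >&2
            return 1 ;;
    esac
}
```

The result can then be combined with a file name, for example `wget "$(fastq_dir_url ERR164407)/ERR164407.fastq.gz"`.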

Using FTP Client

Command-line FTP clients allow you to interactively explore the FTP server and download data to your local computer. When asked for a username, use ‘anonymous’. When asked for a password, press the enter key to skip this.

ftp ftp.sra.ebi.ac.uk
Name: anonymous
Password:
ftp> cd vol1/fastq/ERR164/ERR164407
ftp> get ERR164407.fastq.gz

In the above example, the ‘cd’ command is used to ‘change directory’ to the required directory. Then, the ‘get’ command is used to specify the file of interest. At any time, you can use ‘ls’ to view the content of the current directory. The command ‘pwd’ can be used to identify what the current directory is.

Using Aspera

The Aspera ascp command-line client can be downloaded from Aspera. Please select the correct version for your operating system. The ascp client is distributed as part of the Aspera Connect high-performance transfer browser plug-in.

The following examples show how Aspera can be used to download ENA data:

Unix

ascp -QT -l 300m -P33001 -i path/to/aspera/installation/etc/asperaweb_id_dsa.openssh \
era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/ERR164/ERR164407/ERR164407.fastq.gz \
local/target/directory

macOS

ascp -QT -l 300m -P33001 -i path/to/aspera/installation/asperaweb_id_dsa.openssh \
era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/ERR164/ERR164407/ERR164407.fastq.gz \
local/target/directory

Windows

"%userprofile%\AppData\Local\Programs\Aspera\Aspera Connect\bin\ascp" ^
-QT -l 300m -P33001 -i ^
"%userprofile%\AppData\Local\Programs\Aspera\Aspera Connect\etc\asperaweb_id_dsa.openssh" ^
era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/ERR164/ERR164407/ERR164407.fastq.gz ^
local\target\directory
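
The Unix and Mac examples above differ only in where the key file sits inside the Aspera installation. As a rough sketch, the hypothetical function `find_aspera_key` below checks both of those locations under a given installation directory and prints the first one that exists. The two candidate paths are taken from the examples above; other installations may place the key elsewhere.

```shell
# Hypothetical helper: print the path of the Aspera key file found
# under INSTALL_DIR, trying the two locations used in the examples.
find_aspera_key() {
    install_dir="$1"
    for candidate in \
        "$install_dir/etc/asperaweb_id_dsa.openssh" \
        "$install_dir/asperaweb_id_dsa.openssh"
    do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "key file not found under $install_dir" >&2
    return 1
}
```

The printed path can be passed to the `-i` option of ascp, for example `-i "$(find_aspera_key path/to/aspera/installation)"`.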

Downloading Private Files

To download a non-public data file with Aspera using datahub (dcc) authentication, provide the dcc username in place of era-fasp. You will then be prompted for the password. For example:

ascp -QT -l 300m -P33001 \
dcc_name@fasp.sra.ebi.ac.uk:/vol1/fastq/ERR327/009/ERR3278169/ERR3278169_1.fastq.gz \
local/target/directory

Downloading Assembled and Annotated Sequence Data

Files in public FTP folders can also be downloaded using Aspera. For example, a WGS sequence set such as ftp://ftp.ebi.ac.uk/pub/databases/ena/wgs/public/wy/WYAA01.dat.gz can be downloaded as follows:

ascp -QT -l 300m -P33001 -i path/to/aspera/installation/asperaweb_id_dsa.openssh \
fasp-ebi@fasp.ebi.ac.uk:databases/ena/wgs/public/wy/WYAA01.dat.gz local/target/directory