How to Access ENA Programmatically

Several REST APIs provide programmatic access to the European Nucleotide Archive (ENA). They expose the functionality of the ENA Advanced Search and allow direct download of ENA records and associated files.

Please see the relevant guides below for examples and tutorials on ENA programmatic data access and retrieval.

Perform Searches

All functionality of the ENA Advanced Search is available programmatically through a combination of the ENA Portal API and the ENA Browser API. API documentation can be downloaded for both the Portal API and the Browser API.
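As a minimal sketch of a programmatic search, the Python below builds a Portal API search URL and fetches the result as tab-separated text. The helper names (build_search_url, run_search), the example query and the chosen fields are illustrative assumptions; consult the Portal API documentation for the result types, query syntax and fields that are actually supported.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Search endpoint of the ENA Portal API.
PORTAL_SEARCH = "https://www.ebi.ac.uk/ena/portal/api/search"


def build_search_url(result, query, fields, fmt="tsv", limit=10):
    """Return a Portal API search URL (hypothetical helper, not part of ENA)."""
    params = {
        "result": result,            # data type to search, e.g. "read_run"
        "query": query,              # search expression, e.g. "tax_eq(9606)"
        "fields": ",".join(fields),  # columns to return
        "format": fmt,               # "tsv" or "json"
        "limit": limit,              # maximum number of rows
    }
    return f"{PORTAL_SEARCH}?{urlencode(params)}"


def run_search(url, timeout=30):
    """Fetch the search result as text. Requires network access."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Example query (an assumption for illustration): human read runs.
    url = build_search_url("read_run", "tax_eq(9606)", ["run_accession", "fastq_ftp"])
    print(url)
    # print(run_search(url))  # uncomment to send the request
```

Separating URL construction from the request keeps the query easy to inspect and log before any traffic is sent to the API.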

You can further explore related records outside of the European Nucleotide Archive by programmatically accessing the ENA Cross Reference Service.

For examples and tutorials on how to use these APIs, please see the guidelines below:

Retrieve and Download Records

All public records in ENA can be retrieved through the ENA Browser API, so records can be downloaded programmatically. Associated files can be downloaded over FTP or Aspera.
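The following Python sketch shows one way to download a single record through the Browser API. It assumes the record endpoints take the form /xml/{accession}, /embl/{accession} and /fasta/{accession}; the helper names and the example accession are illustrative, and not every format applies to every record type.

```python
import shutil
from urllib.request import urlopen

BROWSER_API = "https://www.ebi.ac.uk/ena/browser/api"

# Record formats assumed here; check the Browser API documentation for the full list.
RECORD_FORMATS = {"xml", "embl", "fasta"}


def record_url(accession, fmt="xml"):
    """Return the Browser API URL for one record (hypothetical helper)."""
    if fmt not in RECORD_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{BROWSER_API}/{fmt}/{accession}"


def download_record(accession, fmt, dest_path, timeout=60):
    """Stream a record to a local file. Requires network access."""
    with urlopen(record_url(accession, fmt), timeout=timeout) as response:
        with open(dest_path, "wb") as out:
            shutil.copyfileobj(response, out)


if __name__ == "__main__":
    # "A00145" is used purely as an example sequence accession.
    print(record_url("A00145", "fasta"))
    # download_record("A00145", "fasta", "A00145.fasta")  # uncomment to download
```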

For a quick summary of metadata and file retrieval locations of records, you can use the ENA file reports.
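As a sketch of how a file report might be consumed, the Python below builds a file report URL and parses a tab-separated report into (run, file location, checksum) tuples. It assumes the report is requested from the Portal API filereport endpoint with the fastq_ftp and fastq_md5 fields, and that multiple files in one field are separated by semicolons; the helper names are hypothetical.

```python
import csv
import io
from urllib.parse import urlencode

FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession, result="read_run",
                   fields=("run_accession", "fastq_ftp", "fastq_md5")):
    """Return a file report URL for an accession (hypothetical helper)."""
    params = {
        "accession": accession,
        "result": result,
        "fields": ",".join(fields),
        "format": "tsv",
    }
    return f"{FILEREPORT}?{urlencode(params)}"


def parse_file_report(tsv_text):
    """Parse a TSV file report into (run_accession, location, md5) tuples."""
    files = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Paired-end runs list several files in one field, separated by ";".
        locations = [u for u in (row.get("fastq_ftp") or "").split(";") if u]
        checksums = [m for m in (row.get("fastq_md5") or "").split(";") if m]
        for location, checksum in zip(locations, checksums):
            files.append((row["run_accession"], location, checksum))
    return files
```

Pairing each file location with its checksum makes it straightforward to verify downloads after transfer over FTP or Aspera.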

For convenience, enaBrowserTools can be downloaded and run locally from the command line to fetch the files associated with records by accession. It can also bulk download all records related to a specified Sample or Study.

For examples and tutorials on how to use the Browser API, file reports and enaBrowserTools, please see the guidelines below:

Access the CRAM Reference Registry

The CRAM reference registry provides access to the reference sequences used in CRAM files. Reference sequences are retrieved by MD5 or SHA1 checksum through the endpoints documented in the CRAM reference registry API.
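To illustrate checksum-based retrieval, the Python sketch below computes the MD5 of a reference sequence and builds the corresponding registry URL. It assumes the endpoints take the form /ena/cram/md5/{checksum} and /ena/cram/sha1/{checksum}, and it approximates the usual normalization (whitespace removed, letters uppercased) before hashing; the helper names are hypothetical.

```python
import hashlib

CRAM_REGISTRY = "https://www.ebi.ac.uk/ena/cram"


def sequence_md5(sequence):
    """Return the MD5 hex digest of a sequence after simple normalization.

    Whitespace (including line breaks) is removed and letters are uppercased,
    so the digest does not depend on FASTA line wrapping or case.
    """
    normalized = "".join(sequence.split()).upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


def registry_url(checksum, algorithm="md5"):
    """Return the registry URL for a checksum (hypothetical helper)."""
    if algorithm not in ("md5", "sha1"):
        raise ValueError(f"unsupported checksum algorithm: {algorithm!r}")
    return f"{CRAM_REGISTRY}/{algorithm}/{checksum}"


if __name__ == "__main__":
    digest = sequence_md5("acgt\nACGT")
    print(registry_url(digest))
```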

CRAM Format

CRAM is a sequencing read file format that achieves high space efficiency through reference-based compression of sequence data, and it offers both lossless and lossy compression modes. The format specification is maintained by the Global Alliance for Genomics and Health (GA4GH), whose members provide multiple implementations and coordinate future specification changes.

The CRAM reference registry is used by GA4GH Samtools.

CRAM Reference Registry Reverse Proxy

To reduce network traffic originating from the use of the CRAM Reference Registry we recommend using locally cached reference sequences. In addition to local caches supported by Samtools, it is possible to cache sequences using an HTTP proxy.
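As a sketch of the local cache option, the Python below prepares an environment for running Samtools with the REF_CACHE and REF_PATH variables that Samtools (via HTSlib) reads. The cache directory layout and the registry URL pattern shown are assumptions based on common HTSlib usage, and the helper name is hypothetical; check the Samtools documentation for the values appropriate to your installation.

```python
import os


def samtools_reference_env(cache_dir,
                           registry_url="https://www.ebi.ac.uk/ena/cram/md5/%s"):
    """Return a copy of the environment with reference caching configured."""
    env = dict(os.environ)
    # %2s/%2s/%s spreads cached sequences over subdirectories named after the
    # leading characters of the MD5, which avoids one very large directory.
    env["REF_CACHE"] = os.path.join(cache_dir, "%2s", "%2s", "%s")
    # Where to look for a sequence that is not yet in the local cache.
    env["REF_PATH"] = registry_url
    return env


if __name__ == "__main__":
    env = samtools_reference_env(os.path.expanduser("~/.cache/hts-ref"))
    print(env["REF_CACHE"])
    print(env["REF_PATH"])
    # Pass env to subprocess.run([...], env=env) when invoking samtools.
```

With both variables set, a sequence is fetched from the registry only the first time it is needed and is read from the local cache afterwards.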

In the tutorial below, Squid is used as a reverse proxy to cache reference sequences retrieved from the CRAM Reference Registry: