How to Search for Cross References in ENA Programmatically

The ENA Xref service holds cross-references to a number of external data resources linked to ENA records.

How often each source is updated depends on its own release cycle and/or internal processes; ENA supports updates as frequently as once a week.

These cross-references can be explored programmatically using the Xref API, which is documented with a Swagger interface.

This guide is not exhaustive. It introduces some example uses of the Xref API that you can use as a starting point for exploring the API and service further.

Most of these examples use the ‘tsv’ format result but this can be swapped for ‘json’ if that is preferable.

Display List of All Cross Reference Sources

To get a good overview of what is included in the cross-reference service, you can first access the full list of cross-reference ‘Sources’ registered with ENA. These Sources are the external data resources that are linked to ENA records. You can use the following endpoints to do this:

First 10 lines of the resulting cross-reference Sources as TSV (header row plus nine Sources):

Source             Description
ArrayExpress       ArrayExpress experiment
BCCM/LMBP          BCCM/LMBP Plasmid Collection
BlobToolKit        BlobToolKit: Toolkit for genome assembly QC
CCAP               Culture Collection of Algae and Protozoa
Citation           Citation
COMPARE-RefGenome  Reference Genome as provided by COMPARE
dictyBase          Dictyostelid genomics
Ensembl            Genome (EnsEMBL)
Ensembl-Gn         EnsEMBL Genes


First 2 resulting cross-reference Sources in JSON:

[ {
  "Description" : "ArrayExpress experiment",
  "HomePage" : "",
  "LastUpdated" : "2020-01-11 05:01:04.066495",
  "Source" : "ArrayExpress"
}, {
  "Description" : "BCCM/LMBP Plasmid Collection",
  "HomePage" : "",
  "LastUpdated" : "2017-06-03 18:19:30.764629",
  "Source" : "BCCM/LMBP"
}, {


In the above example, the TSV is easier to read directly, but the JSON format additionally reports the date on which the cross-references for each Source were most recently updated. It can be worth exploring both the ‘tsv’ and ‘json’ endpoints before deciding which is most useful for your particular use case.

In addition to providing an overview of the cross-reference service, this endpoint is useful for determining the Source name for any sources you may want to explore further.
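As a rough illustration of the Source listing described above, the following Python sketch builds the listing URL and parses both response formats. The base URL and the `/{format}/source` path are assumptions based on the publicly documented Xref service and should be confirmed in the Swagger interface; `sources_url` and `parse_tsv` are hypothetical helper names. The sample data is copied from the excerpts above so that the sketch runs without network access.

```python
import csv
import io
import json

# Assumption: base URL of the Xref REST API; confirm it in the Swagger interface.
XREF_BASE = "https://www.ebi.ac.uk/ena/xref/rest"


def sources_url(fmt="tsv"):
    """Build the URL that lists all cross-reference Sources ('tsv' or 'json')."""
    if fmt not in ("tsv", "json"):
        raise ValueError("fmt must be 'tsv' or 'json'")
    return f"{XREF_BASE}/{fmt}/source"


def parse_tsv(text):
    """Turn a tab-separated response into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


# Offline samples copied from the excerpts above (no network access needed).
sample_tsv = (
    "Source\tDescription\n"
    "ArrayExpress\tArrayExpress experiment\n"
    "CCAP\tCulture Collection of Algae and Protozoa\n"
)
sample_json = """[
  {"Description": "ArrayExpress experiment", "HomePage": "",
   "LastUpdated": "2020-01-11 05:01:04.066495", "Source": "ArrayExpress"},
  {"Description": "BCCM/LMBP Plasmid Collection", "HomePage": "",
   "LastUpdated": "2017-06-03 18:19:30.764629", "Source": "BCCM/LMBP"}
]"""

# Source names come from the TSV; the JSON adds the LastUpdated field.
source_names = [row["Source"] for row in parse_tsv(sample_tsv)]
last_updated = {rec["Source"]: rec["LastUpdated"] for rec in json.loads(sample_json)}

print(sources_url("tsv"))
print(source_names)
print(last_updated["BCCM/LMBP"])
```

To query the live service, the URL returned by `sources_url` could be passed to `urllib.request.urlopen` and the decoded body handed to `parse_tsv` or `json.loads`.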

Look up Cross References for a Source

Once you have determined which Source you would like to search, you can perform a search against it. For example, to fetch records that have a cross-reference registered with MGnify (EMBL-EBI’s metagenomic data analysis service), you could look up the following:

This returns a TSV of all records that have a cross-reference with MGnify. The cross-reference service provides the URL of the record at the cross-reference Source (in this case MGnify’s website) as well as the URL of the record in ENA.

This example is limited to 100 records. By using the ‘limit’ and ‘offset’ options, you can retrieve the data in batches. By default the limit is set to 100,000 records. You can set the limit to 0 to fetch all the records.
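The batching described above can be sketched in Python as follows. The base URL and the `/tsv/search` path are assumptions to be checked against the Swagger interface; only the ‘source’, ‘limit’ and ‘offset’ parameter names come from this guide. `search_url`, `iter_records` and `fake_fetch_page` are hypothetical names, and the loop assumes that ‘offset’ counts records from zero. An in-memory list stands in for the service so the sketch runs offline.

```python
from urllib.parse import urlencode

# Assumption: base URL of the Xref REST API; confirm it in the Swagger interface.
XREF_BASE = "https://www.ebi.ac.uk/ena/xref/rest"


def search_url(fmt="tsv", **params):
    """Build a search URL from keyword parameters such as source, limit, offset."""
    return f"{XREF_BASE}/{fmt}/search?{urlencode(params)}"


def iter_records(fetch_page, batch_size=100):
    """Yield records batch by batch until a batch comes back short.

    fetch_page is any callable taking limit and offset and returning a list.
    """
    offset = 0
    while True:
        page = fetch_page(limit=batch_size, offset=offset)
        yield from page
        if len(page) < batch_size:
            break  # a short (or empty) batch means there is nothing left
        offset += batch_size


# Stand-in for the service: 250 fake records served in slices.
fake_records = [f"record-{i}" for i in range(250)]


def fake_fetch_page(limit, offset):
    return fake_records[offset:offset + limit]


first_batch_url = search_url(source="MGnify", limit=100, offset=0)
all_records = list(iter_records(fake_fetch_page, batch_size=100))

print(first_batch_url)
print(len(all_records))
```

A real `fetch_page` would request `search_url(source=..., limit=limit, offset=offset)` and parse the response into rows; stopping on a short batch avoids needing to know the total number of records in advance.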

Narrow Down a Search By Target Record Type

You can narrow down cross-reference searches further to only return records of a certain Type. For example, you may want to search specifically for sample records which are linked to the MGnify service.

First, you may want to determine which Targets are available and how they are named. To list all Target options and their names, you can use the following endpoint:


Target       Description
analysis     Nucleotide sequence analyses
assembly     Genome assemblies
coding       Assembled and annotated protein-coding sequences
experiment   Read domain experiments
noncoding    Assembled and annotated non-coding sequences
run          Read domain runs
sample       Samples
sequence     Assembled and annotated nucleotide sequences
study        Studies
taxon        NCBI Taxonomy
trace        Capillary traces in Trace Archive
wgsmaster    WGS and TSA masters

Here we can see that sample records are identified by the target ‘sample’. Now you can narrow down your previous search:
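A minimal Python sketch of such a narrowed search is shown below. It assumes the search endpoint accepts a ‘target’ query parameter alongside ‘source’ and ‘limit’, and that the base URL is as given; both should be verified in the Swagger interface. `source_target_url` is a hypothetical helper, and the set of Target names is copied from the table above.

```python
from urllib.parse import urlencode

# Assumption: base URL of the Xref REST API; confirm it in the Swagger interface.
XREF_BASE = "https://www.ebi.ac.uk/ena/xref/rest"

# Target names copied from the Target listing in this guide.
KNOWN_TARGETS = {
    "analysis", "assembly", "coding", "experiment", "noncoding", "run",
    "sample", "sequence", "study", "taxon", "trace", "wgsmaster",
}


def source_target_url(source, target, limit=100, fmt="tsv"):
    """Build a search URL restricted to one Source and one Target record type."""
    if target not in KNOWN_TARGETS:
        raise ValueError(f"unknown target: {target!r}")
    # Assumption: 'target' is the query parameter name for the record type.
    query = urlencode({"source": source, "target": target, "limit": limit})
    return f"{XREF_BASE}/{fmt}/search?{query}"


url = source_target_url("MGnify", "sample")
print(url)
```

Checking the Target name locally catches simple typos such as ‘samples’ before a request is sent.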

Look up Cross References for a Record

Instead of looking up cross-references by Source, you may want to look up all cross-references for a particular ENA record. To do this, you can perform a cross-reference search using an INSDC accession:


Source       Source primary accession        Source secondary accession      Source url      Target  Target primary accession        Target secondary accession      Target url
COMPARE-RefGenome    NLV/GII/Neustrelitz260/2000/DE                  sequence        AY772730      
EuropePMC    PMC1393082      16517856    sequence        AY772730      
EuropePMC    PMC1594604      16891526    sequence        AY772730      
EuropePMC    PMC2828081      17953089    sequence        AY772730      
EuropePMC    PMC2897520      20484606    sequence        AY772730      
EuropePMC    PMC2919043      20554772    sequence        AY772730      
EuropePMC    PMC3096948      21524296    sequence        AY772730      
EuropePMC    PMC3110387      21686127    sequence        AY772730      
EuropePMC    PMC3187498      21849454    sequence        AY772730      
EuropePMC    PMC3367634      16485473    sequence        AY772730      
EuropePMC    PMC3493335      22943503    sequence        AY772730      
EuropePMC    PMC3695492      23630317    sequence        AY772730      
EuropePMC    PMC4298492      24989606    sequence        AY772730      
EuropePMC    PMC5388089      28181902    sequence        AY772730      
EuropePMC    PMC5746213      29284004    sequence        AY772730      
EuropePMC    PMC5874246      29593246    sequence        AY772730      
EuropePMC    PMC5911914      25946552    sequence        AY772730      
EuropePMC    PMC6160709      29992776    sequence        AY772730      
EuropePMC    PMC6511519      30531093    sequence        AY772730      
EuropePMC    PMC7011714      31483239    sequence        AY772730      
EuropePMC    PMC7160966      32322405    sequence        AY772730      
EuropePMC    PMC7165577      16629981    sequence        AY772730      
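A result like the one above can be summarised per Source with a short Python sketch. The ‘accession’ query parameter name and the base URL are assumptions to be confirmed in the Swagger interface; `accession_url` and `parse_tsv` are hypothetical helpers. The sample rows are a shortened copy of the excerpt above, with the URL columns left blank as they appear there, and the placement of the two EuropePMC identifiers in the primary and secondary accession columns is my reading of that excerpt.

```python
import csv
import io
from collections import Counter
from urllib.parse import urlencode

# Assumption: base URL of the Xref REST API; confirm it in the Swagger interface.
XREF_BASE = "https://www.ebi.ac.uk/ena/xref/rest"

HEADER = [
    "Source", "Source primary accession", "Source secondary accession",
    "Source url", "Target", "Target primary accession",
    "Target secondary accession", "Target url",
]


def accession_url(accession, fmt="tsv"):
    """Build a search URL for all cross-references of one INSDC accession."""
    # Assumption: 'accession' is the query parameter name.
    return f"{XREF_BASE}/{fmt}/search?{urlencode({'accession': accession})}"


def parse_tsv(text):
    """Turn a tab-separated response into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


# Shortened offline sample based on the excerpt above.
sample_rows = [
    ["COMPARE-RefGenome", "NLV/GII/Neustrelitz260/2000/DE", "", "",
     "sequence", "AY772730", "", ""],
    ["EuropePMC", "PMC1393082", "16517856", "", "sequence", "AY772730", "", ""],
    ["EuropePMC", "PMC1594604", "16891526", "", "sequence", "AY772730", "", ""],
]
sample_tsv = "\n".join("\t".join(fields) for fields in [HEADER] + sample_rows) + "\n"

rows = parse_tsv(sample_tsv)
# Count how many cross-references each Source contributes for this record.
per_source = Counter(row["Source"] for row in rows)

print(accession_url("AY772730"))
print(dict(per_source))
```

Counting by Source gives a quick view of where a record is referenced, for instance how many literature links it has compared with curated reference links.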

Expanding Metadata

In some cases, a registered cross-reference has additional metadata. This is true, for example, of cross-references registered with the Source COMPARE-RefGenome.

To view this metadata, add ‘expanded=true’ to the search:


Source       Source primary accession        Source secondary accession      Source url      Target  Target primary accession        Target secondary accession      Target url      Family  Genus   species 1st below- species level        2nd below- species level        3rd below-species level Aggregated taxonomic name       genome
COMPARE-RefGenome    NLV/GII/Neustrelitz260/2000/DE                  sequence        AY772730          Caliciviridae   norovirus       GII     P15, 16                 NoV/GII.P16/GII.16      complete
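The extra columns returned by an expanded search can be separated from the standard ones with a sketch like the following. The base URL and the ‘accession’ parameter name are assumptions to be verified in the Swagger interface; ‘expanded=true’ comes from this guide. `expanded_url` and `extra_metadata` are hypothetical helpers, and the sample row is trimmed to three of the extra columns (Family, Genus, genome) from the excerpt above.

```python
import csv
import io
from urllib.parse import urlencode

# Assumption: base URL of the Xref REST API; confirm it in the Swagger interface.
XREF_BASE = "https://www.ebi.ac.uk/ena/xref/rest"

# The eight standard columns seen in non-expanded results in this guide.
BASE_COLUMNS = [
    "Source", "Source primary accession", "Source secondary accession",
    "Source url", "Target", "Target primary accession",
    "Target secondary accession", "Target url",
]


def expanded_url(accession, fmt="tsv"):
    """Build an accession search URL that also requests expanded metadata."""
    query = urlencode({"accession": accession, "expanded": "true"})
    return f"{XREF_BASE}/{fmt}/search?{query}"


def extra_metadata(row):
    """Return the non-empty columns of a row that are not standard columns."""
    return {k: v for k, v in row.items() if k not in BASE_COLUMNS and v}


# Trimmed offline sample: standard columns plus three of the extra ones.
header = BASE_COLUMNS + ["Family", "Genus", "genome"]
values = ["COMPARE-RefGenome", "NLV/GII/Neustrelitz260/2000/DE", "", "",
          "sequence", "AY772730", "", "",
          "Caliciviridae", "norovirus", "complete"]
sample_tsv = "\t".join(header) + "\n" + "\t".join(values) + "\n"

row = next(csv.DictReader(io.StringIO(sample_tsv), delimiter="\t"))
metadata = extra_metadata(row)

print(expanded_url("AY772730"))
print(metadata)
```

Treating every column outside the standard set as metadata avoids hard-coding the extra column names, which may differ between Sources.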