How to Search for Cross References in ENA Programmatically

The ENA Xref service holds cross-references to a number of external data resources linked to ENA records.

The update frequency of each source depends on its own release cycle and/or internal processes, with ENA supporting updates as frequently as once a week.

These cross-references can be explored programmatically using the Xref API which is documented with a Swagger interface.

This guide is not exhaustive. It introduces some example uses of the Xref API, which you can use as a starting point for exploring the API and service further.

Most of these examples request results in ‘tsv’ format, but you can substitute ‘json’ if you prefer.

Display List of All Cross Reference Sources

To get a good overview of what is included in the cross-reference service, you can first access the full list of cross-reference ‘Sources’ registered with ENA. These sources are the external data resources which are linked to ENA records. You can use the following endpoints to do this:

https://www.ebi.ac.uk/ena/xref/rest/tsv/source
https://www.ebi.ac.uk/ena/xref/rest/json/source

First 10 resulting cross-reference Sources as a TSV:

Source       Description
ArrayExpress ArrayExpress experiment
BCCM/LMBP    BCCM/LMBP Plasmid Collection
BlobToolKit  BlobToolKit: Toolkit for genome assembly QC
CABRI        CABRI
CCAP         Culture Collection of Algae and Protozoa
Citation     Citation
COMPARE-RefGenome    Reference Genome as provided by COMPARE
dictyBase    Dictyostelid genomics
Ensembl      Genome (EnsEMBL)
Ensembl-Gn   EnsEMBL Genes

continued....

First 2 resulting cross-reference Sources in JSON:

[ {
  "Description" : "ArrayExpress experiment",
  "HomePage" : "https://www.ebi.ac.uk/arrayexpress/",
  "LastUpdated" : "2020-01-11 05:01:04.066495",
  "Source" : "ArrayExpress"
}, {
  "Description" : "BCCM/LMBP Plasmid Collection",
  "HomePage" : "http://www.genecorner.ugent.be/",
  "LastUpdated" : "2017-06-03 18:19:30.764629",
  "Source" : "BCCM/LMBP"
}, {

continued....

In the above example, the TSV provides more direct readability but the JSON format provides additional information on the date the cross-references for that Source were most recently updated. It can be worth exploring both the ‘tsv’ and ‘json’ endpoints available before deciding what is most useful for your particular use-case.

In addition to providing an overview of the cross-reference service, this endpoint is useful for determining the Source name for any sources you may want to explore further.
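
The source listing can also be read from a script. The following is a minimal sketch using only the Python standard library; the helper names (source_url, parse_tsv, fetch_text) are illustrative and not part of the Xref API, and the parser assumes the TSV response is tab-separated with a single header row, as in the example output above.

```python
import csv
import io
import urllib.request

# Base path taken from the endpoints shown in this guide.
XREF_BASE = "https://www.ebi.ac.uk/ena/xref/rest"


def source_url(fmt="tsv"):
    """Return the source-listing endpoint for the 'tsv' or 'json' format."""
    if fmt not in ("tsv", "json"):
        raise ValueError("fmt must be 'tsv' or 'json'")
    return f"{XREF_BASE}/{fmt}/source"


def parse_tsv(text):
    """Parse tab-separated text with a header row into a list of dicts.

    Assumption: fields are not quoted, so quote characters are kept literally.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def fetch_text(url, timeout=30):
    """Download a URL and decode the body as UTF-8 (performs a network call)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With a network connection, parse_tsv(fetch_text(source_url("tsv"))) should give one dictionary per Source, keyed by the column names in the header row.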

Look up Cross References for a Source

Once you have determined what Source you would like to search for, you can perform a search against this Source. For example, to fetch records that have a cross-reference registered with MGnify (EMBL-EBI’s metagenomic data analysis service), you could look up the following:

https://www.ebi.ac.uk/ena/xref/rest/tsv/search?source=MGnify&limit=100

This returns a TSV of records that have a cross-reference with MGnify. For each cross-reference, the service provides the URL of the record at the source (in this case MGnify’s website) as well as the URL of the record in ENA.

This example is limited to 100 records. By using the ‘limit’ and ‘offset’ options, you can retrieve the data in batches. By default the limit is set to 100,000 records. You can set the limit to 0 to fetch all the records.
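
As a rough illustration of batching, the sketch below builds the search URLs for consecutive batches. It assumes that ‘offset’ is the number of records to skip, counted from zero; that interpretation is not stated in this guide, so check it against the Swagger documentation. The function name batch_urls is hypothetical.

```python
from urllib.parse import urlencode

# Search endpoint as used in the examples in this guide.
SEARCH_URL = "https://www.ebi.ac.uk/ena/xref/rest/tsv/search"


def batch_urls(source, batch_size, n_batches):
    """Yield search URLs for consecutive batches of records.

    Assumption: 'offset' is the zero-based number of records to skip.
    """
    for i in range(n_batches):
        query = urlencode(
            {"source": source, "limit": batch_size, "offset": i * batch_size}
        )
        yield f"{SEARCH_URL}?{query}"


if __name__ == "__main__":
    # Print the URLs for the first three batches of 100 MGnify cross-references.
    for url in batch_urls("MGnify", 100, 3):
        print(url)
```

Each URL can then be downloaded in turn, stopping when a batch comes back with fewer records than the batch size.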

Narrow Down a Search By Target Record Type

You can narrow down cross-reference searches further to return only records of a certain target type. For example, you may want to search specifically for sample records which are linked to the MGnify service.

First, you may want to determine what Targets are available and how they are named. To see the full list of Target names and descriptions, you can use the following endpoint:

https://www.ebi.ac.uk/ena/xref/rest/tsv/target

Result:

Target       Description
analysis     Nucleotide sequence analyses
assembly     Genome assemblies
coding       Assembled and annotated protein-coding sequences
experiment   Read domain experiments
noncoding    Assembled and annotated non-coding sequences
run          Read domain runs
sample       Samples
sequence     Assembled and annotated nucleotide sequences
study        Studies
taxon        NCBI Taxonomy
trace        Capillary traces in Trace Archive
wgsmaster    WGS and TSA masters

Here we can see that samples are identified by the target ‘sample’. Now, you can narrow down your previous search:

https://www.ebi.ac.uk/ena/xref/rest/tsv/search?source=MGnify&target=sample&limit=100
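
A small helper can catch a mistyped target name before the request is sent. This is a sketch only: search_url and KNOWN_TARGETS are illustrative names, and the set of targets is copied from the listing above, so the live ‘target’ endpoint remains the authoritative list.

```python
from urllib.parse import urlencode

# Search endpoint as used in the examples in this guide.
SEARCH_URL = "https://www.ebi.ac.uk/ena/xref/rest/tsv/search"

# Target names copied from the listing in this guide; they may change over time.
KNOWN_TARGETS = {
    "analysis", "assembly", "coding", "experiment", "noncoding", "run",
    "sample", "sequence", "study", "taxon", "trace", "wgsmaster",
}


def search_url(source, target=None, limit=100):
    """Build a search URL for a source, optionally restricted to one target."""
    params = {"source": source}
    if target is not None:
        if target not in KNOWN_TARGETS:
            raise ValueError(f"unknown target: {target!r}")
        params["target"] = target
    params["limit"] = limit
    return f"{SEARCH_URL}?{urlencode(params)}"
```

Calling search_url("MGnify", target="sample") reproduces the URL shown above, while search_url("MGnify", target="samples") raises a ValueError.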

Look up Cross References for a Record

Instead of looking for cross-references by source, you may want to look up all cross-references for a particular ENA record. To do this, you can perform a cross-reference search using an INSDC accession:

https://www.ebi.ac.uk/ena/xref/rest/tsv/search?accession=AY772730

Result:

Source       Source primary accession        Source secondary accession      Source url      Target  Target primary accession        Target secondary accession      Target url
COMPARE-RefGenome    NLV/GII/Neustrelitz260/2000/DE                  sequence        AY772730                https://www.ebi.ac.uk/ena/browser/view/AY772730
EuropePMC    PMC1393082      16517856        http://europepmc.org/abstract/PMC/PMC1393082    sequence        AY772730                https://www.ebi.ac.uk/ena/browser/view/AY772730
EuropePMC    PMC1594604      16891526        http://europepmc.org/abstract/PMC/PMC1594604    sequence        AY772730                https://www.ebi.ac.uk/ena/browser/view/AY772730
EuropePMC    PMC2828081      17953089        http://europepmc.org/abstract/PMC/PMC2828081    sequence        AY772730                https://www.ebi.ac.uk/ena/browser/view/AY772730
EuropePMC    PMC2897520      20484606        http://europepmc.org/abstract/PMC/PMC2897520    sequence        AY772730                https://www.ebi.ac.uk/ena/browser/view/AY772730
EuropePMC    PMC2919043      20554772        http://europepmc.org/abstract/PMC/PMC2919043    sequence        AY772730                https://www.ebi.ac.uk/ena/browser/view/AY772730
EuropePMC    PMC3096948      21524296        http://europepmc.org/abstract/PMC/PMC3096948    sequence        AY772730                https://www.ebi.ac.uk/ena/browser/view/AY772730
EuropePMC    PMC3110387      21686127        http://europepmc.org/abstract/PMC/PMC3110387    sequence        AY772730                https://www.ebi.ac.uk/ena/browser/view/AY772730
EuropePMC    PMC3187498      21849454        http://europepmc.org/abstract/PMC/PMC3187498    sequence        AY772730                https://www.ebi.ac.uk/ena/browser/view/AY772730
EuropePMC    PMC3367634      16485473        http://europepmc.org/abstract/PMC/PMC3367634    sequence        AY772730                https://www.ebi.ac.uk/ena/browser/view/AY772730
EuropePMC    PMC3493335      22943503        http://europepmc.org/abstract/PMC/PMC3493335    sequence        AY772730                https://www.ebi.ac.uk/ena/browser/view/AY772730
EuropePMC    PMC3695492      23630317        http://europepmc.org/abstract/PMC/PMC3695492    sequence        AY772730                https://www.ebi.ac.uk/ena/browser/view/AY772730
EuropePMC    PMC4298492      24989606        http://europepmc.org/abstract/PMC/PMC4298492    sequence        AY772730                https://www.ebi.ac.uk/ena/browser/view/AY772730
EuropePMC    PMC5388089      28181902        http://europepmc.org/abstract/PMC/PMC5388089    sequence        AY772730                https://www.ebi.ac.uk/ena/browser/view/AY772730
EuropePMC    PMC5746213      29284004        http://europepmc.org/abstract/PMC/PMC5746213    sequence        AY772730                https://www.ebi.ac.uk/ena/browser/view/AY772730
EuropePMC    PMC5874246      29593246        http://europepmc.org/abstract/PMC/PMC5874246    sequence        AY772730                https://www.ebi.ac.uk/ena/browser/view/AY772730
EuropePMC    PMC5911914      25946552        http://europepmc.org/abstract/PMC/PMC5911914    sequence        AY772730                https://www.ebi.ac.uk/ena/browser/view/AY772730
EuropePMC    PMC6160709      29992776        http://europepmc.org/abstract/PMC/PMC6160709    sequence        AY772730                https://www.ebi.ac.uk/ena/browser/view/AY772730
EuropePMC    PMC6511519      30531093        http://europepmc.org/abstract/PMC/PMC6511519    sequence        AY772730                https://www.ebi.ac.uk/ena/browser/view/AY772730
EuropePMC    PMC7011714      31483239        http://europepmc.org/abstract/PMC/PMC7011714    sequence        AY772730                https://www.ebi.ac.uk/ena/browser/view/AY772730
EuropePMC    PMC7160966      32322405        http://europepmc.org/abstract/PMC/PMC7160966    sequence        AY772730                https://www.ebi.ac.uk/ena/browser/view/AY772730
EuropePMC    PMC7165577      16629981        http://europepmc.org/abstract/PMC/PMC7165577    sequence        AY772730                https://www.ebi.ac.uk/ena/browser/view/AY772730
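
When a record has many cross-references, as in the result above, it can help to group them by source. The sketch below assumes the response text is tab-separated and uses the column names ‘Source’ and ‘Source primary accession’ from the header row shown above; the function name accessions_by_source is hypothetical.

```python
import csv
import io
from collections import defaultdict


def accessions_by_source(tsv_text):
    """Group 'Source primary accession' values by 'Source'.

    Assumption: the text is tab-separated with the header row shown in this
    guide, and fields are not quoted.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    grouped = defaultdict(list)
    for row in reader:
        grouped[row["Source"]].append(row["Source primary accession"])
    return dict(grouped)
```

Applied to the response for accession AY772730, this would return a dictionary with one key per source and the list of source accessions as each value.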

Expanding Metadata

In some cases, a registered cross-reference has additional metadata. This is the case, for example, for cross-references registered with the source COMPARE-RefGenome.

To view this metadata, add “expanded=true” to the search query:

https://www.ebi.ac.uk/ena/xref/rest/tsv/search?source=COMPARE-RefGenome&accession=AY772730&expanded=true

Result:

Source       Source primary accession        Source secondary accession      Source url      Target  Target primary accession        Target secondary accession      Target url      Family  Genus   species 1st below- species level        2nd below- species level        3rd below-species level Aggregated taxonomic name       genome
COMPARE-RefGenome    NLV/GII/Neustrelitz260/2000/DE                  sequence        AY772730                https://www.ebi.ac.uk/ena/data/view/AY772730    Caliciviridae   norovirus       GII     P15, 16                 NoV/GII.P16/GII.16      complete
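
To separate the additional metadata from the standard columns, one option is to drop the eight core columns and keep whatever remains. This is a sketch under the assumption that the expanded response is tab-separated and that the core column names match the header shown in the earlier accession search; extra_metadata and CORE_COLUMNS are illustrative names.

```python
import csv
import io

# Core column names as shown in the non-expanded search result in this guide.
CORE_COLUMNS = {
    "Source", "Source primary accession", "Source secondary accession",
    "Source url", "Target", "Target primary accession",
    "Target secondary accession", "Target url",
}


def extra_metadata(tsv_text):
    """Return, for each row, only the non-empty columns beyond the core set."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return [
        {
            key: value
            for key, value in row.items()
            # Skip overflow fields (key is None), core columns and empty values.
            if key is not None and key not in CORE_COLUMNS and value
        }
        for row in reader
    ]
```

For the COMPARE-RefGenome example, each returned dictionary would hold the taxonomic and genome fields that appear only when “expanded=true” is set.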