Webin-CLI Submission
Introduction
Submissions to ENA can be made using the interactive Webin submission service, programmatic Webin submission service and the Webin command line submission service.
This module gives an introduction to the Webin command line submission interface used to validate, upload and submit files to the European Nucleotide Archive (ENA) and will also link to where you can download it. Please note that unlike with other ENA submissions routes you may have used, you do not need to pre-upload your files when using Webin-CLI.
Webin-CLI is the only way to submit assembled genomes and transcriptomes.
Webin-CLI is available as a Docker image and as a Java jar file.
Download the program as a Java jar file
You can download Webin-CLI Java jar file from its GitHub repository. We recommend always using the latest version:
To get started with running Webin-CLI, download the .jar
file for whatever version you’re interested in.
If you have a GitHub account, you can use the ‘Watch’ button in the top right to always be notified of new releases.
Please note that Webin-CLI requires that you have Java installed before you can run it. You should have version 17 or newer installed, which can be downloaded from Java:
Webin-CLI has been tested against openjdk version 1.8.0_212. You are recommended to use equivalent or later version.
Download openapi JDKs from the below links:
https://adoptopenjdk.net/?variant=openjdk8&jvmVariant=hotspot
Download Oracle JREs from the below links:
Run the program as a Java jar file
The Webin command line submission interface is a self-executing Java jar file and is
run using the java
command:
java -jar webin-cli-<version>.jar <options>
for example:
java -jar webin-cli-1.7.3.jar <options>
The <version>
is the version number of the program.
Please note that the command must include the location of the jar file. For example, if you have it in your Downloads directory, the appropriate command on Mac/Linux on immediately opening the terminal would be:
java -jar Downloads/webin-cli-1.7.3.jar <options>
On Windows a backward slash is used instead of a forward slash:
java -jar Downloads\webin-cli-1.7.3.jar <options>
The command line <options>
are explained below.
Video Guide: Getting Started With Webin-CLI in Windows 10
Command Line Options
-context
: the submission type:-context genome
-context transcriptome
-context sequence
-context reads
-userName
: the Webin submission account name.-password
: the Webin submission account password.-centerName
: the center name of the submitter (mandatory for broker accounts).-manifest
: the manifest file name.-outputDir
: directory for output files.-inputDir
: input directory for files declared in manifest file.-validate
: validates the files defined in the manifest file.-submit
: validates and submits the files defined in the manifest file.-test
: use Webin test service instead of the production service. Please note that the Webin upload area is shared between test and production services, and that test submission files will not be archived.-ascp
: use Aspera Cli instead of FTP file transfer, if available. Aspera Cli should be installed and path to executable “ascp” should be in PATH variable.-version
: prints the version number of the program and exists.-help
: detailed information about the different options.
Submission Process
Please note that this section serves as a general overview of the use of Webin-CLI. You may prefer to find the page specific to your submission type using the links in the sidebar of this page. The following types of submissions are supported:
genome assemblies
transcriptome assemblies
annotated sequences
reads
The type of the submission is specified using the -context
command line option:
-context genome
-context transcriptome
-context sequence
-context reads
The following picture illustrates the stages of the submission process:
Bulk submissions for reads, unannotated genome assemblies, targeted sequences and taxonomic reference data can be made with Webin-CLI using the following tool:
Webin-CLI Bulk Submissions Tool
Stage 1: Pre-register Study and Sample
Each submission must be associated with a pre-registered study and a sample.
Stage 2: Prepare the Files
The set of files that are part of the submission must be specified using a manifest file.
The manifest file is specified using the -manifest <filename>
option.
Manifest File Format
The manifest file can be submitted as either a plain text file or a JSON file.
The manifest file contains metadata fields and file name fields.
Text Manifest File
The text manifest file format has two columns separated by a tab (or any whitespace characters):
Field name (first column): case insensitive field name
Field value (second column): field value
Examples of metadata fields are study and sample references:
STUDY Study accession or unique name (alias)
SAMPLE Sample accession or unique name (alias)
ANALYSIS_REF Comma separated list of analysis accession(s)
RUN_REF Comma separated list of run accession(s)
The file name field format is:
<file type> <file name>
An example of a file name field is:
FASTA genome.fasta.gz
For example, the following manifest file represents a genome assembly consisting of contigs provided in one fasta file:
STUDY TODO
SAMPLE TODO
ASSEMBLYNAME TODO
COVERAGE TODO
PROGRAM TODO
PLATFORM TODO
MINGAPLENGTH TODO
MOLECULETYPE genomic DNA
FASTA genome.fasta.gz
JSON Manifest File
The JSON manifest file format provides an option to prepare your submission in JSON. This can also be specifically used for more complex data types, such as multi-fastq submissions e.g. for single-cell data.
The manifest file has two columns separated by a colon:
Field name (first column): case insensitive field name
Field value (second column): field value
For example, the following manifest file represents a multi-fastq submission for single-cell data:
{
"study": "TODO",
"sample": "TODO",
"name": "TODO",
"platform": "TODO",
"instrument": "TODO",
"insert_size": "TODO",
"library_name": "TODO",
"library_source": "TODO",
"library_selection": "TODO",
"library_strategy": "TODO",
"fastq": [
{
"value": "single_cell_S1_L001_I1_001.fastq.gz",
"attributes": {
"read_type": "feature_barcode"
}
},
{
"value": "single_cell_S1_L001_R1_001.fastq.gz",
"attributes": {
"read_type": ["paired", "umi_barcode"]
}
},
{
"value": "single_cell_S1_L001_R2_001.fastq.gz",
"attributes": {
"read_type": "sample_barcode"
}
},
{
"value": "single_cell_S1_L001_R3_001.fastq.gz",
"attributes": {
"read_type": ["paired", "cell_barcode"]
}
}
]
}
Manifest File Types
Please refer to the more detailed documentation for supported file types for each submission.
Sequence based submission support the following formats:
FASTA: Sequences in fasta format
FLATFILE: Sequences in EMBL-Bank flat file format
The following additional formats are supported for genome assembly submissions:
AGP: Sequences in AGP format
CHROMOSOME_LIST: list of chromosomes
UNLOCALISED_LIST: list of unlocalised sequences
The following formats are supported for read submissions:
BAM: BAM file
CRAM: CRAM file
FASTQ: fastq file
Info File (for backward compability only)
You can also provide the metadata fields in a separate info file. The info file has the same format as the manifest file.
When a separate info file is used then the manifest file must contain the INFO
field pointing to the info file.
For example, the following manifest file represents a genome assembly consisting of contigs provided in one fasta file:
INFO assembly.info
FASTA genome.fasta.gz
Stage 3: Validate and Submit Files
You can validate your files using the -validate
command line option. Note that
the -submit
option must be used to submit the validated files.
You can submit your files using the -submit
command line option. Before
being submitted your files will be validated and uploaded to your
private Webin file upload area in webin.ebi.ac.uk.
Please refer to individual modules for validation rules.
Validation error reports are written into the <outputDir>/<context>/<name>/validate
directory.
The webin command line submission interface creates and submits XMLs for you. These XMLs and
the Receipt XML containing accession numbers are written into the <outputDir>/<context>/<name>/submit
directory. This directory also contains the file manifest that refers to the files that are
part of the submission.
The <outputDir>
can be specified using the -outputDir
option, the <context>
is
specified using the -context
option, and the <name>
is a submitter provided unique
name specified in the manifest
file.
Once the submission is complete an accession number is immediately returned to the submitter by the Webin command line submission interface. Please refer to individual modules for advice which long term stable accession numbers can be used in publications.
Output Directory Structure
An output directory can be specific to the Webin command line submission interface
using the -outputDir
option. This directory will have the following subdirectories:
<context>/<name>/validate
<context>/<name>/submit
If the -outputDir
option is not specified then the directory in which the
-manifest
file is used as the output directory.
The <context>
is the submission type provided using the -context
option
and the <name>
is the unique name provided in the manifest file.
The
validate
directory contains the validation reports created using the-validate
option.The
submit
directory contains the XMLs created by the-submit
option including the Receipt XML. This directory also contains the file manifest that refers to the files that are part of the submission.
Validation Reports
If the -validate
action fails for any reason then validation reports are written into directory:
<context>/<name>/validate
The validation reports correspond to the input files with an added suffix .report
.
For example, a validated fasta file assembly.fasta
will have a corresponding validation
report assembly.fasta.report
.
Messages which can’t be attributed to a specific input file will be written to both standard out and in the following file:
<context>/<name>/validate/webin-cli.report
Run the program using the Docker image
Webin-CLI is available as the enasequence/webin-cli
Docker image.
You can run the Webin-CLI docker image using docker
:
docker run enasequence/webin-cli
or using singularity
:
singularity run docker://enasequence/webin-cli
The required command line options are explained below. Please remember to mount local directories containing the files to submit so that they available to the running container.
Configuring Your Firewall For ENA Upload
Some users may encounter problems connecting to the ENA FTP service, a necessary step in Webin-CLI submission. A possible solution to this is to ensure that your firewall is configured appropriately to allow you to connect to this service:
Proxy Servers
If your organisation uses a https proxy you can set the following Java properties to instruct the webin-cli to use them:
https.proxyHost
https.proxyPort
For example:
java -Dhttps.proxyHost=proxy.com -Dhttps.proxyPort=8080 -jar webin-cli-<version>.jar <options>
Similarly, if your organisation uses a ftp proxy you can set the following properties:
ftp.proxyHost
ftp.proxyPort
For example:
java -Dftp.proxyHost=proxy.com -Dftp.proxyPort=8080 -jar webin-cli-<version>.jar <options>
Using Aspera Instead of FTP to Upload Files
By default the Webin command line interface will use FTP to upload files to the
webin.ebi.ac.uk server. Alternatively, you may use the Aspera protocol by installing
Aspera Cli and specifying the
-ascp
option. Aspera is a commercial file transfer protocol that may provide better
transfer speeds than FTP making it useful when uploading larger files.
Please note that that the folder containing the ascp
command line program
must be included in the PATH variable. The ascp
command can be found from
the cli/bin
directory of the downloaded and expanded Aspera Cli archive file.
Release policy
Webin-CLI uses standard three number semantic versioning.
Patch releases: the third digit is incremented by one after backward compatible bug fixes. For example, from 1.0.1 to 1.0.2.
Minor releases: the second digit is incremented by one after backward compatible new features. For example, from 1.0.1 to 1.1.0.
Major releases: changes that break backward compatibility result in the first digit being incremented. For example, from 1.0.1 to 2.0.0.
The definition of Webin-CLI backward compatibility is that there are no breaking changes to the command line usage or to the file formats.
All releases are made immediately after bugs have been fixed or new features have been added.
Releases are downloadable from: https://github.com/enasequence/webin-cli/releases.
After all releases, we will endeavour to contact affected submitters who previously were unable to complete their submissions.
Minor and Major releases will be announced to ena-announce@ebi.ac.uk mailing list.
After Minor or Major releases, submitters will be asked to migrate to use this (or higher) version after a transition period.
After a Minor release, we will give two weeks notice for submitters to migrate to the new (or higher) version.
After a Major release, we will give at least two months notice for submitters to migrate to the new (or higher) version.