This document describes blast4, an ASN.1 interface to BLAST. The National Center for Biotechnology Information (NCBI) provides free BLAST services to the public through this interface (over HTTP) and through others. NCBI's BLAST source code is in the public domain, so other organizations may choose to run their own BLAST servers.
The functionality provided by this interface is similar to that provided by the URL API. Either interface will work for many applications, but application programmers may find this interface to be more convenient.
For more information on using NCBI's public BLAST servers through this interface, please refer to Appendix 2: Using NCBI's BLAST Servers.
We welcome your suggestions, comments, and questions about this specification. Please email them to us at toolbox@ncbi.nlm.nih.gov.
blast4 clients send Blast4-request objects and receive Blast4-reply objects. Each request is answered with one reply. The particular encoding used for requests and replies depends on the communication mechanism used and is not part of this specification. Depending on the communication mechanism used, a session may consist of one request and one reply or of multiple requests and their replies.
A Blast4-request consists of an optional ident, which identifies the application sending the request, and a body, which contains one of several specific requests:
Blast4-request ::= SEQUENCE {
    ident  VisibleString OPTIONAL,
    body   Blast4-request-body
}

Blast4-request-body ::= CHOICE {
    finish-params       Blast4-finish-params-request,
    get-databases       NULL,
    get-matrices        NULL,
    get-parameters      NULL,
    get-programs        NULL,
    get-search-results  Blast4-get-search-results-request,
    get-sequences       Blast4-get-sequences-request,
    queue-search        Blast4-queue-search-request
}
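In client code, a request can be pictured as a record holding an optional identifier and a tagged union for the body. The C++17 sketch below is illustrative only: it does not use the NCBI C++ Toolkit's generated classes, it models just three of the body alternatives, and the names `Blast4Request`, `GetSearchResults`, and `body_label` are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <variant>

// Hypothetical stand-ins for three of the Blast4-request-body alternatives.
struct GetDatabases {};      // get-databases NULL
struct GetPrograms {};       // get-programs NULL
struct GetSearchResults {    // get-search-results Blast4-get-search-results-request
    std::string request_id;
};

// Exactly one alternative is set, as with an ASN.1 CHOICE.
using Blast4RequestBody = std::variant<GetDatabases, GetPrograms, GetSearchResults>;

// Mirrors Blast4-request: an optional ident plus a body.
struct Blast4Request {
    std::optional<std::string> ident;
    Blast4RequestBody body;
};

// Returns the ASN.1 CHOICE label of whichever body alternative is set.
std::string body_label(const Blast4Request& req) {
    if (std::holds_alternative<GetDatabases>(req.body)) return "get-databases";
    if (std::holds_alternative<GetPrograms>(req.body)) return "get-programs";
    return "get-search-results";
}
```

How such an object is encoded on the wire depends on the communication mechanism, which this specification leaves open.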
The structure of Blast4-reply is similar:
Blast4-reply ::= SEQUENCE {
    errors  SEQUENCE OF Blast4-error OPTIONAL,
    body    Blast4-reply-body
}

Blast4-reply-body ::= CHOICE {
    finish-params       Blast4-finish-params-reply,
    get-databases       Blast4-get-databases-reply,
    get-matrices        Blast4-get-matrices-reply,
    get-parameters      Blast4-get-parameters-reply,
    get-programs        Blast4-get-programs-reply,
    get-search-results  Blast4-get-search-results-reply,
    get-sequences       Blast4-get-sequences-reply,
    queue-search        Blast4-queue-search-reply
}
errors contains any informational, warning, or error messages related to the processing of the request. Warnings indicate that the server processed the request successfully, but that the results may differ from what the user anticipated. Errors indicate that the server was unable, in whole or in part, to process the request.
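The definition of Blast4-error is not given in this section, so the following C++ sketch assumes, hypothetically, that each message carries one of three severity levels matching the categories above. Under that assumption the rule reduces to a single check: the request was processed in full unless some message has error severity. `Severity`, `ReplyMessage`, and `fully_processed` are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical severity levels; the real Blast4-error layout is not shown here.
enum class Severity { Info, Warning, Error };

struct ReplyMessage {
    Severity severity;
    std::string text;
};

// Informational messages and warnings still mean the server processed the
// request; any error means it failed in whole or in part.
bool fully_processed(const std::vector<ReplyMessage>& errors) {
    return std::none_of(errors.begin(), errors.end(), [](const ReplyMessage& m) {
        return m.severity == Severity::Error;
    });
}
```

An absent errors element can be treated the same as an empty list.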
Although there are many requests, the queue-search and get-search-results requests are the most important.
The queue-search request is used to initiate a BLAST search:
Blast4-queue-search-request ::= SEQUENCE {
    program   VisibleString,
    service   VisibleString,
    queries   Bioseq-set,
    subject   Blast4-subject,
    paramset  VisibleString OPTIONAL,
    params    Blast4-parameters OPTIONAL
}
program and service select a program in the BLAST family and a service offered by that program. The complete set of programs and services offered is returned by the get-programs request.
queries specifies the query sequences, that is, the sequences to be searched against the subject.
subject specifies the sequences against which the query sequences will be searched. The sequences can be specified indirectly, by naming a database in database, or directly, by listing them in sequences:
Blast4-subject ::= CHOICE {
    database   VisibleString,
    sequences  SEQUENCE OF Bioseq
}
paramset is used to include a named set of parameters. Including a named set of parameters is equivalent to prepending the parameters in the set to params.
params is used to override default parameter settings selected by the server or parameter settings included via paramset. There are many parameters that can be specified, but none are required; the server will attempt to set reasonable values for those that are not specified. For more information, refer to Appendix 1: Search Parameters.
To learn more about the default values set by the server, please refer to the finish-params request.
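The paramset rule can be simulated by concatenating the paramset's parameters and params, then applying them in order. This sketch assumes, as the override behavior described above suggests, that when a name occurs more than once the last occurrence takes effect; it also simplifies every value to a string. `effective_parameters` is a hypothetical helper, and the server's own defaults (which finish-params reveals) are not modeled.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Param = std::pair<std::string, std::string>;  // name, value (simplified)

// Prepend the paramset's entries to the explicit params, then apply them in
// order so that a later entry with the same name overrides an earlier one.
std::map<std::string, std::string> effective_parameters(
        const std::vector<Param>& paramset, const std::vector<Param>& params) {
    std::vector<Param> combined(paramset);
    combined.insert(combined.end(), params.begin(), params.end());
    std::map<std::string, std::string> result;
    for (const auto& [name, value] : combined) {
        result[name] = value;  // last occurrence wins (assumption)
    }
    return result;
}
```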
The reply to a queue-search request contains a request-id, which can be used later to retrieve the results of the search:
Blast4-queue-search-reply ::= SEQUENCE {
    request-id  VisibleString OPTIONAL
}
The get-search-results request is used to retrieve the results of a BLAST search:
Blast4-get-search-results-request ::= SEQUENCE {
    request-id  VisibleString
}

Blast4-get-search-results-reply ::= SEQUENCE {
    alignments      Seq-align-set OPTIONAL,
    phi-alignments  Blast4-phi-alignments OPTIONAL,
    mask            Blast4-mask OPTIONAL,
    ka-blocks       SEQUENCE OF Blast4-ka-block OPTIONAL,
    search-stats    SEQUENCE OF VisibleString OPTIONAL
}
The elements returned are all optional; which ones are included depends on the particular search.
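Because every element of the reply is optional, a client should check for each one before using it. In the C++ sketch below, empty placeholder structs stand in for Seq-align-set, Blast4-phi-alignments, Blast4-mask, and Blast4-ka-block, and `present_elements` is a hypothetical helper that lists which elements a reply carries.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Placeholders for types defined elsewhere in the specification.
struct SeqAlignSet {};
struct PhiAlignments {};
struct Mask {};
struct KaBlock {};

// Mirrors Blast4-get-search-results-reply: every element may be absent.
struct GetSearchResultsReply {
    std::optional<SeqAlignSet> alignments;
    std::optional<PhiAlignments> phi_alignments;
    std::optional<Mask> mask;
    std::optional<std::vector<KaBlock>> ka_blocks;
    std::optional<std::vector<std::string>> search_stats;
};

// Lists the ASN.1 names of the elements that are present, in declaration order.
std::vector<std::string> present_elements(const GetSearchResultsReply& r) {
    std::vector<std::string> names;
    if (r.alignments) names.push_back("alignments");
    if (r.phi_alignments) names.push_back("phi-alignments");
    if (r.mask) names.push_back("mask");
    if (r.ka_blocks) names.push_back("ka-blocks");
    if (r.search_stats) names.push_back("search-stats");
    return names;
}
```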
With the queue-search request, the actual parameter values may differ from those explicitly specified by the user; some may be read from a parameter set (a paramset), while others may be set by default by the server. For some applications and users, it may be important to know exactly which values the server will use to execute a search. The finish-params request takes arguments similar to those of the queue-search request and returns a complete, or finished, set of parameters:
Blast4-finish-params-request ::= SEQUENCE {
    program   VisibleString,
    service   VisibleString,
    paramset  VisibleString OPTIONAL,
    params    Blast4-parameters OPTIONAL
}

Blast4-finish-params-reply ::= Blast4-parameters

Blast4-parameters ::= SEQUENCE OF Blast4-parameter
The parameters returned in the reply show the values of all search parameters whose values are not zero, false, the empty string, or null.
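The reporting rule amounts to a predicate on parameter values. This sketch models a value as a simplified, hypothetical variant covering only boolean, integer, real, and string alternatives, with `std::monostate` standing in for null; `is_reported` returns true for the values the reply would include.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>

// Simplified stand-in for Blast4-value; std::monostate represents null.
using Value = std::variant<std::monostate, bool, std::int64_t, double, std::string>;

// True unless the value is null, false, zero, or the empty string.
bool is_reported(const Value& v) {
    if (std::holds_alternative<std::monostate>(v)) return false;
    if (const bool* b = std::get_if<bool>(&v)) return *b;
    if (const std::int64_t* i = std::get_if<std::int64_t>(&v)) return *i != 0;
    if (const double* d = std::get_if<double>(&v)) return *d != 0.0;
    return !std::get<std::string>(v).empty();
}
```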
The get-databases request is used to enumerate the names of databases known to the server. These names are the domain of the subject.database element of a queue-search request.
Blast4-get-databases-reply ::= SEQUENCE OF Blast4-database-info

Blast4-database-info ::= SEQUENCE {
    database       Blast4-database,
    description    VisibleString,
    last-updated   VisibleString,
    total-length   BigInt,
    num-sequences  BigInt
}
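Since these names are the values a client may place in subject.database, a client can check a name against the reply before queuing a search. The sketch maps BigInt to `std::int64_t` and reduces Blast4-database (not defined in this section) to a name string; both are assumptions, and `find_database` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Mirrors Blast4-database-info with simplified field types.
struct DatabaseInfo {
    std::string database;        // simplified from Blast4-database
    std::string description;
    std::string last_updated;
    std::int64_t total_length;   // BigInt (assumed to fit in 64 bits)
    std::int64_t num_sequences;  // BigInt
};

// Returns the entry whose name matches, or std::nullopt if the server
// did not list a database by that name.
std::optional<DatabaseInfo> find_database(const std::vector<DatabaseInfo>& reply,
                                          const std::string& name) {
    for (const auto& info : reply) {
        if (info.database == name) return info;
    }
    return std::nullopt;
}
```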
The get-matrices request is used to enumerate the scoring matrices known to the server. These are the matrices that can be specified by name in the matrix search parameter.
Blast4-get-matrices-reply ::= SEQUENCE OF Blast4-matrix-id

Blast4-matrix-id ::= SEQUENCE {
    residue-type  Blast4-residue-type,
    name          VisibleString
}
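A client offering a choice of matrix would typically show only the matrices that fit the residue type being compared. The values of Blast4-residue-type are not listed in this section, so the `ResidueType` enumeration below is hypothetical, as is the helper `matrix_names`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical values; Blast4-residue-type is not defined in this section.
enum class ResidueType { Protein, Nucleotide };

// Mirrors Blast4-matrix-id.
struct MatrixId {
    ResidueType residue_type;
    std::string name;
};

// Names of the matrices that apply to the given residue type, in reply order.
std::vector<std::string> matrix_names(const std::vector<MatrixId>& reply, ResidueType type) {
    std::vector<std::string> names;
    for (const auto& m : reply) {
        if (m.residue_type == type) names.push_back(m.name);
    }
    return names;
}
```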
The get-parameters request is used to enumerate the search parameters known to the server. This request is not intended to be initiated directly by an end user, and the results are not intended to be displayed to an end user; rather, this request helps clients construct a user interface dynamically so they can accommodate changes in the set of known search parameters without modification. Clients are not required to use this request; they may choose instead to support just those search parameters that are known when they are written.
The get-paramsets request is used to enumerate the named sets of search parameters (the "paramsets") known to the server. Paramsets may make it easier for users to tailor their searches to achieve specific results, but they are never required.
Blast4-get-paramsets-reply ::= SEQUENCE OF Blast4-paramset-info

Blast4-paramset-info ::= SEQUENCE {
    program  VisibleString,
    name     VisibleString
}
Names of paramsets are unique (within the scope of a particular program) and are designed to be descriptive enough that no separate description is needed. Names are not required to follow any particular form.
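The uniqueness rule is scoped by program, so the same paramset name may appear under two programs but not twice under one. The hypothetical check below expresses that invariant over a get-paramsets reply; `names_unique_per_program` is an illustrative name, and the program and paramset names in the test are only sample data.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Mirrors Blast4-paramset-info.
struct ParamsetInfo {
    std::string program;
    std::string name;
};

// True if no (program, name) pair repeats; the same name under two
// different programs is permitted.
bool names_unique_per_program(const std::vector<ParamsetInfo>& reply) {
    std::set<std::pair<std::string, std::string>> seen;
    for (const auto& p : reply) {
        if (!seen.insert({p.program, p.name}).second) return false;
    }
    return true;
}
```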
The get-programs request is used to enumerate the combinations of program and service that may be specified in a queue-search request.
Blast4-program-info ::= SEQUENCE {
    program   VisibleString,
    services  SEQUENCE OF VisibleString
}
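Because queue-search accepts only the combinations listed here, a client can validate a program and service locally before submitting. The sketch assumes the reply is a sequence of Blast4-program-info entries with each program listed once (the enclosing reply type is not shown above); `is_offered` is hypothetical, and the program and service names in the test are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Mirrors Blast4-program-info.
struct ProgramInfo {
    std::string program;
    std::vector<std::string> services;
};

// True if the reply lists `program` and, for that program, `service`.
// Assumes each program appears at most once in the reply.
bool is_offered(const std::vector<ProgramInfo>& reply,
                const std::string& program, const std::string& service) {
    for (const auto& info : reply) {
        if (info.program == program) {
            return std::find(info.services.begin(), info.services.end(), service)
                   != info.services.end();
        }
    }
    return false;
}
```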
Names of paramsets are unique (within the scope of a particular program) and are designed to be descriptive enough that users will be able to make reasonable choices based on program and name alone. Names are not required to follow any particular form and may be relatively long (perhaps 40 characters or more).
Search parameters are specified as name-value pairs:
Blast4-parameter ::= SEQUENCE {
    name   VisibleString,
    value  Blast4-value
}

Blast4-value ::= CHOICE {
    -- scalar types:
    big-integer       BigInt,
    bioseq            Bioseq,
    boolean           BOOLEAN,
    cutoff            Blast4-cutoff,
    integer           INTEGER,
    matrix            Blast4-matrix,
    real              REAL,
    seq-align         Seq-align,
    seq-id            Seq-id,
    seq-loc           Seq-loc,
    strand-type       Blast4-strand-type,
    string            VisibleString,
    -- lists of scalar types:
    big-integer-list  SEQUENCE OF BigInt,
    bioseq-list       SEQUENCE OF Bioseq,
    boolean-list      SEQUENCE OF BOOLEAN,
    cutoff-list       SEQUENCE OF Blast4-cutoff,
    integer-list      SEQUENCE OF INTEGER,
    matrix-list       SEQUENCE OF Blast4-matrix,
    real-list         SEQUENCE OF REAL,
    seq-align-list    SEQUENCE OF Seq-align,
    seq-id-list       SEQUENCE OF Seq-id,
    seq-loc-list      SEQUENCE OF Seq-loc,
    strand-type-list  SEQUENCE OF Blast4-strand-type,
    string-list       SEQUENCE OF VisibleString,
    -- imported collection types:
    bioseq-set        Bioseq-set,
    seq-align-set     Seq-align-set
}
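A parameter list maps naturally onto a vector of name-value records whose value is a tagged union. The C++ sketch below models only four of the Blast4-value alternatives, maps INTEGER to `std::int64_t` (an assumption), and uses a hypothetical helper `get_integer` that returns a value only when the named parameter is present and holds the integer alternative.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Four of the Blast4-value alternatives: boolean, integer, real, string.
using Blast4Value = std::variant<bool, std::int64_t, double, std::string>;

// Mirrors Blast4-parameter.
struct Blast4Parameter {
    std::string name;
    Blast4Value value;
};

// Returns the first value with this name if it holds the integer
// alternative; returns std::nullopt if the name is absent or mistyped.
std::optional<std::int64_t> get_integer(const std::vector<Blast4Parameter>& params,
                                        const std::string& name) {
    for (const auto& p : params) {
        if (p.name == name) {
            if (const auto* i = std::get_if<std::int64_t>(&p.value)) return *i;
            return std::nullopt;
        }
    }
    return std::nullopt;
}
```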
The following table shows the legal names and their corresponding value types:
parameter | type | description |
cutoff | cutoff | Only hits with e-values below the cutoff e-value or normalized scores above the cutoff score will be reported. |
db-genetic-code | integer | Code used to translate the database from nucleotide to protein. (See Table of Genetic Codes.) |
culling | boolean | If true, hit lists are culled by keeping at most a certain number (hsp-range-max?) of HSPs in a range. (where is the size of the range set?) |
ungapped-alignment | boolean | If true, ungapped alignments are allowed. |
entrez-query | string | Used to construct an oid list. (which is used how?) |
i-thresh | integer | E-value threshold for inclusion in a PSI-BLAST multiple alignment. (See Gapped BLAST and PSI-BLAST: a new generation of protein database search programs and Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements.) |
filter | string | A string that specifies when and how the query sequences are to be masked. Please refer to Appendix 3: Filter Strings. |
first-db-seq, final-db-seq | integer | Only sequences with oids between first-db-seq and final-db-seq will be searched. |
gap-open, gap-extend | integer | Penalties applied for opening and extending gaps, respectively. The penalty for a gap of N residues is gap-open + N * gap-extend. Meaningful only if gapped-alignment is true. |
gi-list | integer-list | Collection of sequences, specified by a list of gi numbers, against which queries will be compared. |
hitlist-size | integer | Maximum number of database sequences for which to save hits. |
hsp-range-max | integer | Maximum number of HSPs to save in any region. Meaningful only when culling is true. |
matrix | string, matrix | Substitution matrix containing similarity scores for all possible pairs of residues, specified by either name or value. (See Basic local alignment search tool and Table of Genetic Codes.) |
perc-ident | real | Only alignments in which at least this percentage of query residues are identical to the corresponding subject residues will be reported. |
nucl-penalty, nucl-reward | integer | Penalty for a nucleotide mismatch and reward for a nucleotide match, respectively. Called the scores for mismatches and identities in DNA sequence comparisons in Basic local alignment search tool. |
phi-pattern | string | TBD |
pseudocount-weight | integer | Called "beta" in Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. (See Equation 5.) |
genetic-code | integer | Code used to translate the query from nucleotide to protein. (See Table of Genetic Codes.) |
query-mask | seq-loc-list | Locations of query residues to be masked. Words spanning these locations are not included in the initial word table. With hard masking, these locations are also treated as unknown residues during extension. |
required-start, required-end | integer | Only alignments that contain this region will be reported. |
searchsp-eff | real | User-specified search space; overrides the value calculated by BLAST. |
strand-option | strand-type | Specifies whether to search the forward strand, the reverse strand, or both strands of the query sequences. |
template-length | integer | Length of a megablast discontiguous words template. Meaningful only for service=megablast. |
template-type | integer | Type of a megablast discontiguous words template. Legal values are TBD. Meaningful only for service=megablast. |
use-comp-based-stats | boolean | If true, uses composition-based statistics as described in Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. |
window-size | integer | Called "w" in Basic local alignment search tool, "W" in Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. |
word-threshold | integer | Called "T" in Basic local alignment search tool. |
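As a worked example of the gap formula in the gap-open, gap-extend row, take the illustrative values gap-open = 11 and gap-extend = 1: a gap of 3 residues then costs 11 + 3 × 1 = 14. The one-line function below simply restates the table's formula; `gap_penalty` is a hypothetical name.

```cpp
#include <cassert>

// Penalty for a gap of n residues: gap-open + n * gap-extend.
int gap_penalty(int gap_open, int gap_extend, int n) {
    return gap_open + n * gap_extend;
}
```

Under this formula, even a one-residue gap pays gap-extend once on top of gap-open.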
NCBI provides a C++ wrapper to this interface (ncbi::blast::CRemoteBlast, as part of the xblast library). The C++ wrapper automatically encodes requests, decodes replies, and handles communication with the server. The C++ wrapper is the only supported way to use this interface.
Filter strings consist of any number of the following options, separated by spaces or semicolons. For options that take parameters, parameters follow the letter which specifies the option and are separated, from the option letter and from each other, by spaces. Default values are shown in parentheses.
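The tokenizing rule above can be sketched as follows. Because this section does not list the option letters or how many parameters each takes, the sketch assumes, hypothetically, that every option is a single letter and that any other token is a parameter of the most recent option. A real parser would need per-option knowledge; `parse_filter` is an illustrative name, and the letters used in the test are arbitrary.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

struct FilterOption {
    char letter;
    std::vector<std::string> params;
};

// Splits on spaces and semicolons. ASSUMPTION: a one-character alphabetic
// token starts a new option; any other token is a parameter of the
// preceding option. Empty tokens from repeated separators are skipped.
std::vector<FilterOption> parse_filter(const std::string& filter) {
    std::vector<FilterOption> options;
    std::string token;
    auto flush = [&]() {
        if (token.empty()) return;
        if (token.size() == 1 && std::isalpha(static_cast<unsigned char>(token[0]))) {
            options.push_back({token[0], {}});
        } else if (!options.empty()) {
            options.back().params.push_back(token);
        }  // a parameter with no preceding option is ignored
        token.clear();
    };
    for (char c : filter) {
        if (c == ' ' || c == ';') {
            flush();
        } else {
            token += c;
        }
    }
    flush();
    return options;
}
```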
Options:
Examples: