IncaRNAfbinv offers an interactive environment for the inverse folding of RNA using a
fragment-based design approach.
The algorithm implemented in our web server combines and significantly extends two
complementary methodologies: RNAfbinv, described in Weinbrand et al. (Bioinformatics 2013,
29(22): 2938-2940), and incaRNAtion, described in Reinharz et al. (Bioinformatics 2013,
29(13): i308-i315).
IncaRNAfbinv 2.0...
The server takes the desired secondary structure in dot-bracket notation, along with additional parameters
that let the user control specific aspects of the design. The maximum allowed length is 500 bases.
The output includes the designed sequences and additional information such as their structural distance to the input structure,
minimum free energy (based on the Turner 2004 model), neutrality and more.
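How the server validates its input is not described here. As a hedged sketch, a check that a target is well-formed dot-bracket notation within the 500-base limit might look like the following; the function name parse_dot_bracket is hypothetical.

```python
MAX_LENGTH = 500  # maximum input length stated for the web server


def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return the base pairs (i, j), 1-based and sorted, of a dot-bracket string.

    Raises ValueError for unbalanced brackets, unexpected characters,
    or structures longer than MAX_LENGTH.
    """
    if len(structure) > MAX_LENGTH:
        raise ValueError(f"structure has {len(structure)} bases; the limit is {MAX_LENGTH}")
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for pos, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(pos)  # remember where this pair opens
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))  # close the most recent '('
        elif ch != ".":
            raise ValueError(f"invalid character {ch!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)
```

For the Purine Riboswitch aptamer example this yields 20 base pairs, the outermost being (1, 69).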
Input
-
Job name:
A name for your own reference. It can be used later to search for previous results (up to 1 week old).
This parameter is optional.
-
e-mail:
Upon submission of the query form, an e-mail containing a link to the results page will be sent to the
given address. A second e-mail will be sent when the calculation is done and the results are ready for review.
Providing an e-mail address is optional but strongly recommended for requests that include mutational
robustness or a large number of designed sequences.
-
Target Energy: (Advanced Option)
Designed sequences will aim to match the given minimum free energy (in kcal/mol). The calculation is done using RNAfold
from the Vienna RNA Package with the Turner 2004 energy model. Target energy is an optional input.
-
Target Mutational robustness: (Advanced Option)
Designed sequences will aim to match the given neutrality value, in the range [0,1]. Mutational robustness is measured by the
base pair distance between the fold of the current sequence and the folds of all sequences that are a single point mutation away.
Since each position can be mutated to three other bases, RNAfbinv must fold 3 * length(sequence) sequences at every iteration
to calculate this value. Using this option therefore slows down the calculation significantly and limits the run to at most
300 iterations and 50 output sequences.
-
Simulated Annealing Iterations: (Advanced Option)
The number of simulated annealing iterations performed by RNAfbinv (default: 1000).
-
Consider sequence motifs: (Advanced Option)
Consecutive lower case bases in the target sequence are treated as a sequence motif. Insertions and deletions
within a sequence motif incur increased penalties (see Design Score -> Sequence alignment below).
These penalties are larger than those for a single nucleotide deletion but smaller than those associated with
structure. Note that a sequence motif exists within the context of a single structural motif, so a consecutive
lower case stretch that spans several structural motifs is treated as several sequence motifs.
-
Motif constraints:
Allows the user to select multiple motifs from the structure that will have a greater chance of
appearing in the final result. The list of motifs is populated once a valid structure is entered, alongside
an image of the structure generated by VARNA Visualization Applet for RNA.
RNAfbinv 1.0 supports only a single motif constraint.
-
Varying size limit:
Available in RNAfbinv 2.0. The new motif comparison method allows results of varying length.
This option sets the minimum and maximum result length to the query size ± the varying size limit; for example,
a 69-base query with a limit of 5 allows designs of 64 to 74 bases.
-
Seed generation method:
Any RNAfbinv run can start from a seed sequence; the web server supports the following seed generation methods.
-
incaRNAtion
Unlike the original RNAfbinv, incaRNAtion uses a global search strategy. Its adaptive sampling approach
generates sets of sequences by repeatedly running a stochastic backtracking algorithm.
incaRNAtion also allows the user to set a desired GC content distribution for the designed sequences.
Starting from incaRNAtion seeds allows RNAfbinv to reach the target structure in fewer iterations
and yields seeds with approximately the requested GC content.
If this method is selected, the user must set the GC content of the seed sequences
(Advanced option). It is also possible to set a maximum GC content error, the allowed deviation from the selected value.
The GC error option only affects the GC content of the incaRNAtion seeds.
-
Random initial guess
RNAfbinv starts from a totally random sequence.
-
User Defined
RNAfbinv starts from a sequence given by the user. The sequence must be the same length as the structure and
use IUPAC nucleotide notation. The given sequence is used as the starting sequence
for all of the runs.
-
Number of output sequences
Select the number of designed sequences to output.
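The Target Mutational robustness option above rests on folding every single point mutant of a candidate, which is why it is expensive. The following is a minimal sketch of that bookkeeping, assuming a fold function that returns a dot-bracket string (for example, the structure element of the tuple returned by ViennaRNA's RNA.fold). The normalization used here, one minus the mean base pair distance divided by the sequence length, is an assumption chosen so the value falls in [0,1]; it is not necessarily RNAfbinv's exact neutrality formula, and all function names are hypothetical.

```python
from typing import Callable, Iterator

BASES = "ACGU"


def single_point_mutants(seq: str) -> Iterator[str]:
    """Yield every sequence one substitution away from seq (3 per position)."""
    for i, base in enumerate(seq):
        for alt in BASES:
            if alt != base:
                yield seq[:i] + alt + seq[i + 1:]


def base_pairs(structure: str) -> set[tuple[int, int]]:
    """Return the set of (i, j) pairs of a balanced dot-bracket string."""
    stack: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs


def bp_distance(s1: str, s2: str) -> int:
    """Number of base pairs present in exactly one of the two structures."""
    return len(base_pairs(s1) ^ base_pairs(s2))


def mutational_robustness(seq: str, fold: Callable[[str], str]) -> float:
    """Hypothetical neutrality score: 1 - mean(bp distance) / len(seq).

    Requires 3 * len(seq) + 1 calls to fold (the sequence plus every mutant).
    """
    reference = fold(seq)
    distances = [bp_distance(reference, fold(m)) for m in single_point_mutants(seq)]
    return 1.0 - sum(distances) / (len(distances) * len(seq))
```

A sequence whose mutants all keep its fold scores 1.0; every mutant that changes the fold pulls the score toward 0.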
Examples
We provide two simple examples. The examples are accessible in the selection box at the bottom of the input page.
Once an example is selected, press the set button to apply it to the input form.
-
Purine Riboswitch aptamer
Structure of the guanine-binding riboswitch aptamer (Kim and Breaker, Biol. Cell, 2008), followed by a sequence constraint in IUPAC notation.
((((((((...(.(((((.......))))).)........((((((.......))))))..))))))))
NNNNNNNNUNNNNNNNNNNNNNNNNNNNNNNNNUNNNUNNNNNNNNNNNNNNNNNNNNNNYNNNNNNNN
-
miRNA-146 precursor
Structure of the miRNA-146 precursor (Krol et al., J. Biol. Chem., 2004).
((((..((((((((((((.((((((((............)))))))).)))))))))))).))))
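The Purine Riboswitch example pairs its structure with a line of IUPAC codes (N for any base, U for uracil, Y for a pyrimidine), and a User Defined seed must likewise match the structure length and use IUPAC notation. How the server validates these inputs is not documented here; the following is a hedged sketch of such a check, with hypothetical function names. It accepts lower case letters, which the Consider sequence motifs option uses to mark sequence motifs.

```python
# IUPAC nucleotide codes and the RNA bases each one allows (T is read as U).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U", "T": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}


def check_constraint(structure: str, constraint: str) -> None:
    """Raise ValueError unless constraint is IUPAC and as long as structure."""
    if len(constraint) != len(structure):
        raise ValueError(
            f"sequence has {len(constraint)} bases but the structure has {len(structure)}"
        )
    invalid = [i for i, c in enumerate(constraint, start=1) if c.upper() not in IUPAC]
    if invalid:
        raise ValueError(f"non-IUPAC characters at positions {invalid}")


def satisfies(sequence: str, constraint: str) -> bool:
    """True if every base of sequence is allowed by the IUPAC code at its position."""
    if len(sequence) != len(constraint):
        return False
    for base, code in zip(sequence.upper().replace("T", "U"), constraint.upper()):
        if code not in IUPAC or base not in IUPAC[code]:
            return False
    return True
```

For instance, satisfies("GGUAACC", "NNUNNNY") is True, while replacing the U with an A makes it False.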
Results
The results section lists the designed sequences with their predicted structures and the additional information
described below. By default, results are sorted by Shapiro distance, with ties broken by BP distance.
The results can be downloaded in Excel format for further analysis.
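The default ordering is a two-level sort. A minimal sketch, assuming a hypothetical DesignResult record that holds only the columns involved, shows how a tuple key expresses "Shapiro distance first, BP distance second":

```python
from dataclasses import dataclass


@dataclass
class DesignResult:
    """Hypothetical record holding the columns used for sorting."""
    run_no: int
    shapiro_distance: int
    bp_distance: int


def default_order(results: list[DesignResult]) -> list[DesignResult]:
    # Tuples compare element by element: Shapiro distance decides first,
    # and BP distance only breaks ties between equal Shapiro distances.
    return sorted(results, key=lambda r: (r.shapiro_distance, r.bp_distance))
```

Because the Run no column only records completion order, it plays no part in this ordering.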
-
Run no:
The RNAfbinv run number. It only reflects the order in which runs completed.
-
Sequence:
The designed sequence, with its predicted structure shown below it.
-
Shapiro structure: (Coarse grained representation)
Fragment-based (coarse-grained) representation of the predicted fold. Hairpins, interior loops, bulges, multi-loops and
stems are represented by (H), (I), (B), (M) and (S), respectively (Shapiro B.A., 1988).
-
Energy score (dG):
Given the designed sequence and its predicted structure, we calculate the free energy using the Turner 2004
energy model, with functions from the Vienna RNA Package. The value is in kcal/mol.
-
Mutational Robustness
The mutational robustness (neutrality) of the designed sequence. It is measured by the base pair distance between
the fold of the designed sequence and the folds of all sequences that are a single point mutation away, so
RNAfbinv must fold 3 * length(sequence) sequences at every iteration to calculate this value. Using the
option slows down the calculation significantly and limits the run to at most 300 iterations and 50 output
sequences.
-
BP distance
The base pair distance between the predicted structure of the designed sequence and the
target structure given as input. The calculation counts the number of positions at which the two structures mismatch.
-
Shapiro distance
The distance between the Shapiro tree representation of the predicted fold of the designed sequence
and the Shapiro tree representation of the target structure given as input.
The calculation counts the number of insertions and deletions in the tree comparison.
-
Design Score
The design score that RNAfbinv 2.0 assigns to the designed sequence.
It is based on TreeAlign(T,C), an alignment between the target tree T and the candidate tree C, which is defined
recursively using the following terms:
ChildCombination is the best TreeAlign over all ordered combinations of child motifs.
Del is the deletion cost of a single motif, while δ is the cost of deleting an entire subtree.
Alignment score values:
-
Sequence alignment
Sequences are aligned per subsection (a stem has 2, a multi-loop has the number of connected stems minus 1, etc.).
Deleting a non-wildcard (non-'N') nucleotide of the target sequence costs 1000; any other deletion costs 1.
Insertions cost 1.
When the sequence motif feature is active, insertion and deletion penalties are increased to 20
within lower case regions of the target sequence.
-
Motif deletion
Target motif: 1000 for a conserved motif or 100 for a normal motif, plus the sequence alignment score.
Design motif: 100 plus the sequence alignment score.
-
Motif matching
Matching is allowed between different unbounded motifs (hairpins, bulges, multi-loops, external sections and interior loops).
Bounded motifs (stems) are only matched to other bounded motifs. A motif that is set to be conserved can only be matched to a motif of exactly the same type.
The match score is the sequence alignment score of the two motifs.
-
GC% content
The percentage of G and C nucleotides in the designed sequence.
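The Sequence alignment costs listed under Design Score can be illustrated with a small dynamic program. This is a hedged sketch rather than RNAfbinv 2.0's implementation: it aligns a single segment, allows only insertions, deletions and matches (no substitution cost is stated above, so a mismatched base is handled as a deletion plus an insertion), treats a target 'N' as matching any base, and assumes an insertion lies inside a sequence motif when the target bases on both sides of it are lower case. Function and constant names are hypothetical.

```python
NON_WILDCARD_DELETION = 1000  # deleting a non-'N' target nucleotide
DEFAULT_INDEL = 1             # any other insertion or deletion
MOTIF_INDEL = 20              # indels inside lower case target regions


def compatible(target_char: str, design_char: str) -> bool:
    """A target 'N' matches anything; otherwise the bases must be equal."""
    t = target_char.upper()
    return t == "N" or t == design_char.upper()


def align_cost(target: str, design: str, sequence_motifs: bool = True) -> int:
    """Minimum indel cost of aligning design against target."""
    n, m = len(target), len(design)

    def deletion(i: int) -> int:
        if target[i].upper() != "N":
            return NON_WILDCARD_DELETION
        if sequence_motifs and target[i].islower():
            return MOTIF_INDEL
        return DEFAULT_INDEL

    def insertion(i: int) -> int:
        # Inserting between target[i - 1] and target[i].
        inside = 0 < i < n and target[i - 1].islower() and target[i].islower()
        return MOTIF_INDEL if sequence_motifs and inside else DEFAULT_INDEL

    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:  # delete target[i - 1]
                cost[i][j] = min(cost[i][j], cost[i - 1][j] + deletion(i - 1))
            if j > 0:  # insert design[j - 1]
                cost[i][j] = min(cost[i][j], cost[i][j - 1] + insertion(i))
            if i > 0 and j > 0 and compatible(target[i - 1], design[j - 1]):
                cost[i][j] = min(cost[i][j], cost[i - 1][j - 1])
    return int(cost[n][m])
```

With sequence motifs enabled, align_cost("AnnA", "AA") charges 20 for each deleted 'n' (40 in total), whereas the same alignment costs 2 when the feature is off.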
-
Additional Information:
Fold Image:
A secondary structure image of the designed sequence and its predicted fold.
RNAfbinv 2.0 also marks the nucleotides aligned to the query sequence, since they are compared to the corresponding
motif rather than to a fixed index (as in RNAfbinv 1.0).
This image is generated by VARNA Visualization Applet for RNA.
Run Time
The following table shows run times (log10 seconds) for the structures listed below under five GC% contents. Tests were made with default options.
The reported times combine seed generation (when using incaRNAtion seeds) and the RNAfbinv calculation.
Structures:
-
miRNA-146 precursor
65 bases.
((((..((((((((((((.((((((((............)))))))).)))))))))))).))))
-
Purine Riboswitch aptamer
69 bases.
((((((((...(.(((((.......))))).)........((((((.......))))))..))))))))
-
Cobalamin Riboswitch aptamer
127 bases.
..((((((((......(((.......))).....((((......))))...........................(((((.......))))).....(((.......))).......))))))))..
-
S14 Ribosomal RNA - Domain 2
361 bases.
..........(((((...(.((((.(.(((.(((((((.(((((((((((....(((((((.....)))))))...)))))))))..)))))))))...(((((((((..(((((((((..((((((((...(((......)))......))))))))..))....(..((....)))))))))).)))))).)))...))))..))))....((((((...((...((((.........))))...))))))))..........((((((..((((((((((((((.....))))))))))))))...((..)))).....)))))))))).(((......((((....))))....)))
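The structures above differ mainly in how many hairpins, interior loops, bulges and multi-loops they contain, the motif types that the Shapiro structure column under Results abbreviates as (H), (I), (B) and (M). As a hedged sketch (neither ViennaRNA's Shapiro conversion nor RNAfbinv's code), the loop closed by each base pair can be classified by counting the branches it encloses and the unpaired bases on either side. Here "S" counts stacked pairs inside stems rather than whole stems, the external loop is not counted, and the function names are hypothetical.

```python
from collections import Counter


def pair_table(structure: str) -> list[int]:
    """Return the partner index of each position (-1 if unpaired)."""
    stack: list[int] = []
    table = [-1] * len(structure)
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i + 1}")
            j = stack.pop()
            table[i], table[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i + 1}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1] + 1}")
    return table


def loop_types(structure: str) -> Counter:
    """Count the loop type closed by each base pair (i, j)."""
    pt = pair_table(structure)
    counts: Counter = Counter()
    for i, j in enumerate(pt):
        if j <= i:  # skip unpaired and closing positions
            continue
        inner = []  # branches (p, q) directly enclosed by (i, j)
        k = i + 1
        while k < j:
            if pt[k] > k:
                inner.append((k, pt[k]))
                k = pt[k] + 1  # jump over the enclosed branch
            else:
                k += 1
        if not inner:
            counts["H"] += 1  # hairpin: no enclosed pairs
        elif len(inner) > 1:
            counts["M"] += 1  # multi-loop: two or more branches
        else:
            p, q = inner[0]
            left, right = p - i - 1, j - q - 1  # unpaired bases on each side
            if left == 0 and right == 0:
                counts["S"] += 1  # stacked pair within a stem
            elif left == 0 or right == 0:
                counts["B"] += 1  # bulge: unpaired bases on one side only
            else:
                counts["I"] += 1  # interior loop: unpaired bases on both sides
    return counts
```

Applied to the Purine Riboswitch aptamer, this finds two hairpins, one interior loop and one multi-loop.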