Help

IncaRNAfbinv offers an interactive environment for the inverse folding of RNA using a fragment-based design approach.
The algorithm implemented in our web server significantly extends two complementary methods: RNAfbinv, described in Weinbrand et al. (Bioinformatics 2013, 29(22): 2938-2940), and incaRNAtion, described in Reinharz et al. (Bioinformatics 2013, 29(13): i308-i315).
IncaRNAfbinv 2.0...

The server receives the desired secondary structure in dot-bracket notation, along with additional parameters that let the user control specific aspects of the design. The maximum allowed length is 500 bases. The output includes the designed sequences and additional information such as the structural distance to the input structure, the minimum free energy (based on the Turner 2004 model), neutrality and more.
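
The input rules on this page (dot-bracket characters, balanced brackets without pseudoknots, at most 500 bases) and the base-pair distance reported with each result can be illustrated with a short Python sketch. It assumes that '<' and '>' are interchangeable with '(' and ')'. It computes the distance as the number of base pairs present in exactly one of the two structures, which is the definition used by ViennaRNA's bp_distance. The server's own code is not shown here and may differ in details, and the function names are hypothetical.

```python
MAX_LENGTH = 500  # maximum query length stated on this help page


def parse_dot_bracket(structure: str) -> set[tuple[int, int]]:
    """Return base pairs (i, j), 0-based with i < j, of a dot-bracket string.

    Assumes '<' / '>' are interchangeable with '(' / ')'. Raises ValueError
    for illegal characters, unbalanced brackets or an over-long input.
    """
    if len(structure) > MAX_LENGTH:
        raise ValueError(f"structure is longer than {MAX_LENGTH} bases")
    open_positions: list[int] = []  # stack of still-unmatched opening brackets
    pairs: set[tuple[int, int]] = set()
    for j, ch in enumerate(structure):
        if ch == ".":
            continue  # unpaired base
        if ch in "(<":
            open_positions.append(j)
        elif ch in ")>":
            if not open_positions:
                raise ValueError(f"unmatched closing bracket at position {j + 1}")
            pairs.add((open_positions.pop(), j))
        else:
            raise ValueError(f"illegal character {ch!r} at position {j + 1}")
    if open_positions:
        raise ValueError(f"unmatched opening bracket at position {open_positions[-1] + 1}")
    return pairs


def bp_distance(structure_a: str, structure_b: str) -> int:
    """Number of base pairs found in exactly one of the two structures."""
    if len(structure_a) != len(structure_b):
        raise ValueError("structures must have the same length")
    return len(parse_dot_bracket(structure_a) ^ parse_dot_bracket(structure_b))
```

For example, bp_distance("((...))", "(.....)") returns 1, because only the inner pair differs. A single stack is enough here because a pseudoknot-free structure is properly nested.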

Input

  • Job name:

    A name for personal reference. It can be used later to search for previous results up to 1 week old. This parameter is optional.
  • e-mail:

    Upon submission of the query form, an e-mail containing a link to the results page is sent to the given address. A second e-mail is sent when the calculation is done and the results are ready for review. Entering an e-mail address is optional but strongly recommended for requests that include mutational robustness or a large number of designed sequences.
  • Target structure:

    The target secondary structure in dot-bracket notation (pseudoknots are not supported). Legal characters are '.' for an unpaired base, '(' for the first base of a base pair and ')' for the second base of a base pair ('<' and '>' may be used instead of '(' and ')', respectively).
    Example:
    ((((((((...(.(((((.......))))).)........((((((.......))))))..))))))))
    Structure of the Guanine-binding riboswitch aptamer (Kim and Breaker, Biol. Cell, 2008).
  • Target sequence:

    A sequence pattern in IUPAC nucleotide notation ('x' and '-' are not allowed). The sequence constraint is optional. If left empty, it is replaced by a string of 'N's as long as the structure. If used, the sequence constraint must have the same length as the target structure. Result sequences must fit this sequence pattern. The pattern is rigid: each character is tied to a specific position (index).

    Example:

    The following sequence was constructed to match the structure above:
    NNNNNNNNUNNNNNNNNNNNNNNNNNNNNNNNNUNNNUNNNNNNNNNNNNNNNNNNNNNNYNNNNNNNN
    Sequence-conserved positions are constrained. Specifically, these are the nucleotides that interact with the purine ligand.
  • Target Energy: (Advanced Option)

    Designed sequences will aim to match the given minimum free energy. The calculation is done using RNAfold from the Vienna RNA Package with the Turner 2004 energy model. Target energy is an optional input.
  • Target Mutational robustness: (Advanced Option)

    Designed sequences will aim to match the given neutrality value, in the range [0,1]. Mutational robustness is assessed by the base pair distance between the fold of the current sequence and the folds of all sequences that are a single point mutation away. To calculate this value, RNAfbinv must therefore fold 3 * length(sequence) sequences at every iteration. Using this option slows down the calculation significantly and limits the run to at most 300 iterations and 50 output sequences.
  • Simulated Annealing Iterations: (Advanced Option)

    The number of simulated annealing iterations performed by RNAfbinv. The default is 1000.
  • Consider sequence motifs (Advanced Option):

    Consecutive lower-case bases in the target sequence are treated as a sequence motif. Insertions and deletions within a sequence motif incur increased penalties (see Design Score -> Sequence alignment below). These penalties are larger than those for a single sequence deletion but smaller than those connected to structure. A sequence motif exists in the context of a single structural motif. As a result, a single run of consecutive lower-case bases that spans multiple structural motifs is treated as multiple sequence motifs.
  • Motif constraints:

    Allows the user to select multiple motifs from the structure that will have a greater chance of appearing in the final result. The list of motifs is filled in once a legal structure is entered, alongside an image of the structure generated by VARNA (Visualization Applet for RNA). RNAfbinv 1.0 supports only a single motif constraint.
  • Varying size limit:

    Available in RNAfbinv 2.0. The new motif comparison method allows results of varying length. This option sets the minimum and maximum result length as the query size ± the varying size limit.
  • Seed generation method:

    Any RNAfbinv run can start from a seed sequence. The web server supports the following seed generation methods:
    • incaRNAtion
      Unlike the original RNAfbinv, incaRNAtion uses a global search strategy. Its adaptive sampling approach generates sets of sequences by repeatedly running a stochastic backtracking algorithm. incaRNAtion also allows the user to set a desired GC content distribution for the designed sequences. Starting from incaRNAtion seeds allows RNAfbinv to reach the target structure in fewer iterations, and the seeds have approximately the requested GC content.
      If this method is selected, the user must set the GC content of the seed sequences (Advanced option). The user can also set a maximum GC content error relative to the selected value. The GC error option only affects the GC content of the incaRNAtion seeds.
    • Random initial guess
      RNAfbinv starts from a totally random sequence.
    • User Defined
      RNAfbinv starts from a sequence given by the user. The sequence must be the same length as the structure and use IUPAC nucleotide notation. The given sequence is used as the starting point for all of the runs.
  • Number of output sequences

    Select the number of output designed sequences.
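
Several of the options above reduce to simple per-sequence checks. The Python sketch below shows three of them under stated assumptions:

- matching a candidate against a position-specific IUPAC constraint, where an empty constraint acts as all 'N';
- enumerating the 3 * length(sequence) single point mutants that the mutational robustness calculation has to fold;
- testing whether a seed's GC content lies within the chosen error of the target.

Folding itself (RNAfold) is not shown. GC content is expressed as a fraction rather than a percentage. All function names are hypothetical.

```python
from collections.abc import Iterator

# IUPAC nucleotide codes mapped to the RNA bases they allow ('T' read as 'U').
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U", "T": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}


def fits_constraint(sequence: str, constraint: str = "") -> bool:
    """True if every base of `sequence` is allowed by the code at the same index."""
    if not constraint:
        constraint = "N" * len(sequence)  # empty constraint = all wildcards
    if len(sequence) != len(constraint):
        return False
    # Lower-case codes (sequence motifs) constrain bases exactly like upper-case ones.
    return all(
        base.upper() in IUPAC.get(code.upper(), "")
        for base, code in zip(sequence, constraint)
    )


def point_mutants(sequence: str) -> Iterator[str]:
    """Yield every sequence one substitution away (3 per position for A/C/G/U input)."""
    for i, base in enumerate(sequence):
        for alternative in "ACGU":
            if alternative != base.upper():
                yield sequence[:i] + alternative + sequence[i + 1:]


def gc_fraction(sequence: str) -> float:
    """Fraction of G and C bases (multiply by 100 for GC%)."""
    return sum(base in "GCgc" for base in sequence) / len(sequence)


def within_gc_error(seed: str, target: float, max_error: float) -> bool:
    """True if the seed's GC fraction is within max_error of the target fraction."""
    return abs(gc_fraction(seed) - target) <= max_error
```

Generating the mutants lazily keeps memory flat. The cost of the robustness option comes from folding each of the 3 * length(sequence) mutants.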

Examples

We provide two simple examples. The examples are accessible in the selection box at the bottom of the input page. Once an example is selected, press the set button to apply it to the input form.
  • Purine Riboswitch aptamer

    Structure of the Guanine-binding riboswitch aptamer (Kim and Breaker, Biol. Cell, 2008).
    ((((((((...(.(((((.......))))).)........((((((.......))))))..))))))))
    NNNNNNNNUNNNNNNNNNNNNNNNNNNNNNNNNUNNNUNNNNNNNNNNNNNNNNNNNNNNYNNNNNNNN
  • miRNA-146 precursor

    Structure of miRNA-146 precursor (Krol et al., J. Biol. Chem., 2004).
    ((((..((((((((((((.((((((((............)))))))).)))))))))))).))))
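
The motifs offered in the Motif constraints list, like the (H), (I), (B), (M) and (S) fragments in the results, can be derived from a dot-bracket string such as the two examples above. The Python sketch below counts them using standard loop classification:

- a pair enclosing no other pair closes a hairpin;
- a pair enclosing exactly one pair closes a stacking step, a bulge or an interior loop, depending on the unpaired bases on each side;
- a pair enclosing two or more pairs closes a multi-loop;
- a stem is a maximal run of stacked pairs.

The sketch reports only counts, not the tree-graph string used by the server, and it is not the server's implementation. It assumes a valid, balanced structure.

```python
from collections import Counter


def pair_table(structure: str) -> list[int]:
    """partner[i] is the index paired with i, or -1 if i is unpaired."""
    partner = [-1] * len(structure)
    stack: list[int] = []
    for j, ch in enumerate(structure):
        if ch in "(<":
            stack.append(j)
        elif ch in ")>":
            i = stack.pop()  # assumes a balanced structure
            partner[i], partner[j] = j, i
    return partner


def count_motifs(structure: str) -> Counter:
    """Count stems (S), hairpins (H), bulges (B), interior loops (I) and multi-loops (M)."""
    partner = pair_table(structure)
    counts: Counter = Counter()
    for i, j in enumerate(partner):
        if j <= i:
            continue  # unpaired, or already seen from its 5' side
        # A pair starts a new stem unless it stacks directly inside (i - 1, j + 1).
        if i == 0 or j == len(structure) - 1 or partner[i - 1] != j + 1:
            counts["S"] += 1
        # Collect the pairs directly enclosed by (i, j).
        branches = []
        k = i + 1
        while k < j:
            if partner[k] > k:
                branches.append((k, partner[k]))
                k = partner[k] + 1
            else:
                k += 1
        if not branches:
            counts["H"] += 1
        elif len(branches) == 1:
            p, q = branches[0]
            left, right = p - i - 1, j - q - 1  # unpaired bases on each side
            if left and right:
                counts["I"] += 1
            elif left or right:
                counts["B"] += 1
            # left == right == 0: a stacking step inside a stem, not a loop
        else:
            counts["M"] += 1
    return counts
```

Usage: count_motifs("((((..((((((((((((.((((((((............)))))))).)))))))))))).))))") lists the stems, interior loops and hairpin of the miRNA-146 precursor.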

Results

The results section lists the designed sequences with their predicted structures and the additional information described below. By default, results are sorted primarily by Shapiro distance and secondarily by BP distance. The results can be downloaded in Excel format for further analysis.
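
The default ordering is a two-level sort key. The minimal Python sketch below uses a hypothetical DesignResult record that holds only the fields needed here. How the server breaks ties beyond these two columns is not documented. This sketch keeps the original order for ties, because Python's sort is stable.

```python
from dataclasses import dataclass


@dataclass
class DesignResult:
    """Hypothetical record holding a few of the result columns described below."""
    run_no: int
    sequence: str
    shapiro_distance: int
    bp_distance: int


def default_order(results: list[DesignResult]) -> list[DesignResult]:
    """Sort by Shapiro distance first, then by BP distance (ties keep input order)."""
    return sorted(results, key=lambda r: (r.shapiro_distance, r.bp_distance))
```
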
  • Run no:

    The RNAfbinv run number. It only indicates the order of completion.
  • Sequence:

    The resulting designed sequence, with its predicted structure shown below it.
  • Shapiro structure: (Coarse grained representation)

    Fragment-based structure representation of the predicted fold. Hairpins, interior loops, bulges, multi-loops and stems are represented by (H), (I), (B), (M) and (S), respectively (Shapiro B.A., 1988).
  • Energy score (dG):

    Given the designed sequence and its predicted structure, we calculate the free energy in kcal/mol using the Turner 2004 energy model. The value is calculated with functions from the Vienna RNA Package.
  • Mutational Robustness

    The mutational robustness (neutrality) of the resulting sequence, in the range [0,1]. It is assessed by the base pair distance between the fold of the sequence and the folds of all sequences that are a single point mutation away. Computing it therefore requires 3 * length(sequence) additional folds (see Target Mutational robustness above for the run limits this option imposes).
  • BP distance

    The base pair distance between the predicted structure of the resulting sequence and the target structure given in the input. The calculation counts the number of positions where the two structures mismatch.
  • Shapiro distance

    The distance between the Shapiro tree-graph representation of the predicted fold of the resulting sequence and the Shapiro tree-graph representation of the target structure given in the input.
    The calculation counts the number of insertions and deletions in the tree comparison.
  • Design Score

    The design score that RNAfbinv 2.0 generates for the resulting sequence.
    The design score is described by the following equations for target tree T and candidate tree C:

    Where TreeAlign(T,C) is defined as:

    ChildCombination is the best TreeAlign over all ordered combinations of child motifs. Del is the deletion cost of a single motif, while δ is the cost of deleting an entire subtree (formula below).
    Alignment score values:
    • Sequence alignment
      Sequences are aligned per subsection (a stem has 2, a multi-loop has the number of connected stems - 1, etc.). Deleting a non-wildcard (non-'N') nucleotide of the target sequence costs 1000, and deleting anything else costs 1. Insertions cost 1, the same as any deletion other than that of a non-'N' target nucleotide.
      When the sequence motif feature is active, insertion and deletion penalties are increased to 20 within lower-case regions of the target sequence.
      The alignment objective function definition can be seen in the formula below:

    • Motif deletion
      Deleting a target motif costs 1000 (conserved motif) or 100 (normal motif), plus the sequence alignment score. Deleting a design motif costs 100, plus the sequence alignment score.
      Deletion values are defined in the formula below:

    • Motif matching
      Matching is allowed between different unbounded motifs (hairpin, bulge, multi-loop, external section and interior loop). Bounded motifs (stems) are only matched to other bounded motifs. A motif set as conserved can only be matched to a motif of exactly the same type. The matching score is the sequence alignment score of the two motifs.
  • GC% content

    The percentage of G and C nucleotides in the resulting sequence.
  • Additional Information:

    Fold Image:
    A secondary structure image of the designed sequence and its predicted fold. RNAfbinv 2.0 also marks the nucleotides aligned to the query sequence, since they are compared to the proper motif rather than to a static index (as in RNAfbinv 1.0). This image is generated by VARNA (Visualization Applet for RNA).
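
The sequence alignment costs listed under Design Score map onto a standard dynamic-programming (edit-distance style) alignment of one target subsection against one design subsection. The Python sketch below illustrates this under explicit assumptions. It is not the server's objective function, whose formula is not reproduced here. The assumptions are:

- aligning a design base to a compatible IUPAC code costs 0;
- aligning a design base to an incompatible code costs a hypothetical MISMATCH_COST of 1000;
- inside a lower-case target region, deletions and insertions between two lower-case positions cost at least 20;
- the "sequence alignment score" in a motif deletion is the cost of aligning the motif against an empty sequence.

All names are hypothetical.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U", "T": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}
MISMATCH_COST = 1000  # hypothetical: this page does not state a substitution cost
MOTIF_PENALTY = 20    # indel penalty inside lower-case target regions


def deletion_cost(code: str, motif_aware: bool) -> int:
    """Cost of leaving one target position unaligned."""
    cost = 1 if code.upper() == "N" else 1000
    if motif_aware and code.islower():
        cost = max(cost, MOTIF_PENALTY)  # assumption: never lowers the 1000 penalty
    return cost


def insertion_cost(target: str, i: int, motif_aware: bool) -> int:
    """Cost of a design base inserted between target[i - 1] and target[i]."""
    inside_motif = (
        motif_aware and 0 < i < len(target)
        and target[i - 1].islower() and target[i].islower()
    )
    return MOTIF_PENALTY if inside_motif else 1


def match_cost(code: str, base: str) -> int:
    """0 if the design base fits the target IUPAC code, else MISMATCH_COST."""
    return 0 if base.upper() in IUPAC.get(code.upper(), "") else MISMATCH_COST


def align_cost(target: str, design: str, motif_aware: bool = False) -> int:
    """Minimum total cost of aligning a target subsection with a design subsection."""
    n, m = len(target), len(design)
    # dp[i][j] = best cost of aligning target[:i] with design[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + deletion_cost(target[i - 1], motif_aware)
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + insertion_cost(target, 0, motif_aware)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + match_cost(target[i - 1], design[j - 1]),
                dp[i - 1][j] + deletion_cost(target[i - 1], motif_aware),
                dp[i][j - 1] + insertion_cost(target, i, motif_aware),
            )
    return dp[n][m]


def target_motif_deletion_cost(target: str, conserved: bool, motif_aware: bool = False) -> int:
    """1000 (conserved) or 100 (normal) plus aligning the motif against nothing."""
    return (1000 if conserved else 100) + align_cost(target, "", motif_aware)


def design_motif_deletion_cost(design: str) -> int:
    """100 plus aligning the design motif against nothing."""
    return 100 + align_cost("", design)
```

With these costs, dropping a fixed target nucleotide (1000) is far more expensive than absorbing an extra design base (1). This matches the intent that sequence constraints, not length, drive the score.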

Run Time

The following table shows run times (log10 seconds) for three different structures under five GC% contents. Tests were run with the default options. The reported times combine seed generation (when incaRNAtion seeds are used) and the RNAfbinv calculation.

Structures:

  1. miRNA-146 precursor

    65 bases.
    ((((..((((((((((((.((((((((............)))))))).)))))))))))).))))
  2. Purine Riboswitch aptamer

    69 bases.
    ((((((((...(.(((((.......))))).)........((((((.......))))))..))))))))
  3. Cobalamin Riboswitch aptamer

    127 bases.
    ..((((((((......(((.......))).....((((......))))...........................(((((.......))))).....(((.......))).......))))))))..
  4. S14 Ribosomal RNA - Domain 2

    361 bases.
    ..........(((((...(.((((.(.(((.(((((((.(((((((((((....(((((((.....)))))))...)))))))))..)))))))))...(((((((((..(((((((((..((((((((...(((......)))......))))))))..))....(..((....)))))))))).)))))).)))...))))..))))....((((((...((...((((.........))))...))))))))..........((((((..((((((((((((((.....))))))))))))))...((..)))).....)))))))))).(((......((((....))))....)))