Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%

Department of Veterinary and Animal Sciences

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%. / Havgaard, Jakob Hull; Lyngsø, Rune B.; Stormo, Gary D.; Gorodkin, Jan.

In: Bioinformatics, Vol. 21, No. 9, 2005, p. 1815-1824.

Harvard

Havgaard, JH, Lyngsø, RB, Stormo, GD & Gorodkin, J 2005, 'Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%', Bioinformatics, vol. 21, no. 9, pp. 1815-1824. https://doi.org/10.1093/bioinformatics/bti279

APA

Havgaard, J. H., Lyngsø, R. B., Stormo, G. D., & Gorodkin, J. (2005). Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%. Bioinformatics, 21(9), 1815-1824. https://doi.org/10.1093/bioinformatics/bti279

Vancouver

Havgaard JH, Lyngsø RB, Stormo GD, Gorodkin J. Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%. Bioinformatics. 2005;21(9):1815-1824. https://doi.org/10.1093/bioinformatics/bti279

Author

Havgaard, Jakob Hull ; Lyngsø, Rune B. ; Stormo, Gary D. ; Gorodkin, Jan. / Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%. In: Bioinformatics. 2005 ; Vol. 21, No. 9. pp. 1815-1824.

Bibtex

@article{163afbe0a1c011ddb6ae000ea68e967b,

title = "Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%",

abstract = "Motivation: Searching for non-coding RNA (ncRNA) genes and structural RNA elements (eleRNA) are major challenges in gene finding today as these often are conserved in structure rather than in sequence. Even though the number of available methods is growing, it is still of interest to pairwise detect two genes with low sequence similarity, where the genes are part of a larger genomic region. Results: Here we present such an approach for pairwise local alignment which is based on FOLDALIGN and the Sankoff algorithm for simultaneous structural alignment of multiple sequences. We include the ability to conduct mutual scans of two sequences of arbitrary length while searching for common local structural motifs of some maximum length. This drastically reduces the complexity of the algorithm. The scoring scheme includes structural parameters corresponding to those available for free energy as well as for substitution matrices similar to RIBOSUM. The new FOLDALIGN implementation is tested on a dataset where the ncRNAs and eleRNAs have sequence similarity <40% and where the ncRNAs and eleRNAs are energetically indistinguishable from the surrounding genomic sequence context. The method is tested in two ways: (1) its ability to find the common structure between the genes only and (2) its ability to locate ncRNAs and eleRNAs in a genomic context. In case (1), it makes sense to compare with methods like Dynalign, and the performances are very similar, but FOLDALIGN is substantially faster. The structure prediction performance for a family is typically around 0.7 using Matthews correlation coefficient. In case (2), the algorithm is successful at locating RNA families with an average sensitivity of 0.8 and a positive predictive value of 0.9 using a BLAST-like hit selection scheme. Availability: The program is available online at http://foldalign.kvl.dk Contact: gorodkin@bioinf.kvl.dk",

author = "Havgaard, {Jakob Hull} and Lyngs{\o}, {Rune B.} and Stormo, {Gary D.} and Gorodkin, Jan",

year = "2005",

doi = "10.1093/bioinformatics/bti279",

language = "English",

volume = "21",

pages = "1815--1824",

journal = "Bioinformatics",

issn = "1367-4803",

publisher = "Oxford University Press",

number = "9",

}

RIS

TY - JOUR

T1 - Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%

AU - Havgaard, Jakob Hull

AU - Lyngsø, Rune B.

AU - Stormo, Gary D.

AU - Gorodkin, Jan

PY - 2005

Y1 - 2005

N2 - Motivation: Searching for non-coding RNA (ncRNA) genes and structural RNA elements (eleRNA) are major challenges in gene finding today as these often are conserved in structure rather than in sequence. Even though the number of available methods is growing, it is still of interest to pairwise detect two genes with low sequence similarity, where the genes are part of a larger genomic region. Results: Here we present such an approach for pairwise local alignment which is based on FOLDALIGN and the Sankoff algorithm for simultaneous structural alignment of multiple sequences. We include the ability to conduct mutual scans of two sequences of arbitrary length while searching for common local structural motifs of some maximum length. This drastically reduces the complexity of the algorithm. The scoring scheme includes structural parameters corresponding to those available for free energy as well as for substitution matrices similar to RIBOSUM. The new FOLDALIGN implementation is tested on a dataset where the ncRNAs and eleRNAs have sequence similarity <40% and where the ncRNAs and eleRNAs are energetically indistinguishable from the surrounding genomic sequence context. The method is tested in two ways: (1) its ability to find the common structure between the genes only and (2) its ability to locate ncRNAs and eleRNAs in a genomic context. In case (1), it makes sense to compare with methods like Dynalign, and the performances are very similar, but FOLDALIGN is substantially faster. The structure prediction performance for a family is typically around 0.7 using Matthews correlation coefficient. In case (2), the algorithm is successful at locating RNA families with an average sensitivity of 0.8 and a positive predictive value of 0.9 using a BLAST-like hit selection scheme. Availability: The program is available online at http://foldalign.kvl.dk Contact: gorodkin@bioinf.kvl.dk

AB - Motivation: Searching for non-coding RNA (ncRNA) genes and structural RNA elements (eleRNA) are major challenges in gene finding today as these often are conserved in structure rather than in sequence. Even though the number of available methods is growing, it is still of interest to pairwise detect two genes with low sequence similarity, where the genes are part of a larger genomic region. Results: Here we present such an approach for pairwise local alignment which is based on FOLDALIGN and the Sankoff algorithm for simultaneous structural alignment of multiple sequences. We include the ability to conduct mutual scans of two sequences of arbitrary length while searching for common local structural motifs of some maximum length. This drastically reduces the complexity of the algorithm. The scoring scheme includes structural parameters corresponding to those available for free energy as well as for substitution matrices similar to RIBOSUM. The new FOLDALIGN implementation is tested on a dataset where the ncRNAs and eleRNAs have sequence similarity <40% and where the ncRNAs and eleRNAs are energetically indistinguishable from the surrounding genomic sequence context. The method is tested in two ways: (1) its ability to find the common structure between the genes only and (2) its ability to locate ncRNAs and eleRNAs in a genomic context. In case (1), it makes sense to compare with methods like Dynalign, and the performances are very similar, but FOLDALIGN is substantially faster. The structure prediction performance for a family is typically around 0.7 using Matthews correlation coefficient. In case (2), the algorithm is successful at locating RNA families with an average sensitivity of 0.8 and a positive predictive value of 0.9 using a BLAST-like hit selection scheme. Availability: The program is available online at http://foldalign.kvl.dk Contact: gorodkin@bioinf.kvl.dk

U2 - 10.1093/bioinformatics/bti279

DO - 10.1093/bioinformatics/bti279

M3 - Journal article

C2 - 15657094

VL - 21

SP - 1815

EP - 1824

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 9

ER -
