Ruzzo-Tompa Algorithm

Jun 01, 2019
biology bioinformatics
1 min read

The Ruzzo-Tompa algorithm is a method for finding distinct subsequences in a sequence of numbers such that the sum of the subsequences reaches a maximum. The difficult part here is ensuring that the subsequences are distinct and non-overlapping.

Consider the following sequence: \((1, 2, 3, -5, 1, 2, 3)\). While one might be tempted to extract the two subsequences \((1, 2, 3)\) and \((1, 2, 3)\), skipping over the -5, the two subsequences are not distinct. The subsequences \((1, 2, 3), (1), (2, 3)\), by contrast, achieve the same score and are distinct. As the sequence gets very long this task gets challenging very quickly. The Ruzzo-Tompa algorithm describes a strategy which makes light work of the task, provided you are a computer.

The algorithm has applications in bioinformatics. It is particularly useful in the study of DNA. Those working in bioinformatics are often interested in finding similar (long) subsequences in different samples of DNA. Finding similar DNA in different organisms is one thing biologists are interested in, often for the same reasons that biologists are interested in learning about in similar physical characteristics in different species.

This post is part of a series. The most recent post in the series is “FASTA Format”. Learn when new posts appear by subscribing (RSS).