RNA-seq Alignment Programs

When you sequence RNA for downstream analysis, you usually want millions of reads so that every expressed transcript is well covered. Many programs can then align those reads and estimate how many of them come from each gene or transcript.

The most popular ones are:

RSEM - quantifier (runs on top of an aligner such as Bowtie 2 or STAR)
Salmon - pseudo-mapper/quantifier
Kallisto - pseudo-mapper/quantifier
STAR - aligner

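There are a few differences. Salmon and Kallisto are pseudo-aligners: rather than aligning every read base by base against the reference, they use a cheaper proxy for alignment. Typically this means breaking the transcriptome into k-mers and building an index from them, so a read can be matched to the transcripts it is compatible with by looking up its k-mers instead of scanning the entire sequence. This approach has been shown to be about as accurate as, and much quicker than, full alignment programs.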
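
The k-mer lookup idea can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the transcript names, sequences, and the functions build_kmer_index and pseudoalign are made up for this example, and it is not how Salmon or Kallisto are actually implemented (they use far more compact index structures and also handle reverse complements and sequencing errors).

```python
from collections import defaultdict

def build_kmer_index(transcripts, k):
    """Map every k-mer to the set of transcript names that contain it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index

def pseudoalign(read, index, k):
    """Return the transcripts that contain every k-mer of the read."""
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k], set())
        # Intersect the hits of each k-mer; no position is ever computed.
        compatible = set(hits) if compatible is None else compatible & hits
        if not compatible:
            break
    return compatible or set()

# Toy transcripts (made-up sequences and names, for illustration only).
transcripts = {
    "tx1": "ACGTACGGTCA",
    "tx2": "ACGTACGTTGA",
}
index = build_kmer_index(transcripts, k=5)

shared = pseudoalign("ACGTACG", index, k=5)  # prefix common to both transcripts
unique = pseudoalign("ACGGTCA", index, k=5)  # sequence found only in tx1
print(shared, unique)
```

The first read is compatible with both toy transcripts and the second with tx1 only, which is all a pseudo-aligner needs to know before quantification.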

Pseudo-aligners use these indexes as stand-ins for aligning everything. The idea is that it's more important to determine how many reads come from each transcript than the precise position each read maps to. As such, they are less memory-intensive than STAR and can deal with reads that are compatible with more than one transcript (multimappers).
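
Because only the set of compatible transcripts matters, reads with the same set can be collapsed into one counted group (often called an equivalence class). The snippet below is a minimal sketch of that bookkeeping; the read_compat list and the transcript names are hypothetical, hand-written inputs rather than the output of any real tool.

```python
from collections import Counter

# Each read is reduced to the set of transcripts it is compatible with.
# These five sets are made up for illustration.
read_compat = [
    {"tx1"},
    {"tx1", "tx2"},
    {"tx1", "tx2"},
    {"tx2"},
    {"tx1", "tx2"},
]

# Reads with identical compatibility sets fall into the same class,
# so millions of reads shrink to a small table of (set, count) pairs.
eq_classes = Counter(frozenset(c) for c in read_compat)

for compat, count in eq_classes.items():
    print(sorted(compat), count)
```

Storing counts per class instead of per-read coordinates is one reason this style of tool needs less memory.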

For example, suppose isoform 1 of a gene has three exons and isoform 2 has only two of them (it lacks exon 1). If lots of reads map to those two shared exons and none map to exon 1, the program can infer that isoform 2 is the more abundant one and assign the multimapping reads to it. Benchmarking papers have found that Salmon and Kallisto are very accurate, while STAR and HTSeq (a read-counting tool) are lower in accuracy. RSEM is actively developed and also has a high degree of accuracy.
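
The isoform example can be worked through with a small expectation-maximization (EM) loop, which is the general family of methods these quantifiers use to split multimapping reads. This is a simplified sketch, not the actual code of any tool: the function em_abundance, the exon length of 100 bases, and the read count of 90 are all assumptions chosen for illustration, and real programs add effective-length and bias corrections.

```python
def em_abundance(eq_classes, lengths, n_iter=100):
    """Estimate the fraction of reads per transcript with a basic EM loop."""
    names = sorted(lengths)
    total = sum(eq_classes.values())
    # Start from equal abundances.
    alpha = {t: 1.0 / len(names) for t in names}
    for _ in range(n_iter):
        assigned = {t: 0.0 for t in names}
        for compat, count in eq_classes.items():
            # E-step: share each class among its transcripts in proportion
            # to current abundance divided by transcript length.
            weights = {t: alpha[t] / lengths[t] for t in compat}
            norm = sum(weights.values())
            if norm == 0:
                continue
            for t, w in weights.items():
                assigned[t] += count * w / norm
        # M-step: new abundances are the fractions of reads assigned.
        alpha = {t: assigned[t] / total for t in names}
    return alpha

# Assumed toy setup: every exon is 100 bases long.
# iso1 = exon 1 + exon 2 + exon 3, iso2 = exon 2 + exon 3.
lengths = {"iso1": 300, "iso2": 200}

# 90 reads fall in exons 2 and 3 (compatible with both isoforms);
# no read falls in exon 1, so there is no iso1-only class.
eq_classes = {frozenset({"iso1", "iso2"}): 90}

result = em_abundance(eq_classes, lengths)
print(result)
```

With no reads supporting exon 1, each iteration shifts more of the shared reads toward the shorter isoform, so the estimate for iso2 approaches 1 and iso1 approaches 0.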