vsearch example commands

vsearch is an open-source alternative to usearch, a tool widely used in sequence analysis.

Here I post a quick reference of several use cases for my own convenience; the full documentation is available here. If you use vsearch and want to cite it in your paper, or want to report a bug, please refer to its GitHub page [1].

  • Align all sequences in a database with each other and output all pairwise alignments:
vsearch --allpairs_global database.fas --alnout results.aln --acceptall
  • Check for the presence of chimeras (de novo); parents should be at least 1.5 times more abundant than chimeras. Output non-chimeric sequences in fasta format (no wrapping):
vsearch --uchime_denovo queries.fas --nonchimeras results.fas --fasta_width 0 --abskew 1.5
  • Cluster with a 97% similarity threshold, collect cluster centroids, and write cluster descriptions using a uclust-like format:
vsearch --cluster_fast queries.fas --id 0.97 --centroids centroids.fas --uc clusters.uc
  • Dereplicate the sequences contained in queries.fas, take into account the abundance information already present, write unwrapped sequences to output with the new abundance information, discard all sequences with an abundance of 1:
vsearch --derep_fulllength queries.fas --output queries_masked.fas --sizein --sizeout --fasta_width 0 --minuniquesize 2
  • Mask simple repeats and low complexity regions in the input fasta file (masked regions are lowercased), and write the results to the output file:
vsearch --maskfasta queries.fas --output queries_masked.fas --qmask dust
  • Search queries in a reference database with an 80% similarity threshold, taking terminal gaps into account when calculating pairwise similarities:
vsearch --usearch_global queries.fas --db references.fas --alnout results.aln --id 0.8 --iddef 1
  • Search a sequence dataset against itself (ignore self hits), get all matches with at least 60% identity, and collect results in a blast-like tab-separated format:
vsearch --usearch_global queries.fas --db queries.fas --id 0.6 --self --blast6out results.blast6 --maxaccepts 0 --maxrejects 0
  • Shuffle the input fasta file (change the order of sequences) in a repeatable fashion (fixed seed), and write unwrapped fasta sequences to the output file:
vsearch --shuffle queries.fas --output queries_shuffled.fas --seed 13 --fasta_width 0
  • Sort the sequences contained in queries.fas by decreasing abundance (using the “size=integer” information), relabel the sequences while preserving the abundance information (with --sizeout), and keep only sequences with an abundance equal to or greater than 2:
vsearch --sortbysize queries.fas --output queries_sorted.fas --relabel sampleA_ --sizeout --minsize 2
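
Several of the commands above write abundance annotations ("size=integer" in the fasta headers) or blast-like tab-separated tables, and it can be handy to sanity-check those outputs with standard shell tools. The following is only a minimal sketch, not part of vsearch: the function names total_abundance and hits_at_least are hypothetical, the sample files are made-up data so that the snippet runs without vsearch installed, and treating an unannotated header as abundance 1 is an assumption of this sketch.

```shell
#!/bin/sh
# Sum the "size=N" abundance annotations found in fasta header lines.
# Headers without an annotation are counted as 1 (assumption of this sketch).
total_abundance() {
    awk '/^>/ {
        if (match($0, /size=[0-9]+/)) {
            # "size=" is 5 characters long; the digits follow it
            total += substr($0, RSTART + 5, RLENGTH - 5)
        } else {
            total += 1
        }
    }
    END { print total + 0 }' "$1"
}

# Print query, target and percent identity for the hits whose identity
# (column 3 of the tab-separated blast-like table) is at least the
# threshold given as second argument.
hits_at_least() {
    awk -F '\t' -v min="$2" '$3 + 0 >= min + 0 { print $1 "\t" $2 "\t" $3 }' "$1"
}

# Tiny made-up inputs, only for illustration.
tmpdir=$(mktemp -d)
printf '>sampleA_1;size=5\nACGT\n>sampleA_2;size=3\nACGA\n' > "$tmpdir/queries_sorted.fas"
printf 'q1\tq2\t98.5\t200\t3\t0\t1\t200\t1\t200\t-1\t0\n' > "$tmpdir/results.blast6"
printf 'q1\tq3\t61.0\t180\t70\t2\t1\t180\t1\t182\t-1\t0\n' >> "$tmpdir/results.blast6"

# Total abundance carried by the sequences in the sample fasta file.
total_abundance "$tmpdir/queries_sorted.fas"
# Hits with at least 97% identity in the sample table.
hits_at_least "$tmpdir/results.blast6" 97

rm -r "$tmpdir"
```

Comparing the total abundance before and after a dereplication or sorting step shows how many reads were dropped by filters such as --minuniquesize or --minsize. The second helper narrows a permissive search (for example the 60% self-search above) down to the closest matches without rerunning vsearch.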

  1. https://github.com/torognes/vsearch 
