HybPiper: Extracting Coding Sequence and Introns for Phylogenetics from High-Throughput Sequencing Reads Using Target Enrichment

Matthew G. Johnson*, Elliot M. Gardner, Yang Liu, Rafael Medina, Bernard Goffinet, A. Jonathan Shaw, Nyree J.C. Zerega, Norman J. Wickett

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

413 Scopus citations

Abstract

Premise of the study: Using sequence data generated via target enrichment for phylogenetics requires reassembly of high-throughput sequence reads into loci, presenting a number of bioinformatics challenges. We developed HybPiper as a user-friendly platform for assembly of gene regions, extraction of exon and intron sequences, and identification of paralogous gene copies. We test HybPiper using baits designed to target 333 phylogenetic markers and 125 genes of functional significance in Artocarpus (Moraceae). Methods and Results: HybPiper implements parallel execution of sequence assembly in three phases: read mapping, contig assembly, and target sequence extraction. The pipeline was able to recover nearly complete gene sequences for all genes in 22 species of Artocarpus. HybPiper also recovered more than 500 bp of nontargeted intron sequence in over half of the phylogenetic markers and identified paralogous gene copies in Artocarpus. Conclusions: HybPiper was designed for Linux and Mac OSX and is freely available at https://github.com/mossmatters/HybPiper.

Original language: English (US)
Article number: 1600016
Journal: Applications in Plant Sciences
Volume: 4
Issue number: 7
State: Published - Jul 1 2016

Keywords

  • Hyb-Seq
  • bioinformatics
  • phylogenomics
  • sequence assembly

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Plant Science
