The contiguous sequence generated by the polymerase during sequencing is referred to as a “polymerase read” or a Continuous Long Read (CLR).
This CLR read may include sequence from adapters and multiple copies of inserts, because it traverses the circular template many times. The CLRs are processed to remove adapter sequences and to retain only the insert sequence, called “subreads”.
All other sequences sequenced from the CLR are called “scraps”.
Multiple copies of subreads generated from the single SMRTBell can then be collapsed to a single, high-quality sequence, called the “read of insert” ROI or Circular Consensus Sequence (CCS).
The scraps file includes all the sequence not in the HQ (high quality) region, and the adapter sequences. The old bax format saved everything with indexes into the usable data. The subreads.bam contains all the usable data.
关于scrapes和subreads的bam文件的区别，只用subreads即可 The scraps file includes all the sequence not in the HQ (high quality) region, and the adapter sequences. The old bax format saved everything with indexes into the usable data. The subreads.bam contains all the usable data.
.subreads.bam A file contain usable reads without adapter;The Sequel System outputs one subreads.bam file per collection/cell, which contains unaligned base calls from high-quality regions.
.scraps.bam A file contain sequence data outside of the High Quality region, rejected subreads,excised adapter and possible barcode sequences, as well as spike-in control sequences. (The basecaller marks regions of single molecule sequence activity as high-quality.)
The goal of styler is to provide non-invasive pretty-printing of R source code while adhering to the tidyverse formatting rules. styler can be customized to format code according to other style guides too.