In any RNA sequencing library there is an additional step: the conversion of RNA into cDNA. Fragmentation can be performed before or after cDNA synthesis. Nucleic acid fragmentation (DNA, RNA or cDNA) can be carried out with physical methods (acoustic shearing, better known as sonication), enzymatic methods (non-specific endonucleases such as DNase I or Fragmentase, or commercial kits such as Illumina's Nextera tagmentation kit, which uses a transposase that not only breaks the DNA but also attaches the adapters) or chemical methods. Physical and enzymatic methods are the most widely used (for instance, sonication with a Covaris instrument yields DNA fragments in the 100–5,000 bp range). For mate-pair libraries, particularly long fragments can be obtained (6,000 to 20,000 bp).

The optimal size of the library fragments depends on the platform to be used and on the purpose of the analysis. For instance, fragments of up to 1,500 bp can be used on Illumina platforms. For exome sequencing, however, an insert size of no more than 200–250 bp is recommended (the term "insert" refers to the DNA fragment once the adapters have been added to its ends). This is because the average human exon is about 200 bp long.

Once DNA or RNA fragmentation is complete, the so-called adapters (or adaptors) must be attached to both ends of each fragment. A sequencing library is, by definition, a pool of DNA fragments with adapters attached. Adapters are designed to interact with a specific sequencing platform, either the surface of the flow cell (Illumina) or beads (Ion Torrent). After the adapters have been attached, a size-selection phase follows, during which all fragments of undesired size and all adapter dimers are removed. Adapter dimers result from self-ligation of the adapters without a library insert and are particularly abundant when the initial DNA quantity is low. Because adapter dimers can significantly reduce the sequencing yield by consuming valuable space on the flow cell, it is very important to remove them from the library. The dimers can be removed successfully by a clean-up with magnetic beads.

Library quantification is a pivotal step and should be performed with the most accurate and sensitive method available. It is usually done with PCR-based methods: digital PCR (dPCR) or quantitative PCR (qPCR).

When preparing a sequencing library, it is important to achieve the highest possible complexity. In other words, the final library should reflect the uniqueness of the starting material as closely as possible. This is achieved first of all by limiting the number of segmental duplications. The shorter the fragments, the higher the chance that they are less specific and can align to more than one locus of the reference sequence.
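As an illustration of the qPCR-based quantification mentioned above, the following minimal Python sketch fits a standard curve and uses it to estimate the concentration of an unknown library. It assumes the common linear model Cq = slope × log10(concentration) + intercept. The function names, the standard concentrations and the Cq values are hypothetical and are not taken from any specific kit or protocol.

```python
import math


def fit_standard_curve(standards):
    """Least-squares fit of Cq = slope * log10(concentration) + intercept.

    `standards` is a list of (concentration, cq) pairs measured on a
    dilution series of known concentration (hypothetical values here).
    """
    xs = [math.log10(conc) for conc, _ in standards]
    ys = [cq for _, cq in standards]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope):
    """Efficiency implied by the slope; 1.0 means perfect doubling per cycle."""
    return 10 ** (-1.0 / slope) - 1.0


def concentration_from_cq(cq, slope, intercept, dilution_factor=1.0):
    """Invert the standard curve, then correct for the library dilution."""
    return 10 ** ((cq - intercept) / slope) * dilution_factor


if __name__ == "__main__":
    # Hypothetical standards: (concentration in pM, measured Cq).
    standards = [(20.0, 7.9), (2.0, 11.3), (0.2, 14.6), (0.02, 17.9)]
    slope, intercept = fit_standard_curve(standards)
    print("slope:", round(slope, 3))
    print("efficiency:", round(amplification_efficiency(slope), 3))
    # Hypothetical library measured after a 1:10,000 dilution.
    print("library (pM):", concentration_from_cq(12.5, slope, intercept, 10_000))
```

A slope close to -3.32 corresponds to an efficiency near 100%. A strongly deviating slope would suggest that the standard curve, and therefore the estimated library concentration, should be treated with caution.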