-
1.
Generative Models for Quantification of DNA Modifications.
Äijö, T, Bonneau, R, Lähdesmäki, H
Methods in molecular biology (Clifton, N.J.). 2018;:37-50
Abstract
There are multiple chemical modifications of cytosine that are important to the regulation and ultimately the functional expression of the genome. To date no single experiment can capture these separate modifications, and integrative experimental designs are needed to fully characterize cytosine methylation and chemical modification. This chapter describes a generative probabilistic model, Lux, for integrative analysis of cytosine methylation and its oxidized variants. Lux simultaneously analyzes partially orthogonal bisulfite sequencing data sets to estimate proportions of different cytosine methylation modifications and estimate multiple cytosine modifications for a single sample by integrating across experimental designs composed of multiple parallel destructive genomic measurements. Lux also considers the variation in measurements introduced by different imperfect experimental steps; the experimental variation can be quantified by using appropriate spike-in controls, allowing Lux to deconvolve the measurements and recover accurately the underlying signal.
-
2.
Novo&Stitch: accurate reconciliation of genome assemblies via optical maps.
Pan, W, Wanamaker, SI, Ah-Fong, AMV, Judelson, HS, Lonardi, S
Bioinformatics (Oxford, England). 2018;(13):i43-i51
Abstract
MOTIVATION De novo genome assembly is a challenging computational problem due to the high repetitive content of eukaryotic genomes and the imperfections of sequencing technologies (i.e. sequencing errors, uneven sequencing coverage and chimeric reads). Several assembly tools are currently available, each of which has strengths and weaknesses in dealing with the trade-off between maximizing contiguity and minimizing assembly errors (e.g. mis-joins). To obtain the best possible assembly, it is common practice to generate multiple assemblies from several assemblers and/or parameter settings and try to identify the highest quality assembly. Unfortunately, often there is no assembly that both maximizes contiguity and minimizes assembly errors, so one has to compromise one for the other. RESULTS The concept of assembly reconciliation has been proposed as a way to obtain a higher quality assembly by merging or reconciling all the available assemblies. While several reconciliation methods have been introduced in the literature, we have shown in one of our recent papers that none of them can consistently produce assemblies that are better than the assemblies provided in input. Here we introduce Novo&Stitch, a novel method that takes advantage of optical maps to accurately carry out assembly reconciliation (assuming that the assembled contigs are sufficiently long to be reliably aligned to the optical maps, e.g. 50 Kbp or longer). Experimental results demonstrate that Novo&Stitch can double the contiguity (N50) of the input assemblies without introducing mis-joins or reducing genome completeness. AVAILABILITY AND IMPLEMENTATION Novo&Stitch can be obtained from https://github.com/ucrbioinfo/Novo_Stitch.
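The contiguity metric quoted above (N50) can be illustrated with a short sketch. This is the standard definition of N50, not code from Novo&Stitch, and the contig lengths below are made-up toy values.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the largest length L such that contigs of length >= L
    together contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (lengths in kbp), before and after two contigs are joined.
before = [100, 80, 60, 40, 20]
after = [180, 60, 40, 20]   # the 100 and 80 contigs stitched together
print(n50(before))  # 80
print(n50(after))   # 180
```

Joining two contigs leaves the total assembled length unchanged but raises N50, which is why the abstract also reports that no mis-joins were introduced: N50 alone rewards any join, correct or not.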
-
3.
DNA methylation detection: recent developments in bisulfite free electrochemical and optical approaches.
Bhattacharjee, R, Moriam, S, Umer, M, Nguyen, NT, Shiddiky, MJA
The Analyst. 2018;(20):4802-4818
Abstract
DNA methylation is one of the significant epigenetic modifications involved in mammalian development as well as in the initiation and progression of various diseases such as cancer. Over the past few decades, an enormous amount of research has been carried out on the quantification of DNA methylation in the mammalian genome. Earlier, most of these methodologies used bisulfite treatment. However, low conversion, false readings, long assay times and complex chemical reactions are common limitations of this method that hinder its application in routine clinical screening. Thus, as an alternative to bisulfite conversion-based DNA methylation detection, numerous bisulfite-free methods have been proposed. In this regard, electrochemical biosensors have gained much attention in recent years for being highly sensitive yet cost-effective, portable, and simple to operate. On the other hand, biosensors with optical readouts enable direct real-time detection of biological molecules and are easily adaptable to multiplexing. Incorporation of electrochemical and optical readouts into bisulfite-free DNA methylation analysis is paving the way for the translation of this important biomarker into standard patient care. In this review, we provide a critical overview of recent advances in the development of electrochemical and optical readout-based bisulfite-free DNA methylation assays.
-
4.
A Method for Targeted 16S Sequencing of Human Milk Samples.
Tobin, NH, Woodward, C, Zabih, S, Lee, DJ, Li, F, Aldrovandi, GM
Journal of visualized experiments : JoVE. 2018;(133)
Abstract
Studies of microbial communities have become widespread with the development of relatively inexpensive, rapid, and high throughput sequencing. However, as with all these technologies, reproducible results depend on a laboratory workflow that incorporates appropriate precautions and controls. This is particularly important with low-biomass samples where contaminating bacterial DNA can generate misleading results. This article details a semi-automated workflow to identify microbes from human breast milk samples using targeted sequencing of the 16S ribosomal RNA (rRNA) V4 region on a low- to mid-throughput scale. The protocol describes sample preparation from whole milk including: sample lysis, nucleic acid extraction, amplification of the V4 region of the 16S rRNA gene, and library preparation with quality control measures. Importantly, the protocol and discussion consider issues that are salient to the preparation and analysis of low-biomass samples including appropriate positive and negative controls, PCR inhibitor removal, sample contamination by environmental, reagent, or experimental sources, and experimental best practices designed to ensure reproducibility. While the protocol as described is specific to human milk samples, it is adaptable to numerous low- and high-biomass sample types, including samples collected on swabs, frozen neat, or stabilized in a preservation buffer.
-
5.
Long-term combined application of manure and chemical fertilizer sustained higher nutrient status and rhizospheric bacterial diversity in reddish paddy soil of Central South China.
Cui, X, Zhang, Y, Gao, J, Peng, F, Gao, P
Scientific reports. 2018;(1):16554
Abstract
Bacteria, as the key component of soil ecosystems, participate in nutrient cycling and organic matter decomposition. However, how the fertilization regime affects the rhizospheric bacterial community of reddish paddy soil remains unclear. Here, a long-term fertilization experiment initiated in 1982 was used to explore the impacts of different fertilization regimes on the physicochemical properties and bacterial communities of reddish paddy rhizospheric soil in Central South China by sequencing the 16S rRNA gene. The results showed that long-term fertilization improved the soil nutrient status and shaped distinct rhizospheric bacterial communities. In particular, application of chemical NPK fertilizers significantly reduced the richness of the bacterial community by 7.32%, whereas application of manure alone or combined with chemical NPK fertilizers significantly increased the biodiversity of the bacterial community by 1.45% and 1.87%, respectively, compared with no fertilization. Moreover, LEfSe indicated that application of chemical NPK fertilizers significantly enhanced the abundances of Verrucomicrobia and Nitrospiraceae, while manure significantly increased the abundances of Deltaproteobacteria and Myxococcales; the highest abundances of Actinobacteria and Planctomycetes were detected in the treatment that combined manure and chemical NPK fertilizers. Furthermore, canonical correspondence analysis (CCA) and the Mantel test showed that exchangeable Mg2+ (E-Mg2+), soil organic carbon (SOC) and alkali-hydrolyzable nitrogen (AN) are the key driving factors shaping bacterial communities in the rhizosphere. Our results suggest that long-term balanced use of manure and chemical fertilizers not only increased organic material pools and nutrient availability but also enhanced the biodiversity of the rhizospheric bacterial community and the abundance of Actinobacteria, which contributes to the sustainable development of agro-ecosystems.
-
6.
ARKS: chromosome-scale scaffolding of human genome drafts with linked read kmers.
Coombe, L, Zhang, J, Vandervalk, BP, Chu, J, Jackman, SD, Birol, I, Warren, RL
BMC bioinformatics. 2018;(1):234
Abstract
BACKGROUND The long-range sequencing information captured by linked reads, such as those available from 10× Genomics (10xG), helps resolve genome sequence repeats, and yields accurate and contiguous draft genome assemblies. We introduce ARKS, an alignment-free linked read genome scaffolding methodology that uses linked reads to organize genome assemblies further into contiguous drafts. Our approach departs from other read alignment-dependent linked read scaffolders, including our own (ARCS), and uses a kmer-based mapping approach. The kmer mapping strategy has several advantages over read alignment methods, including better usability and faster processing, as it precludes the need for input sequence formatting and draft sequence assembly indexing. The reliance on kmers instead of read alignments for pairing sequences relaxes the workflow requirements, and drastically reduces the run time. RESULTS Here, we show how linked reads, when used in conjunction with Hi-C data for scaffolding, improve a draft human genome assembly of PacBio long-read data five-fold (baseline vs. ARKS NG50 = 4.6 vs. 23.1 Mbp, respectively). We also demonstrate how the method provides further improvements of a megabase-scale Supernova human genome assembly (NG50 = 14.74 Mbp vs. 25.94 Mbp before and after ARKS), which itself exclusively uses linked read data for assembly, with an execution speed six to nine times faster than competitive linked read scaffolders (~10.5 h compared to 75.7 h, on average). Following ARKS scaffolding of a human genome 10xG Supernova assembly (of cell line NA12878), fewer than 9 scaffolds cover each chromosome, except the largest (chromosome 1, n = 13). CONCLUSIONS ARKS uses a kmer mapping strategy instead of linked read alignments to record and associate the barcode information needed to order and orient draft assembly sequences. The simplified workflow, when compared to that of our initial implementation, ARCS, markedly improves run-time performance on experimental human genome datasets. Furthermore, the novel distance estimator in ARKS utilizes barcoding information from linked reads to estimate gap sizes. It accomplishes this by modeling the relationship between known distances of a region within contigs and calculating associated Jaccard indices. ARKS has the potential to provide correct, chromosome-scale genome assemblies, promptly. We expect ARKS to have broad utility in helping refine draft genomes.
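The Jaccard index mentioned in the conclusions can be sketched as follows. This is only the textbook set-similarity measure applied to barcode sets, not the ARKS distance estimator itself; the barcode names and the idea of comparing two contig ends are illustrative assumptions.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


# Hypothetical linked-read barcodes observed at the ends of two contigs.
end_of_contig_a = {"BX1", "BX2", "BX3", "BX4"}
start_of_contig_b = {"BX3", "BX4", "BX5"}

# Two shared barcodes out of five distinct ones.
print(jaccard(end_of_contig_a, start_of_contig_b))  # 0.4
```

Regions that lie close together on the genome tend to share more barcodes and so score a higher index, which is the kind of relationship the abstract says is modeled against known within-contig distances.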
-
7.
iPhosY-PseAAC: identify phosphotyrosine sites by incorporating sequence statistical moments into PseAAC.
Khan, YD, Rasool, N, Hussain, W, Khan, SA, Chou, KC
Molecular biology reports. 2018;(6):2501-2509
Abstract
Protein phosphorylation is one of the most fundamental types of post-translational modification and plays a vital role in various cellular processes of eukaryotes. Among the three types of phosphorylation (serine, threonine and tyrosine phosphorylation), tyrosine phosphorylation is one of the most frequent and is important for mediating signal transduction in eukaryotic cells. Site-directed mutagenesis and mass spectrometry help in the experimental determination of cellular signalling networks; however, these techniques are costly, time-consuming and labour-intensive. Thus, efficient and accurate prediction of these sites through computational approaches can reduce cost and time. Here, we present a more accurate and efficient sequence-based computational method for the prediction of phosphotyrosine (PhosY) sites by incorporating statistical moments into PseAAC. The study follows Chou's 5-step rule, and various position-composition relative features are used to train a neural network for prediction. The proposed prediction method was validated through jackknife testing, which gave an overall accuracy of 93.9%. These results suggest that the proposed prediction model can play a fundamental role in predicting PhosY sites accurately and efficiently.
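Jackknife testing, as used above, is commonly understood as leave-one-out validation: each sample is held out in turn and predicted by a model trained on the rest. The sketch below shows that loop under loud assumptions: the classifier is a 1-nearest-neighbour stand-in rather than the neural network of the paper, and the one-dimensional feature values are invented toy data.

```python
def jackknife_accuracy(samples, labels, predict):
    """Leave-one-out accuracy: hold out each sample once and predict it."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


def nearest_neighbour(train_x, train_y, query):
    """Stand-in classifier (assumption): label of the closest training sample."""
    distances = [sum((a - b) ** 2 for a, b in zip(x, query)) for x in train_x]
    return train_y[distances.index(min(distances))]


# Hypothetical feature vectors; 1 = phosphotyrosine site, 0 = non-site.
samples = [(0.0,), (0.1,), (0.25,), (1.0,), (1.1,), (0.45,)]
labels = [0, 0, 0, 1, 1, 1]

accuracy = jackknife_accuracy(samples, labels, nearest_neighbour)
print(f"{accuracy:.3f}")
```

Every sample serves as the test case exactly once, so the result does not depend on a random train/test split.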
-
8.
BiSpark: a Spark-based highly scalable aligner for bisulfite sequencing data.
Soe, S, Park, Y, Chae, H
BMC bioinformatics. 2018;(1):472
Abstract
BACKGROUND Bisulfite sequencing is one of the major high-resolution DNA methylation measurement methods. Because sodium bisulfite treatment selectively converts unmethylated cytosines, processing bisulfite-treated sequencing reads requires additional, computationally demanding steps. However, the dearth of efficient aligners designed for bisulfite-treated sequencing has become a bottleneck for large-scale DNA methylome analyses. RESULTS In this study, we present a highly scalable, efficient, and load-balanced bisulfite aligner, BiSpark, which is designed for processing large volumes of bisulfite sequencing data. We implemented the BiSpark algorithm on Apache Spark, a memory-optimized distributed data processing framework, to achieve maximum data-parallel efficiency. The BiSpark algorithm is designed to support redistribution of imbalanced data to minimize delays in large-scale distributed environments. CONCLUSIONS Experimental results on methylome datasets show that BiSpark significantly outperforms other state-of-the-art bisulfite sequencing aligners in terms of alignment speed and scalability with respect to dataset size and the number of computing nodes, while providing highly consistent and comparable mapping results. AVAILABILITY The BiSpark software package and source code are available at https://github.com/bhi-kimlab/BiSpark/ .
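The selective conversion described in the background can be simulated in a few lines. This is a simplified illustration of the chemistry's effect on a forward-strand read (unmethylated C read as T, methylated C unchanged); it is not taken from BiSpark, and the function name and example sequence are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions=()):
    """Simulate bisulfite treatment of a forward-strand sequence.

    Unmethylated cytosines are read as T; cytosines at the given
    0-based methylated positions are protected and stay C.
    """
    methylated = set(methylated_positions)
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


# Hypothetical read in which only the cytosine at index 1 is methylated.
print(bisulfite_convert("ACGTCCGA", methylated_positions=[1]))  # ACGTTTGA
```

After conversion the read no longer matches the reference at unmethylated cytosines, so an aligner has to tolerate C-to-T differences; this is the source of the extra processing steps the abstract refers to.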
-
9.
Microdroplet PCR for Highly Multiplexed Targeted Bisulfite Sequencing.
Komori, HK, LaMere, SA, Hart, T, Head, SR, Torkamani, A, Salomon, DR
Methods in molecular biology (Clifton, N.J.). 2018;:333-348
Abstract
Many methods exist for examining CpG DNA methylation. However, many of these are qualitative, laborious to apply to a large number of genes simultaneously, or are not easy to target to specific regions of interest. Microdroplet PCR-based bisulfite sequencing allows for quantitative single base resolution analysis of investigator selected regions of interest. Following bisulfite conversion of genomic DNA, targeted microdroplet PCR is conducted with custom primer libraries. Samples are then fragmented, concatenated, and sequenced by high-throughput sequencing. The most recent technology allows for this method to be conducted with as little as 250 ng of bisulfite-converted DNA. The primary advantage of this method is the ability to hand-select the targeted regions covered by up to 10,000 amplicons of 500-600 bp. Moreover, the nature of microdroplet PCR virtually eliminates PCR bias and allows for the amplification of all targets simultaneously in a single tube.
-
10.
Joker de Bruijn: Covering k-Mers Using Joker Characters.
Orenstein, Y, Yu, YW, Berger, B
Journal of computational biology : a journal of computational molecular cell biology. 2018;(11):1171-1178
Abstract
Sequence libraries that cover all k-mers enable universal and unbiased measurements of nucleotide and peptide binding. The shortest sequence to cover all k-mers is a de Bruijn sequence of length [Formula: see text]. Researchers would like to increase k to measure interactions at greater detail, but face a challenging problem: the number of k-mers grows exponentially in k, while the space on the experimental device is limited. In this study, we introduce a novel advance to shrink k-mer library sizes by using joker characters, which represent all characters in the alphabet. Theoretically, the use of joker characters can reduce the library size tremendously, but it should be limited as the introduced degeneracy lowers the statistical robustness of measurements. In this work, we consider the problem of generating a minimum-length sequence that covers a given set of k-mers using joker characters. The number and positions of the joker characters are provided as input. We first prove that the problem is NP-hard. We then present the first solution to the problem, which is based on two algorithmic innovations: (1) a greedy heuristic and (2) an integer linear programming (ILP) formulation. We first run the heuristic to find a good feasible solution, and then run an ILP solver to improve it. We ran our algorithm on DNA and amino acid alphabets to cover all k-mers for different values of k and k-mer multiplicity. Results demonstrate that it produces sequences that are very close to the theoretical lower bound.
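The effect of joker characters on coverage can be checked with a small sketch. It only enumerates which k-mers a given sequence covers; it does not implement the greedy heuristic or the ILP formulation from the paper. The joker symbol "N", the function name, and the two-letter toy alphabet are assumptions made for illustration.

```python
from itertools import product


def covered_kmers(seq, k, alphabet="ACGT", joker="N"):
    """Return the set of k-mers covered by seq.

    A joker character matches every character of the alphabet, so a
    window containing j jokers covers len(alphabet) ** j k-mers.
    """
    covered = set()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Each position contributes either its own character or the whole alphabet.
        choices = [alphabet if ch == joker else ch for ch in window]
        for combo in product(*choices):
            covered.add("".join(combo))
    return covered


all_2mers = {"AA", "AB", "BA", "BB"}
# Without jokers, a linear sequence needs 5 characters to cover all 2-mers over {A, B}.
print(covered_kmers("AABBA", 2, alphabet="AB") == all_2mers)  # True
# With one joker, 4 characters are enough.
print(covered_kmers("ANBA", 2, alphabet="AB") == all_2mers)   # True
```

The shorter sequence pays for its length saving with degeneracy: the window "AN" stands for both AA and AB, which is the loss of statistical robustness the abstract warns about.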