1.
Methodological considerations for the identification of choline and carnitine-degrading bacteria in the gut.
Jameson, E, Quareshy, M, Chen, Y
Methods (San Diego, Calif.). 2018;:42-48
Abstract
The bacterial formation of trimethylamine (TMA) has been linked to cardiovascular disease. This review focuses on the methods employed to investigate the identity of the bacteria responsible for the formation of TMA from dietary choline and carnitine in the human gut. Recent studies have revealed the metabolic pathways responsible for bacterial TMA production, primarily the anaerobic glycyl radical-containing, choline-TMA lyase, CutC and the aerobic carnitine monooxygenase, CntA. Identification of these enzymes has enabled bioinformatics approaches to screen both human-associated bacterial isolate genomes and whole gut metagenomes to determine which bacteria are responsible for TMA formation in the human gut. We centre on several key methodological aspects for identifying the TMA-producing bacteria and report how these pathways can be identified in human gut microbiota through bioinformatics analysis of available bacterial genomes and gut metagenomes.
-
2.
ProBAPred: Inferring protein-protein binding affinity by incorporating protein sequence and structural features.
Lu, B, Li, C, Chen, Q, Song, J
Journal of bioinformatics and computational biology. 2018;(4):1850011
Abstract
Protein-protein binding interaction is the most prevalent biological activity that mediates a great variety of biological processes. The increasing availability of experimental data on protein-protein interactions allows a systematic construction of protein-protein interaction networks, significantly contributing to a better understanding of protein functions and their roles in cellular pathways and human diseases. Compared to the well-established classification of protein-protein interactions (PPIs), limited work has been conducted on estimating protein-protein binding free energy, which can provide informative real-value regression models for characterizing protein-protein binding affinity. In this study, we propose a novel ensemble computational framework, termed ProBAPred (Protein-protein Binding Affinity Predictor), for quantitative estimation of protein-protein binding affinity. A large number of sequence and structural features, including physicochemical properties, binding energy and conformation annotations, were collected and calculated from currently available protein binding complex datasets and the literature. Feature selection based on the WEKA package was performed to identify and characterize the most informative and contributing feature subsets. Experiments on the independent test set showed that our ensemble method achieved the lowest Mean Absolute Error (MAE; 1.657 kcal/mol) and the second highest correlation coefficient (R-value = 0.467) compared with the existing methods. The datasets and source code of ProBAPred, and the supplementary materials of this study, can be downloaded at http://lightning.med.monash.edu/probapred/ for academic use. We anticipate that the developed ProBAPred regression models can facilitate computational characterization and experimental studies of protein-protein binding affinity.
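The two evaluation measures quoted in this abstract, mean absolute error and the correlation coefficient, can be illustrated with a minimal sketch. It assumes the R-value is a Pearson correlation, which the abstract does not state explicitly; the function names and the toy numbers in the comments are illustrative and are not taken from ProBAPred.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Pearson correlation coefficient (assumed meaning of 'R-value')."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Toy usage with made-up binding free energies (kcal/mol):
# measured = [-9.1, -11.4, -7.8]; predicted = [-8.5, -10.0, -9.0]
# mean_absolute_error(measured, predicted); pearson_r(measured, predicted)
```

A low MAE and a high correlation measure different things: the first penalizes absolute deviation, the second only rewards a linear trend, which is why a method can rank first on one and second on the other.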
-
3.
A systematic exploration of [Formula: see text] cutoff ranges in machine learning models for protein mutation stability prediction.
Olney, R, Tuor, A, Jagodzinski, F, Hutchinson, B
Journal of bioinformatics and computational biology. 2018;(5):1840022
Abstract
Discerning how a mutation affects the stability of a protein is central to the study of a wide range of diseases. Mutagenesis experiments on physical proteins provide precise insights about the effects of amino acid substitutions, but such studies are time and cost prohibitive. Computational approaches for informing experimentalists where to allocate wet-lab resources are available, including a variety of machine learning models. Assessing the accuracy of machine learning models for predicting the effects of mutations is dependent on experiments for amino acid substitutions performed in vitro. When similar experiments on physical proteins have been performed by multiple laboratories, the use of the data near the juncture of stabilizing and destabilizing mutations is questionable. In this work, we explore a systematic and principled alternative to discarding experimental data close to the juncture of stabilizing and destabilizing mutations. We model the inconclusive range of experimental [Formula: see text] values via 3- and 5-way classifiers, and systematically explore potential boundaries for the range of inconclusive experimental values. We demonstrate the effectiveness of potential boundaries through confusion matrices and heat map visualizations. We explore two novel metrics for assessing viable cutoff ranges, and find that under these metrics, a lower cutoff near [Formula: see text] and an upper cutoff near [Formula: see text] are optimal across multiple machine learning models.
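The idea of an inconclusive range bounded by a lower and an upper cutoff can be sketched as a 3-way labelling rule. This is only an illustration: the class names, the function names, and the cutoff values used in the comments are hypothetical, and which side counts as stabilizing depends on a sign convention the abstract does not give.

```python
from collections import Counter


def label_three_way(value, lower, upper):
    """Assign a stability-change value to one of three classes.

    Values inside [lower, upper] are treated as inconclusive. Whether
    'low' means stabilizing or destabilizing depends on the sign
    convention of the dataset (an assumption left open here).
    """
    if lower >= upper:
        raise ValueError("lower cutoff must be below upper cutoff")
    if value < lower:
        return "low"
    if value > upper:
        return "high"
    return "inconclusive"


def class_sizes(values, lower, upper):
    """Count how many values fall into each class for one cutoff pair."""
    return Counter(label_three_way(v, lower, upper) for v in values)


# Sweeping hypothetical cutoff pairs shows how the class balance shifts:
# for lower, upper in [(-0.5, 0.5), (-1.0, 1.0)]:
#     print(lower, upper, class_sizes(measurements, lower, upper))
```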
-
4.
Prediction of citrullination sites by incorporating k-spaced amino acid pairs into Chou's general pseudo amino acid composition.
Ju, Z, Wang, SY
Gene. 2018;:78-83
Abstract
As one of the most important and common protein post-translational modifications, citrullination plays a key role in regulating various biological processes and is associated with several human diseases. The accurate identification of citrullination sites is crucial for elucidating the underlying molecular mechanisms of citrullination and designing drugs for related human diseases. In this study, a novel bioinformatics tool named CKSAAP_CitrSite is developed for the prediction of citrullination sites. Built on a support vector machine algorithm, CKSAAP_CitrSite takes as input the composition of k-spaced amino acid pairs surrounding a query site. As illustrated by 10-fold cross-validation, CKSAAP_CitrSite achieves a satisfactory performance with a Sensitivity of 77.59%, a Specificity of 95.26%, an Accuracy of 89.37% and a Matthews correlation coefficient of 0.7566, which are much better than those of the existing prediction method. Feature analysis shows that the N-terminal space containing pairs may play an important role in the prediction of citrullination sites, and the arginines close to the N-terminus tend to be citrullinated. The conclusions derived from this study could offer useful information for elucidating the molecular mechanisms of citrullination and related experimental validations. A user-friendly web-server for CKSAAP_CitrSite is available at 123.206.31.171/CKSAAP_CitrSite/.
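The composition of k-spaced amino acid pairs (CKSAAP) can be sketched as follows. This is a generic version of the encoding under stated assumptions, not the CKSAAP_CitrSite implementation: the function name, the default `k_max`, the choice to skip pairs containing non-standard residues, and normalization by the number of counted pairs are all illustrative choices.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def cksaap(window, k_max=2):
    """Encode a sequence window as k-spaced amino acid pair frequencies.

    For each gap k in 0..k_max, count residue pairs separated by k
    positions and normalize by the number of counted pairs. Returns a
    dict keyed by (k, pair) with 400 entries per k.
    """
    features = {}
    for k in range(k_max + 1):
        counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
        total = 0
        for i in range(len(window) - k - 1):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # skip gaps or non-standard symbols
                counts[pair] += 1
                total += 1
        for pair, count in counts.items():
            features[(k, pair)] = count / total if total else 0.0
    return features
```

The resulting fixed-length vector (400 values per gap size) is the kind of input a support vector machine can consume directly, whatever the length of the original protein.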
-
5.
A Comprehensive Computational Analysis of Mycobacterium Genomes Pinpoints the Genes Co-occurring with YczE, a Membrane Protein Coding Gene Under the Putative Control of a MocR, and Predicts its Function.
Milano, T, Angelaccio, S, Tramonti, A, di Salvo, ML, Nogues, I, Contestabile, R, Pascarella, S
Interdisciplinary sciences, computational life sciences. 2018;(1):111-125
Abstract
Bacterial proteins belonging to the YczE family are predicted to be membrane proteins of yet unknown function. In many bacterial species, the yczE gene coding for the YczE protein is divergently transcribed with respect to an adjacent transcriptional regulator of the MocR family. According to in silico predictions, proteins named YczR are supposed to regulate the expression of yczE genes. These regulators linked to the yczE genes are predicted to constitute a subfamily within the MocR family. To put forward hypotheses amenable to experimental testing about the possible function of the YczE proteins, a phylogenetic profile strategy was applied. This strategy consists in searching for those genes that, within a set of genomes, co-occur exclusively with a certain gene of interest. Co-occurrence can be suggestive of a functional link. A set of 30 complete mycobacterial proteomes was collected. Of these, only 16 contained YczE proteins. Interestingly, in all cases each yczE gene was divergently transcribed with respect to a yczR gene. Two orthology clustering procedures were applied to find proteins co-occurring exclusively with the YczE proteins. The reported results suggest that YczE may be involved in the membrane translocation and metabolism of sulfur-containing compounds, mostly in rapidly growing, low-pathogenicity mycobacterial species. These observations may hint at potential targets for therapies to treat the emerging opportunistic infections provoked by the widespread environmental mycobacterial species and may contribute to the delineation of the genomic and physiological differences between the pathogenic and non-pathogenic mycobacterial species.
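The phylogenetic profile strategy described here, finding genes that co-occur exclusively with a gene of interest, can be sketched with sets. The data layout (a mapping from gene family to the set of genomes containing it), the function name, and the toy family and genome names are assumptions for illustration; the study itself derived its families from orthology clustering.

```python
def exclusive_cooccurrence(profiles, target):
    """Return gene families present in exactly the same genomes as target.

    profiles: dict mapping a gene family name to the set of genome
    identifiers in which that family occurs (hypothetical layout).
    """
    target_genomes = profiles[target]
    return sorted(
        family
        for family, genomes in profiles.items()
        if family != target and genomes == target_genomes
    )


# Toy usage with made-up names:
# profiles = {
#     "yczE": {"genome1", "genome2"},
#     "famA": {"genome1", "genome2"},             # co-occurs exclusively
#     "famB": {"genome1", "genome2", "genome3"},  # also present elsewhere
# }
# exclusive_cooccurrence(profiles, "yczE")
```

Requiring identical presence/absence patterns is the strictest reading of "exclusively"; a real analysis might tolerate a small number of mismatches to absorb annotation errors.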
-
6.
A computational method for prediction of xylanase enzymes activity in strains of Bacillus subtilis based on pseudo amino acid composition features.
Ariaeenejad, S, Mousivand, M, Moradi Dezfouli, P, Hashemi, M, Kavousi, K, Hosseini Salekdeh, G
PloS one. 2018;(10):e0205796
Abstract
Xylanases are hydrolytic enzymes that are classified into various glycoside hydrolase (GH) families based on their physicochemical properties, structure, mode of action and substrate specificities. The purpose of this study is to show that the activity of the members of the xylanase family in the specified pH and temperature conditions can be computationally predicted. The proposed computational regression model was trained and tested with the Pseudo Amino Acid Composition (PseAAC) features extracted solely from the amino acid sequences of enzymes. The xylanases with experimentally determined activities were used as the training dataset to adjust the model parameters. To develop the model, 41 strains of Bacillus subtilis isolated from field soil were screened. From them, 28 strains with the highest halo diameter were selected for further studies. The performance of the model for prediction of xylanase activity was evaluated in three different temperature and pH conditions using stratified cross-validation and jackknife methods. The trained model can be used for determining the activity of newly found xylanases in the specified condition. Such computational models help to scale down the experimental costs and save time by identifying enzymes with appropriate activity for scientific and industrial usage. Our methodology for activity prediction of xylanase enzymes can potentially be applied to the members of other enzyme families. The availability of sufficient experimental data in specified pH and temperature conditions is a prerequisite for training the learning model and achieving high accuracy.
-
7.
LMMO: A Large Margin Approach for Refining Regulatory Motifs.
Zhu, L, Zhang, HB, Huang, DS
IEEE/ACM transactions on computational biology and bioinformatics. 2018;(3):913-925
Abstract
Although discriminative motif discovery (DMD) methods are promising for eliciting motifs from high-throughput experimental data, they usually have to sacrifice accuracy and may fail to fully leverage the potential of large datasets. Recently, it has been demonstrated that the motifs identified by DMDs can be significantly improved by maximizing the area under the receiver operating characteristic curve (AUC) metric, which has been widely used in the literature to rank the performance of elicited motifs. However, existing approaches for motif refinement choose to directly maximize the non-convex and discontinuous AUC itself, which is known to be difficult and may lead to suboptimal solutions. In this paper, we propose Large Margin Motif Optimizer (LMMO), a large-margin-type algorithm for refining regulatory motifs. By relaxing the AUC cost function with the surrogate convex hinge loss, we show that the resultant learning problem can be cast as an instance of difference-of-convex (DC) programs, and solve it iteratively using the constrained concave-convex procedure (CCCP). To further save computational time, we combine LMMO with existing techniques for improving the scalability of large-margin-type algorithms, such as the cutting plane method. Experimental evaluations on synthetic and real data illustrate the performance of the proposed approach. The code of LMMO is freely available at: https://github.com/ekffar/LMMO.
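The relaxation mentioned in this abstract, replacing the discontinuous AUC with a hinge loss, can be illustrated on plain score lists. The sketch below shows only the generic pairwise hinge surrogate; it is not the LMMO objective, and it does not implement the DC programming or CCCP steps. The function names and the margin of 1 are assumptions.

```python
def pairwise_auc(pos_scores, neg_scores):
    """Empirical AUC: fraction of (positive, negative) pairs ranked
    correctly, counting ties as half. Piecewise constant in the scores."""
    correct = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos_scores) * len(neg_scores))


def pairwise_hinge_loss(pos_scores, neg_scores, margin=1.0):
    """Convex surrogate: penalize every pair whose positive score does
    not exceed the negative score by at least the margin."""
    total = 0.0
    for p in pos_scores:
        for n in neg_scores:
            total += max(0.0, margin - (p - n))
    return total / (len(pos_scores) * len(neg_scores))
```

Unlike the AUC, the hinge loss changes continuously as scores move, so it gives an optimizer a usable direction even when no pair changes its ranking.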
-
8.
ComplexContact: a web server for inter-protein contact prediction using deep learning.
Zeng, H, Wang, S, Zhou, T, Zhao, F, Li, X, Wu, Q, Xu, J
Nucleic acids research. 2018;(W1):W432-W437
Abstract
ComplexContact (http://raptorx2.uchicago.edu/ComplexContact/) is a web server for sequence-based interfacial residue-residue contact prediction of a putative protein complex. Interfacial residue-residue contacts are critical for understanding how proteins form complexes and interact at the residue level. When receiving a pair of protein sequences, ComplexContact first searches for their sequence homologs and builds two paired multiple sequence alignments (MSAs). It then applies co-evolution analysis and a CASP-winning deep learning (DL) method to predict interfacial contacts from the paired MSAs and visualizes the prediction as an image. The DL method was originally developed for intra-protein contact prediction and performed the best in CASP12. Our large-scale experimental test further shows that ComplexContact greatly outperforms pure co-evolution methods for inter-protein contact prediction, regardless of the species.
-
9.
PATSIM: Prediction and analysis of protein sequences using hybrid Knuth-Morris Pratt (KMP) and Boyer-Moore (BM) algorithm.
Manikandan, P, Ramyachitra, D
Gene. 2018;:50-59
Abstract
In phylogenomic profiling, the genomic context based methods rest on the observation that two or more proteins having the same pattern of presence or absence in many diverse genomes most likely have a functional link. In this research work, a tool (PATSIM) has been developed to predict protein patterns based on the SOPM tool. In this tool, the secondary structure for CATH database protein sequences, predicted by the SOPM (Self Optimized Prediction Method) server, is passed as input to fulfill objectives such as: (i) predict the amino acid pattern using the proposed hybrid KMP and BM algorithm; (ii) predict the physicochemical properties of the protein sequence, such as Hydrophobic Non-Polar ALKYL Amino Acid groups, Hydrophobic Non-Polar AROMATIC Amino Acid groups, Hydrophilic Polar Neutral Amino Acid groups, Hydrophilic Polar Acidic Amino Acid groups and Hydrophilic Polar Basic Amino Acid groups; (iii) predict the secondary structure of a protein where the structure of the protein sequence is unknown; and (iv) analyse the similarity of a protein sequence (structure unknown) with the CATH database. From the results, it is inferred that this tool effectively predicts the similarity between the sequences and also identifies the protein patterns for four secondary structural classes, namely Alpha Helix (h), Beta Sheet (e), Turn (t) and Coil (c). Based on the experimental results, it is inferred that this tool identifies the physicochemical properties of the protein sequence in an effective manner. The source code and documentation for the PATSIM tool are freely available in the GitHub public repository (https://github.com/manimkn89/Protein-Sequence-Analysis).
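As background for the pattern-matching step, the textbook Knuth-Morris-Pratt search can be sketched on a secondary-structure string over the symbols h, e, t and c. This is plain KMP only; it does not reproduce the hybrid KMP and Boyer-Moore algorithm proposed by the authors, and the function names and example strings are illustrative.

```python
def build_failure_table(pattern):
    """For each prefix of pattern, length of its longest proper prefix
    that is also a suffix (the KMP failure function)."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_find_all(text, pattern):
    """Return start indices of all (possibly overlapping) occurrences
    of pattern in text, scanning text once."""
    if not pattern:
        return []
    table = build_failure_table(pattern)
    matches = []
    k = 0  # number of pattern characters currently matched
    for i, symbol in enumerate(text):
        while k > 0 and symbol != pattern[k]:
            k = table[k - 1]
        if symbol == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = table[k - 1]
    return matches


# Example: locate a helix-to-strand transition in a predicted structure
# kmp_find_all("hhhecchhhe", "hhe")
```

KMP never re-reads text characters, whereas Boyer-Moore skips ahead using mismatch information from the end of the pattern; a hybrid presumably aims to combine the two behaviours.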
-
10.
Mutation goals in the vitamin D receptor predicted by computational methods.
Sicinska, W, Gront, D, Sicinski, K
The Journal of steroid biochemistry and molecular biology. 2018;:210-220
Abstract
The mechanism through which nuclear receptors respond differentially to structurally distinct agonists is poorly understood. We present a computational method that identifies nuclear receptor amino acids that are likely involved in biological responses triggered by ligand binding. The method involves tracing how structural changes spread from the ligand binding pocket to sites on the receptor surface, which makes it a good tool for studying allosteric effects. We apply the method to the vitamin D receptor and verify that the identified amino acids are biologically relevant using a broad range of experimental data and a genome browser. We infer that surface vitamin D receptor residues K141, R252, I260, T280, T287 and L417 are likely involved in cell differentiation and antiproliferation, whereas P122, D149, K321, E353 and Q385 are linked to carcinogenesis.