-
1.
pLoc-mEuk: Predict subcellular localization of multi-label eukaryotic proteins by extracting the key GO information into general PseAAC.
Cheng, X, Xiao, X, Chou, KC
Genomics. 2018;(1):50-58
Abstract
Many efforts have been made in predicting the subcellular localization of eukaryotic proteins, but most of the existing methods have the following two limitations: (1) their coverage scope is less than ten locations and hence many organelles in a eukaryotic cell cannot be covered, and (2) they can only be used to deal with single-label systems in which each of the constituent proteins has one and only one location. Actually, proteins with multiple locations are particularly interesting since they may have some exceptional functions very important for an in-depth understanding of the biological processes in a cell and for selecting drug targets. Although several predictors (such as "Euk-mPLoc", "Euk-PLoc 2.0" and "iLoc-Euk") can cover up to 22 different location sites, and they also have the function to treat multi-labeled proteins, further efforts are needed to improve their prediction quality, particularly in enhancing the absolute true rate and in reducing the absolute false rate. Here we propose a new predictor called "pLoc-mEuk" by extracting the key GO (Gene Ontology) information into the general PseAAC (Pseudo Amino Acid Composition). Rigorous cross-validations on a high-quality and stringent benchmark dataset have indicated that the proposed pLoc-mEuk predictor is remarkably superior to iLoc-Euk, the best of the aforementioned three predictors. To maximize the convenience of most experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc-mEuk/, by which users can easily get their desired results without the need to go through the complicated mathematics involved.
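The abstract above names the absolute true rate and the absolute false rate without defining them. The following is a minimal sketch, assuming the definitions commonly used in the multi-label literature: absolute true as exact-match (subset) accuracy, and absolute false as the Hamming loss over all candidate locations. The function names and the toy location sets are hypothetical and are not taken from the paper.

```python
def absolute_true(true_sets, pred_sets):
    """Fraction of proteins whose predicted location set matches the true set exactly."""
    hits = sum(1 for t, p in zip(true_sets, pred_sets) if set(t) == set(p))
    return hits / len(true_sets)


def absolute_false(true_sets, pred_sets, n_labels):
    """Average fraction of the n_labels locations that are wrongly assigned (Hamming loss)."""
    total = 0.0
    for t, p in zip(true_sets, pred_sets):
        # Symmetric difference counts missed locations plus spurious ones.
        total += len(set(t) ^ set(p)) / n_labels
    return total / len(true_sets)


# Toy data (assumed): three proteins, 22 candidate locations as in the abstract.
true_sets = [{"nucleus"}, {"cytoplasm", "nucleus"}, {"mitochondrion"}]
pred_sets = [{"nucleus"}, {"cytoplasm"}, {"mitochondrion", "nucleus"}]

at = absolute_true(true_sets, pred_sets)
af = absolute_false(true_sets, pred_sets, n_labels=22)
print(f"absolute true = {at:.4f}, absolute false = {af:.4f}")
```

Under these assumed definitions, a higher absolute true rate and a lower absolute false rate both indicate a better multi-label predictor.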
-
2.
In silico identification of rescue sites by double force scanning.
Tiberti, M, Pandini, A, Fraternali, F, Fornili, A
Bioinformatics (Oxford, England). 2018;(2):207-214
Abstract
MOTIVATION: A deleterious amino acid change in a protein can be compensated by a second-site rescue mutation. These compensatory mechanisms can be mimicked by drugs. In particular, the location of rescue mutations can be used to identify protein regions that can be targeted by small molecules to reactivate a damaged mutant. RESULTS: We present the first general computational method to detect rescue sites. By mimicking the effect of mutations through the application of forces, the double force scanning (DFS) method identifies the second-site residues that make the protein structure most resilient to the effect of pathogenic mutations. We tested DFS predictions against two datasets containing experimentally validated and putative evolutionary-related rescue sites. A remarkably good agreement was found between predictions and experimental data. Indeed, almost half of the rescue sites in p53 were correctly predicted by DFS, with 65% of the remaining sites in contact with DFS predictions. Similar results were found for other proteins in the evolutionary dataset. AVAILABILITY AND IMPLEMENTATION: The DFS code is available under GPL at https://fornililab.github.io/dfs/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
-
3.
Recognition of Protein Pupylation Sites by Adopting Resampling Approach.
Li, T, Chen, Y, Li, T, Jia, C
Molecules (Basel, Switzerland). 2018;(12)
Abstract
With the in-depth study of posttranslational modification sites, protein ubiquitination has become the key problem to study the molecular mechanism of posttranslational modification. Pupylation is a widely used process in which a prokaryotic ubiquitin-like protein (Pup) is attached to a substrate through a series of biochemical reactions. However, the experimental methods of identifying pupylation sites are often time-consuming and laborious. This study aims to propose an improved approach for predicting pupylation sites. Firstly, the Pearson correlation coefficient was used to reflect the correlation among different amino acid pairs calculated by the frequency of each amino acid. Then, according to a descending ranked order, the multiple types of features were filtered separately by values of the Pearson correlation coefficient. Thirdly, to get a qualified balanced dataset, the K-means principal component analysis (KPCA) oversampling technique was employed to synthesize new positive samples and the Fuzzy undersampling method was employed to reduce the number of negative samples. Finally, the performance of our method was verified by means of jackknife and a 10-fold cross-validation test. The average results of 10-fold cross-validation showed that the sensitivity (Sn) was 90.53%, specificity (Sp) was 99.8%, accuracy (Acc) was 95.09%, and Matthews Correlation Coefficient (MCC) was 0.91. Moreover, an independent test dataset was used to further measure its performance, and the prediction results achieved an Acc of 83.75% and an MCC of 0.49, which was superior to previous predictors. The better performance and stability of our proposed method showed it is an effective way to predict pupylation sites.
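The four measures reported above (Sn, Sp, Acc, MCC) all follow from the counts of a binary confusion matrix. The following is a minimal sketch using their standard definitions; the function name and the counts passed to it are hypothetical and do not reproduce the paper's data.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return Sn, Sp, Acc and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)                    # sensitivity: recall on true sites
    sp = tn / (tn + fp)                    # specificity: recall on non-sites
    acc = (tp + tn) / (tp + fn + tn + fp)  # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a row or column of the matrix is empty; 0.0 is a common convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Hypothetical counts for illustration only.
m = binary_metrics(tp=90, fn=10, tn=95, fp=5)
print({name: round(value, 4) for name, value in m.items()})
```

Unlike Acc, MCC uses all four cells of the matrix, so it stays informative when positive and negative samples are imbalanced, which is the situation the resampling step above is meant to address.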
-
4.
Sequence fingerprints distinguish erroneous from correct predictions of intrinsically disordered protein regions.
Saravanan, KM, Dunker, AK, Krishnaswamy, S
Journal of biomolecular structure & dynamics. 2018;(16):4338-4351
Abstract
More than 60 prediction methods for intrinsically disordered proteins (IDPs) have been developed over the years, many of which are accessible on the World Wide Web. Nearly all of these predictors give balanced accuracies in the ~65%-~80% range. Since predictors are not perfect, further studies are required to uncover the role of amino acid residues in native IDP as compared to predicted IDP regions. In the present work, we make use of sequences of 100% predicted IDP regions, false positive disorder predictions, and experimentally determined IDP regions to distinguish the characteristics of native versus predicted IDP regions. A higher occurrence of asparagine is observed in sequences of native IDP regions but not in sequences of false positive predictions of IDP regions. The occurrences of certain combinations of amino acids at the pentapeptide level provide a distinguishing feature in the IDPs with respect to globular proteins. The distinguishing features presented in this paper provide insights into the sequence fingerprints of amino acid residues in experimentally determined as compared to predicted IDP regions. These observations and additional work along these lines should enable the development of improvements in the accuracy of disorder prediction algorithms.
-
5.
Where differences resemble: sequence-feature analysis in curated databases of intrinsically disordered proteins.
Necci, M, Piovesan, D, Tosatto, SCE
Database : the journal of biological databases and curation. 2018
Abstract
Intrinsic disorder (ID) in proteins is involved in crucial interactions in the living cell. As the importance of ID is increasingly recognized, so are detailed analyses aimed at its identification and characterization. The existence of ID 'flavors' representing different sub-phenomena remains an open question. Several databases collect manually curated examples of experimentally validated ID, focusing on apparently different aspects of this phenomenon. The recent update of MobiDB presented the opportunity to carry out an in-depth comparison of the content of these validated ID collections, namely DIBS, DisProt, IDEAL, MFIB, FuzDB, ELM and UniProt. In order to assess what is specific to different ID flavors, we analyzed relevant sequence-based features, such as amino acid composition, length, taxa and gene ontology terms, highlighting differences and similarities among datasets. Despite that, the majority of the considered features are not statistically different across databases, with the exception of ELM. FuzDB also shares half of its entries with DisProt. In general, different ID databases describe similar phenomena. DisProt, which is the largest database, better represents the entire spectrum of different disorder flavors and the corresponding sequence diversity.
-
6.
Mutants of β2-glycoprotein I: Their features and potent applications.
Shen, L, Azmi, NU, Tan, XW, Yasuda, S, Wahyuningsih, AT, Inagaki, J, Kobayashi, K, Ando, E, Sasaki, T, Matsuura, E
Best practice & research. Clinical rheumatology. 2018;(4):572-590
Abstract
β2-Glycoprotein I (β2GPI) is a highly glycosylated plasma protein composed of five homologous domains which regulates coagulation, fibrinolysis, and/or angiogenesis by interacting with negatively charged hydrophobic molecules and/or with plasminogen and its metabolites. The present study focused on structural and functional characterization of β2GPI's domain I (DI) and V (DV). Through N-terminal amino acid sequencing, a novel plasmin-cleaved site at K287C288 was identified in DV. We further modified the intact DV by altering two amino acids at specific proteolytic cleavage sites to generate three stable DV mutants: DV(PP), (PE), and (AA). Results of both SDS-PAGE and MALDI-TOF-MS showed that all three DV mutants were more stable than the intact DV, and DV(PE) was predominantly resistant to proteolysis. Competitive ELISA assessed affinities of intact β2GPI and those mutants to cardiolipin. In a culture system, all DV and DI mutants potently inhibited HUVEC proliferation by 18-30% as compared to control. Only DI and nicked β2GPI showed significant inhibition of HUVEC tube formation. Moreover, DV(PE)-coated affinity columns demonstrated its binding property towards anionic lipids and could substantially isolate anionic DOPS from zwitterionic DOPC as a purification model. In summary, the proteolysis-resistant and unhindered phospholipid (PL) binding properties of DV(PE) have made it an appealing element for subsequent prospective studies. Future in-depth characterization and optimized applications of cleavage-resistant DV(PE) would complement its full capacity as a novel clinical modality in the field of vascular imaging and/or lipidomics studies.
-
7.
PDBe: towards reusable data delivery infrastructure at protein data bank in Europe.
Mir, S, Alhroub, Y, Anyango, S, Armstrong, DR, Berrisford, JM, Clark, AR, Conroy, MJ, Dana, JM, Deshpande, M, Gupta, D, et al
Nucleic acids research. 2018;(D1):D486-D492
Abstract
The Protein Data Bank in Europe (PDBe, pdbe.org) is actively engaged in the deposition, annotation, remediation, enrichment and dissemination of macromolecular structure data. This paper describes new developments and improvements at PDBe addressing three challenging areas: data enrichment, data dissemination and functional reusability. New features of the PDBe Web site are discussed, including a context dependent menu providing links to raw experimental data and improved presentation of structures solved by hybrid methods. The paper also summarizes the features of the LiteMol suite, which is a set of services enabling fast and interactive 3D visualization of structures, with associated experimental maps, annotations and quality assessment information. We introduce a library of Web components which can be easily reused to port data and functionality available at PDBe to other services. We also introduce updates to the SIFTS resource which maps PDB data to other bioinformatics resources, and the PDBe REST API.
-
8.
GCPred: a web tool for guanylyl cyclase functional centre prediction from amino acid sequence.
Xu, N, Fu, D, Li, S, Wang, Y, Wong, A
Bioinformatics (Oxford, England). 2018;(12):2134-2135
Abstract
SUMMARY: GCPred is a webserver for the prediction of guanylyl cyclase (GC) functional centres from amino acid sequence. GCs are enzymes that generate the signalling molecule cyclic guanosine 3',5'-monophosphate from guanosine-5'-triphosphate. A novel class of GC centres (GCCs) has been identified in complex plant proteins. Using currently available experimental data, GCPred is created to automate and facilitate the identification of similar GCCs. The server features GCC values whose calculation considers the physicochemical properties of the amino acids constituting the GCC and the conserved amino acids within the centre. From a user-supplied amino acid sequence, the server returns a table of GCC values and graphs depicting deviations from mean values. The utility of this server is demonstrated using plant proteins and the human interleukin-1 receptor-associated kinase family of proteins as examples. AVAILABILITY AND IMPLEMENTATION: The GCPred server is available at http://gcpred.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
-
9.
Feature extraction method for proteins based on Markov tripeptide by compressive sensing.
Gao, CF, Wu, XY
BMC bioinformatics. 2018;(1):229
Abstract
BACKGROUND: In order to capture the vital structural information of the original protein, the symbol sequence was transformed into the Markov frequency matrix according to the consecutive three residues throughout the chain. A three-dimensional sparse matrix sized 20 × 20 × 20 was obtained and expanded to a one-dimensional vector. Then, an appropriate measurement matrix was selected for the vector to obtain a compressed feature set by random projection. Consequently, the new compressive sensing feature extraction technology was proposed. RESULTS: Several indexes were analyzed on the cell membrane, cytoplasm, and nucleus dataset to detect the discrimination of the features. In comparison with the traditional methods of scale wavelet energy and amino acid components, the experimental results suggested the advantage and accuracy of the features by this new method. CONCLUSIONS: The new features extracted from this model could preserve the maximum information contained in the sequence and reflect the essential properties of the protein. Thus, it is an adequate and potential method in collecting and processing the protein sequence from a large sample size and high dimension.
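The pipeline described above (count consecutive residue triples into a 20 × 20 × 20 array, flatten it to an 8000-element vector, then compress it by random projection) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: normalised triple counts stand in for the Markov frequency matrix, whose exact definition the abstract does not give, and a seeded Gaussian matrix stands in for the measurement matrix. All names and the example sequence are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def tripeptide_vector(seq):
    """Flattened 20 x 20 x 20 array of normalised counts of consecutive residue triples."""
    vec = [0.0] * (20 ** 3)
    count = 0
    for i in range(len(seq) - 2):
        a, b, c = seq[i], seq[i + 1], seq[i + 2]
        # Skip windows containing non-standard residues (e.g. X or B).
        if a in INDEX and b in INDEX and c in INDEX:
            vec[(INDEX[a] * 20 + INDEX[b]) * 20 + INDEX[c]] += 1.0
            count += 1
    if count:
        vec = [v / count for v in vec]
    return vec


def random_projection(vec, n_out, seed=0):
    """Compress vec to n_out values with a seeded Gaussian measurement matrix (assumed choice)."""
    rng = random.Random(seed)
    scale = 1.0 / n_out ** 0.5
    nonzero = [(j, v) for j, v in enumerate(vec) if v]
    out = []
    for _ in range(n_out):
        # Generate one row of the measurement matrix at a time to keep memory small.
        row = [rng.gauss(0.0, 1.0) for _ in range(len(vec))]
        out.append(scale * sum(row[j] * v for j, v in nonzero))
    return out


seq = "MKTAYIAKQRQISFVKSHFSRQ"      # hypothetical example sequence
x = tripeptide_vector(seq)          # sparse, length 8000
y = random_projection(x, n_out=64)  # compressed feature set, length 64
print(len(x), sum(1 for v in x if v), len(y))
```

Fixing the seed matters in practice: every protein must be projected with the same measurement matrix, otherwise the compressed features are not comparable across sequences.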
-
10.
predCar-site: Carbonylation sites prediction in proteins using support vector machine with resolving data imbalanced issue.
Hasan, MA, Li, J, Ahmad, S, Molla, MK
Analytical biochemistry. 2017;:107-113
Abstract
Carbonylation is an irreversible post-translational modification and is considered a biomarker of oxidative stress. It not only plays a major role in orchestrating various biological processes but is also associated with some diseases such as Alzheimer's disease, diabetes, and Parkinson's disease. However, since the experimental technologies for detecting carbonylation sites in proteins are costly and time-consuming, an accurate computational method for predicting carbonylation sites is urgently needed and could be useful for drug development. In this study, a novel computational tool termed predCar-Site has been developed to predict protein carbonylation sites by (1) incorporating the sequence-coupled information into the general pseudo amino acid composition, (2) balancing the effect of the skewed training dataset by the Different Error Costs method, and (3) constructing a predictor using a support vector machine as classifier. This predCar-Site predictor achieves an average AUC (area under curve) score of 0.9959, 0.9999, 1, and 0.9997 in predicting the carbonylation sites of K, P, R, and T, respectively. All of the experimental results along with AUC are found from the average of 5 complete runs of the 10-fold cross-validation and those results indicate significantly better performance than existing predictors. A user-friendly web server of predCar-Site is available at http://research.ru.ac.bd/predCar-Site/.
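The AUC scores quoted above can be read as the probability that a randomly chosen true site receives a higher score than a randomly chosen non-site. The following is a minimal sketch of that pairwise computation, assuming the curve in question is the ROC curve (the abstract says only "area under curve"). The function name, labels and scores are hypothetical, and this is not the tool's own evaluation code.

```python
def auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores: 1 = carbonylation site, 0 = non-site.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.6, 0.8]
value = auc(labels, scores)
print(f"AUC = {value:.4f}")
```

Because it depends only on how positives rank against negatives, AUC is insensitive to the class ratio, which makes it a natural summary for the skewed datasets the abstract mentions.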