De Montfort University

MSc in Bioinformatics

Centre for Computational Intelligence

Student Achievements

This programme provides a good environment for our talented students to demonstrate their great potential in this exciting field. Some of their achievements are presented here to give you an idea of what you could do if you joined us.

  • A Novel Meta Database for Relationships Between Bioinformatics Databases
    Emily Richardson developed a novel internet meta-database for Bioinformatics using fuzzy sets. This web-based database allows database developers to place their databases in a family tree that links them with other existing databases. Unlike other databases, it gives its users the freedom to view either closely related or remotely related databases. (try it)

  • A paper extracted from Emily Richardson's work is to be published in the 2009 International Joint Conference on Neural Networks, Atlanta, 2009. 

    Title:

    A Novel Meta Database for Relationships Between Bioinformatics Databases

    Abstract: 

    Over the last seven years we have seen an exponential increase in the number of Bioinformatics databases available. These databases are becoming increasingly specialised and are often only known by a small community of users. This paper describes the implementation of a novel meta-database. Rather than simply displaying a list of databases that are available, this project has used graphical data in the form of a family tree to show how the databases are related to one another. A general level of relatedness is described using fuzzy sets; this relationship compares every database against all the other databases to see how related they are. It facilitates an automatic construction of the tree when new databases are added.
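
The abstract does not say how relatedness is computed, so the following is only a minimal sketch of one way the idea could work. It assumes each database is described by fuzzy membership degrees over a few topics and uses fuzzy Jaccard similarity as the relatedness measure; the function names, topics, degrees and database names are all hypothetical and do not come from the paper.

```python
# Hypothetical sketch: the topic names, membership degrees and database
# names below are illustrative assumptions, not data from the paper.

def fuzzy_relatedness(a, b):
    """Fuzzy Jaccard similarity of two profiles (dict: topic -> degree in [0, 1])."""
    topics = set(a) | set(b)
    overlap = sum(min(a.get(t, 0.0), b.get(t, 0.0)) for t in topics)
    extent = sum(max(a.get(t, 0.0), b.get(t, 0.0)) for t in topics)
    return overlap / extent if extent else 0.0


def add_database(tree, profiles, name, profile):
    """Attach `name` under the most related existing database; return that parent."""
    parent = None
    if profiles:
        # Compare the new database against every database already in the tree.
        parent = max(profiles, key=lambda other: fuzzy_relatedness(profile, profiles[other]))
    profiles[name] = profile
    tree[name] = parent  # the first database becomes the root (parent is None)
    return parent


tree, profiles = {}, {}
add_database(tree, profiles, "SeqDB", {"sequence": 1.0, "protein": 0.5})
add_database(tree, profiles, "StructDB", {"structure": 1.0, "protein": 0.8})
add_database(tree, profiles, "EnzymeDB", {"enzyme": 1.0, "protein": 0.9, "structure": 0.6})
print(tree)
```

In this toy run the new "EnzymeDB" entry is placed under "StructDB" because their profiles overlap most. Because relatedness is a degree rather than a yes/no link, a viewer could show only close relatives or widen the view to remote ones by lowering a threshold.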

  • A paper extracted from Chris Newby's work is to be published in the 2009 International Joint Conference on Neural Networks, Atlanta, 2009. 

    Title:
    Investigation into Effectiveness of Rough Sets in Prediction of Enzyme and Protein Structure Classes

    Abstract: 

    Among various methods in protein function prediction, rough set has recently been applied to prediction of protein structural classes. However, this was a blind application on a single but small data set of high homology, which did not consider investigation of various parameters in the rough set. The aim of this paper is therefore to study rough set in the area through comprehensive and consistent analysis and then to present a practical strategy in the rough set-based protein function prediction. To achieve this aim, three different data sets were considered: the first data set for prediction of six main enzyme classes, and the other two for prediction of structural classes. Boolean reasoning, Entropy scaling and Equal frequency binning were used for discretization along with two methods for producing reducts and rules, genetic and Johnson's algorithms. It can be seen that the predictive accuracies were poor for the enzyme dataset, whereas the approach performed better at prediction of the protein structural classes. It is also observed that the dataset with low homology produced poorer accuracies than the dataset with high homology. Furthermore, various parameters and methods used in the rough set were sensitive to the problems in the area, as well as the data sets of low and high homology and different number of the features. The results appear to indicate that the equal frequency-based approach combined with genetic algorithm yields higher prediction accuracy. However, other methods such as Boolean reasoning with the genetic algorithm are also found to be promising. Further investigation will provide a practical strategy that can be used in the rough set-based protein function prediction as well as other areas of Bioinformatics.
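
Rough set methods work on discrete attributes, which is why the abstract lists discretization methods. As a rough illustration of equal frequency binning only, the sketch below chooses cut points so that each bin receives about the same number of values. The function names and sample values are hypothetical, and the paper's own tooling and settings are not described in the abstract.

```python
import bisect

# Hypothetical sketch of equal frequency binning; the sample values are
# made up for illustration and are not taken from the paper's data sets.

def equal_frequency_cuts(values, n_bins):
    """Return n_bins - 1 cut points that split `values` into equally sized bins."""
    if n_bins < 2 or len(values) < n_bins:
        raise ValueError("need at least 2 bins and at least one value per bin")
    ordered = sorted(values)
    cuts = []
    for i in range(1, n_bins):
        idx = i * len(ordered) // n_bins
        # Place the cut halfway between the two values on either side of the boundary.
        cuts.append((ordered[idx - 1] + ordered[idx]) / 2)
    return cuts


def discretize(value, cuts):
    """Map a continuous value to the index of the bin it falls into."""
    return bisect.bisect_right(cuts, value)


feature = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2, 0.6, 0.75, 0.5]
cuts = equal_frequency_cuts(feature, 3)
labels = [discretize(v, cuts) for v in feature]
print(cuts, labels)
```

With nine values and three bins, each bin ends up with three values. Heavily tied data can make the bins uneven, which this sketch does not try to correct.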

  • A paper extracted from Sundeep Nanuwa's work has been published in the 8th IEEE International Conference on BioInformatics and BioEngineering, 2008

    Title:
    Investigation into the role of sequence-driven-features for prediction of protein structural classes

    Abstract: 

    There have been a number of techniques developed for the prediction of protein structural classes; however, they show various degrees of accuracy over different assessment procedures and, in particular, the role of sequence-driven-features (SDF) has not been rigorously investigated. Therefore, the aim of this study is to carry out the largest comprehensive and consistent investigation on approximately 1500 protein sequence-driven-features that form 65 subsets in order to develop a robust predictive model and identify how good these features are at predicting protein structural classes. For evaluation of the features, two high quality 40% (or less) homology datasets that contain over 7000 protein sequences were extracted from proteomic databases. As a predictive technique, an optimum K-Nearest Neighbour Classifier, namely multiple-K-NN (MKNN), was developed, which not only records MKNN results, but also a predictive accuracy for each K nearest neighbourhood for K=1 to 11. In order to make the analyses consistent, three different cross-validation test procedures, 10-fold, leave-one-out (LOO) and independent set, were used for all data sets and methods implemented. With over 5000 individual predictive results obtained, no firm consensus was found on which features are highly associated with protein structural classes. However, interestingly, the best subsets of the features are found to be traditional amino acid composition (AAC) (48.62%) for 10-fold and (50.09%) for LOO, and dipeptide composition (85.91%) for independent set. The results appear to suggest that the AAC features are one of the best two subsets over 65 different subsets. Interestingly, in particular, with pseudo-amino-acid composition (PseAAC), unlike other research results presented in the literature, this investigation finds that there is no statistical improvement obtained from the sequence-order effect aspect (lambda) of PseAAC, which averaged 39.15%. The results also suggest that most of its predictive power comes from the AAC part that averaged at 46.84%, and the overall average predictive accuracy for PseAAC is 47.86%. This information appears to suggest that this feature set, which is claimed to better capture sequence order, yields almost no improvement and can be considered a redundant and noisy feature set. It should be noted that the overall outcome of this comprehensive study sheds light not only on structural class prediction, but also on other proteomic studies.
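
To make the feature and classifier ideas in this abstract concrete, here is a small sketch that computes an amino acid composition vector and records a nearest-neighbour prediction for each of several values of K. It is not the MKNN implementation from the paper, whose details the abstract does not give. The training sequences are invented toy strings rather than real proteins, and the function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def aac(sequence):
    """Amino acid composition: fraction of each standard residue in the sequence."""
    counts = Counter(sequence)
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [counts[a] / total for a in AMINO_ACIDS]


def knn_predict_per_k(train, query, ks):
    """Return {k: majority label among the k nearest training vectors} for each k."""
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    # On a tied vote, Counter keeps first-seen order, so the nearer label wins.
    return {k: Counter(label for _, label in ranked[:k]).most_common(1)[0][0] for k in ks}


# Toy training set (assumption): made-up sequences with made-up class labels.
train = [
    (aac("AAAALLLLEEEE"), "all-alpha"),
    (aac("AALLEEKK"), "all-alpha"),
    (aac("VVVVTTTTYYYY"), "all-beta"),
    (aac("VVTTYYII"), "all-beta"),
]
predictions = knn_predict_per_k(train, aac("AALLEEAA"), ks=[1, 3])
print(predictions)
```

Recording the prediction for every K separately, rather than committing to a single K, makes it possible to report an accuracy per neighbourhood size in the way the abstract describes.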