Center for Computational Natural Sciences and Bioinformatics

Recent advances in computational power and computational methods have led to the crystallization of the paradigm of 'computational thinking' which takes applications of computer science far beyond mere programming and data management. This paradigm provides newer methods for understanding the functioning of complex physical, chemical and biological systems.

Research in computational natural sciences is thus an exciting and rapidly emerging career option for science graduates with an aptitude for mathematics and computation. The domains of computational sciences and bioinformatics offer research challenges both for computer scientists and for researchers trained in the basic sciences.

With its academic activities organized around research centers, IIIT Hyderabad integrates research and education at both the UG and PG levels. The Center for Computational Natural Sciences and Bioinformatics spearheads activities in bioinformatics and computational natural sciences and forms an integral component of the IIIT research agenda. The academic programs in Computational Natural Sciences and Bioinformatics are research oriented and are woven around the following focus areas:

Bioinformatics and Systems Biology

Chemical Dynamics

Electronic Structure and Properties Calculations

Structural Bioinformatics

Other areas and developmental activities




Bioinformatics and Systems Biology

1. Development of Comprehensive Gene Database - An important pattern recognition problem in biological sequences is gene prediction, that is, identifying the regions that code for proteins. Important questions in gene prediction include: what are the conserved patterns or motifs in exonic and intronic regions of eukaryotic genes, how can splice sites be recognized, and which promoters and regulatory sequences are found in the vicinity of genic regions. A specialized database of genes would greatly facilitate this analysis and also help in annotating important functional regions in and around genic sequences. We are developing a Comprehensive Gene Database (CGD) of mammals by integrating information from various NCBI resources.

2. An Integrated Tool for SNP Function Analysis - Single nucleotide polymorphisms (SNPs) are commonly used in association studies to find genes responsible for complex genetic diseases. Complex diseases may involve many genes and hundreds of alleles, but only a small portion of these are functional polymorphisms that contribute to disease phenotypes. Assessing the risk requires access to a variety of heterogeneous biological databases and analytical tools. We are developing a web server that facilitates the functional analysis of SNPs.

3. Identifying Genomic Islands and Pathogenicity Islands - In recent years many different genomic islands have been discovered in a variety of pathogenic as well as non-pathogenic bacteria. Because they promote genetic variability, genomic islands play an important role in microbial evolution. Pathogenicity islands (PAIs) are a subset of GIs and represent distinct genetic elements encoding virulence factors of pathogenic bacteria. A gene in a genome is defined as putative alien (pA) if its codon usage difference from the average gene exceeds a high threshold and codon usage differences from ribosomal protein genes, chaperone genes and protein synthesis processing factors are also high. pA gene clusters in bacterial genomes are relevant for detecting genomic islands (GIs), including pathogenicity islands (PAIs). We have developed a tool using four approaches to identify GIs and PAIs: G+C genome variation (the standard method); genomic signature divergences (dinucleotide bias); extremes of codon bias; and anomalies of amino acid usage.
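The first of the four approaches, G+C variation along the genome, can be illustrated with a minimal sketch. The function names, window size, step and z-score cutoff below are illustrative assumptions and do not describe the tool's actual implementation; the other three approaches are not covered.

```python
from statistics import mean, pstdev

def gc_fraction(seq):
    """Fraction of G and C bases in a DNA string (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def atypical_gc_windows(genome, window=1000, step=500, z_cutoff=2.0):
    """Return (start, gc, z) for sliding windows whose G+C content deviates
    from the mean over all windows by more than z_cutoff standard deviations.
    window, step and z_cutoff are hypothetical defaults chosen for illustration."""
    starts = list(range(0, len(genome) - window + 1, step))
    gcs = [gc_fraction(genome[s:s + window]) for s in starts]
    if not gcs:
        return []
    mu, sigma = mean(gcs), pstdev(gcs)
    if sigma == 0:
        return []  # perfectly uniform composition: nothing stands out
    flagged = []
    for s, gc in zip(starts, gcs):
        z = (gc - mu) / sigma
        if abs(z) > z_cutoff:
            flagged.append((s, gc, z))
    return flagged
```

Windows flagged this way are only candidates; in practice they would be cross-checked against the other signals (dinucleotide bias, codon bias, amino acid usage) before being called genomic islands.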

4. Comparative Genome Analysis - With the availability of a large number of genomes, comparative genomics has become one of the most important areas in bioinformatics: it compares the complete genetic material of one organism with another to gain a better understanding of the evolution of species and to determine the functional role of genes and non-coding regions of the genome. In this project, aimed at developing a functional biosensor for the detection of Mycobacterium tuberculosis, we shall use an in-silico approach to shortlist TB-specific antigens by comparative genomics and ab initio modeling. The M. tuberculosis complex comprises six members: M. tuberculosis, M. africanum, M. microti, M. bovis, M. canetti and BCG (bacille Calmette-Guerin). We would like to compare the complete genomes of the six members of the complex to identify regions unique to a species, deleted or truncated genes in the regions of deletion (RD), and virulence genes in M. tuberculosis. Several RD regions encode potential virulence factors typical of microbial pathogens and are useful candidates for developing powerful tools for the identification of members of the M. tuberculosis complex.

5. Identifying Repeats in Protein Sequences - The study of protein sequence periodicity is one of the numerous approaches to investigating protein structure. The purpose of these investigations is to find structural peculiarities of amino acid sequences and their relation to the spatial organization of proteins. Internal repeats in proteins can be of various types: tandem copies of a motif, periodic repetition of amino acids, or more than one copy of a motif spread over the protein. Repeats that occur tandemly in sequence are found to form integrated assemblies when viewed as three-dimensional structures. Such repeats, essentially defined by their multiplicity, differ from both domains and motifs, since these can occur singly. The importance of such repeats in understanding biological function lies not only in their high frequency among known sequences, but also in their ability to confer multiple binding and structural roles on proteins. One example is the zinc finger domain, a constituent of transcription factors involved in DNA binding, where the composition and copy number of individual tandem repeats confer selectivity and activity of DNA binding. This functional versatility is apparent not only among different repeat types, but also for similar repeats from the same family. Understanding repeats with respect to their structures, functions, and evolution therefore represents a considerable challenge. Considerable sequence divergence, together with the short lengths of sequence repeats, makes repeat detection a particularly difficult task.
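To make the notion of a tandem repeat concrete, here is a minimal sketch that finds only exact, back-to-back copies of a short unit. It deliberately ignores the sequence divergence described above, which is what makes real repeat detection hard; the function name and parameters are hypothetical.

```python
def exact_tandem_repeats(seq, min_unit=2, max_unit=10, min_copies=2):
    """Scan left to right and report (start, unit, copies) for runs in which
    a unit of min_unit..max_unit residues is repeated exactly, back to back.
    At each position the unit covering the longest stretch is kept."""
    hits = []
    i, n = 0, len(seq)
    while i < n:
        best = None
        for k in range(min_unit, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:
                break  # not enough sequence left for a unit of this size
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= min_copies:
                if best is None or copies * k > best[2] * len(best[1]):
                    best = (i, unit, copies)
        if best:
            hits.append(best)
            i += len(best[1]) * best[2]  # skip past the reported run
        else:
            i += 1
    return hits
```

For instance, `exact_tandem_repeats("MKGPAGPAGPAWW")` reports the `GPA` unit starting at index 2. Detecting diverged repeats would need a scoring scheme that tolerates substitutions and gaps, which this sketch does not attempt.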

6. Model Protein Structures Using Graph Theory Approach - In this project we propose to use graph theory methods to understand protein structure, folding and function. Graph theory is a branch of discrete mathematics that is used in the study of various real-world networks and their properties. Chemical molecules, being sets of atoms or groups of atoms (vertices) connected by covalent bonds (edges), have also been extensively investigated using graph theory. The structure of biopolymers like proteins is governed to a large extent by non-covalent interactions, and graph theory is being used to gain insights into the structures of proteins. Analysis of the topological details of proteins with known structures, such as the clustering of specific types of amino acids important for structure, folding and function, is of great value now that a large number of protein structures are available. Identification of amino acid clusters and hubs in such protein structure graphs provides interesting insights into the structure, stability, folding and function of proteins.
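A protein structure graph of the kind described above can be sketched as follows. This is an illustrative construction, assuming one representative 3D point per residue (for example a C-alpha position), a distance cutoff in angstroms, and a degree threshold for calling a residue a hub; all names and default values are assumptions, not the project's actual method.

```python
from itertools import combinations
from math import dist

def contact_graph(coords, cutoff=6.5, min_seq_sep=2):
    """Build adjacency sets: residues i and j are linked when their points lie
    within `cutoff` of each other and they are at least `min_seq_sep` apart
    in sequence (to skip trivially close chain neighbours)."""
    graph = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if j - i >= min_seq_sep and dist(coords[i], coords[j]) <= cutoff:
            graph[i].add(j)
            graph[j].add(i)
    return graph

def hubs(graph, min_degree=4):
    """Residues with at least `min_degree` non-covalent contacts."""
    return sorted(node for node, nbrs in graph.items() if len(nbrs) >= min_degree)
```

With a toy input of one central point surrounded by four points at 5 angstroms, only the central residue is reported as a hub. Real analyses typically weight edges by interaction strength rather than using a single distance cutoff.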

7. Dynamical Systems Modeling of Biological Systems - Networks of coupled dynamical systems have been used to model biological oscillators, excitable media, neural networks, genetic control networks and many other self-organizing systems. In general, the connection topology is assumed to be either completely regular (e.g., a diffusively coupled system) or completely random. However, most biological networks lie somewhere between these two extremes. We would like to explore some simple models of networks that can be tuned through this middle ground: regular networks re-wired to introduce increasing amounts of disorder. These systems, called small-world networks, can be highly clustered, like regular lattices, yet have small characteristic path lengths, like random graphs. For example, the neural network of the worm Caenorhabditis elegans has been shown to exhibit the properties of small-world networks. Models of dynamical systems with small-world coupling display enhanced signal-propagation speed, computational power, and synchronizability. From the perspective of nonlinear dynamics, it would be interesting to understand how a network of interacting dynamical systems, be they neurons, chemical concentrations, or species populations, behaves collectively, given their individual dynamics and coupling architecture.
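The rewiring idea can be sketched in a few lines of plain Python. The code below follows the commonly used Watts-Strogatz construction (a ring lattice whose edges are rewired with probability p) and computes the two quantities mentioned above, the clustering coefficient and the mean shortest-path length. Function names and parameters are illustrative assumptions, and no dynamics are placed on the nodes.

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Regular ring: each node is linked to its k nearest neighbours (k even)."""
    g = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k // 2 + 1):
            j = (i + d) % n
            g[i].add(j)
            g[j].add(i)
    return g

def watts_strogatz(n, k, p, rng):
    """Rewire each lattice edge (i, i+d) to (i, random node) with probability p."""
    g = ring_lattice(n, k)
    for d in range(1, k // 2 + 1):
        for i in range(n):
            j = (i + d) % n
            if j in g[i] and rng.random() < p:
                candidates = [v for v in range(n) if v != i and v not in g[i]]
                if candidates:
                    new = rng.choice(candidates)
                    g[i].discard(j)
                    g[j].discard(i)
                    g[i].add(new)
                    g[new].add(i)
    return g

def clustering(g):
    """Average over nodes of the fraction of neighbour pairs that are linked."""
    total = 0.0
    for nbrs in g.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in g[u])
        total += 2 * links / (k * (k - 1))
    return total / len(g)

def mean_path_length(g):
    """Mean shortest-path length over reachable ordered pairs (BFS per node)."""
    total = pairs = 0
    for s in g:
        seen = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in g[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        total += sum(seen.values())
        pairs += len(seen) - 1
    return total / pairs
```

Calling `watts_strogatz(n, k, p, random.Random(seed))` for increasing p and comparing `clustering` and `mean_path_length` against the p = 0 lattice is one way to explore the middle ground between regular and random topologies.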



Structural Bioinformatics


1. Structural analysis of RNA molecules

Earlier analyses of the structures of tRNAs and ribozymes showed that double helical stretches of nested base pairs contribute to secondary structure. These stretches are interlinked through coaxial stacking and through structurally varied, functionally significant non-helical elements such as hairpins, loops, internal bulges, multiple junctions and pseudoknots. The X-ray crystallographic structures of the 30S and 50S ribosomal subunits, and more recently that of the complete ribosome, have not only provided greater insight into the rules governing the structural expression of RNAs in the folded form but have also exposed a plethora of non-nested tertiary and neighbor interactions consisting mainly of non-canonical base pairs, base triplets and even base quadruples. Detailed analysis of the relative base geometries and backbone torsions associated with these interactions may provide clues for understanding the functional dynamics of RNA molecules. We are working on developing algorithms and tools for contextual mining and analysis of interacting motifs in functional RNA molecules.


2. Ab initio quantum chemical computations of biomolecular interaction energies

Accurate determination of interaction energies in biomolecules using thermodynamic experiments is often extremely difficult, since such experiments normally require disrupting the biological environment. This is particularly true for bases in complex RNA molecules, which characteristically interact in geometries away from equilibrium. Accurate evaluation of the energies associated with these interactions, as a function of deviations from the geometries defined by their local energy minima in an isolated context, should provide a handle for understanding the different forces contributing to the stabilities of recurrent structural motifs as well as their functional dynamics in their biological context. We are using ab initio quantum chemical methods to study such interactions with the help of our in-house cluster and the facilities of PARAM PADMA at CDAC Pune.





3. Sequence - structure - function correlations in RNA molecules

In recent times the discovery of several ncRNA genes (non-coding RNA, which does not code for proteins but directly performs structural, catalytic or regulatory functions) has drawn progressively greater scientific attention to RNA studies. Initially there were a few anecdotal discoveries, which gave rise to speculation that such 'RNA genes' are possible 'relics' of a 'primordial' RNA world in which RNAs possibly participated both in genetic information transfer and in catalysis, before 'more efficient' protein enzymes evolved. Subsequent experimental as well as theoretical investigations on ncRNA genes have led to the targeted discovery of a variety of different ncRNAs such as miRNA, siRNA, snRNA, snmRNA, snoRNA and stRNA, and many of these have also been annotated with proven or putative functional roles. This has laid the basis for the hypothesis of the 'new RNA world', which supersedes the 'relic' hypothesis and speculates that the genome encodes a large number of functional RNAs which participate in hitherto unknown and unexplored areas of genetic and metabolic pathways. It has become apparent that these functional RNA molecules hold key secrets to life processes and that the near future will witness an explosion of new RNA gene sequences calling for structural and functional annotation and posing newer challenges to RNA structural bioinformatics. Structure-function correlations observed in proteins suggest that understanding the structure can go a long way in helping us understand the functions and mechanisms of action of ncRNAs. Given the difficulties associated with the experimental determination of RNA structures, improving the accuracy of structure prediction is considered very important. We are looking for efficient algorithms for RNA secondary structure prediction and also examining the possible role of the energetics of non-canonical base pairs in this context.
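As a point of reference for secondary structure prediction, the textbook Nussinov base-pair maximization algorithm can be sketched as below. It is shown only as a simple baseline, not as the approach pursued here: it counts nested Watson-Crick and G-U wobble pairs, uses no energy model, and ignores the non-canonical pairs discussed above. The `min_loop` parameter and the pair set are assumptions of this sketch.

```python
# Pairs allowed in this sketch: Watson-Crick plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov(seq, min_loop=3):
    """Maximum number of nested base pairs in `seq`, requiring at least
    `min_loop` unpaired bases between the two partners of any pair."""
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j] (inclusive)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]
```

For the hairpin-like sequence `GGGAAAUCC` this returns 3 (two G-C pairs and one G-U pair closing an AAA loop). The algorithm runs in cubic time in the sequence length; energy-based methods replace the pair count with a thermodynamic score.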





4. Analysis of antigen binding interactions

Analysis of the molecular interactions associated with antigen binding, and of their implications for structure and dynamics, is not only relevant to immunological research but also provides insights into the basic principles of molecular recognition and the effects of intermolecular interactions on structure. We are carrying out molecular modeling, docking and molecular dynamics studies on free and bound antigen binding molecules and exploring the potential of these methods in the design of immunological biosensors.




5. Efficient algorithms for querying biological databases

Biological databases contain voluminous and diverse data on sequence, structure and function. Real-time, hypothesis-driven research based on these data can benefit considerably from improved data structures and efficient customized algorithms for complex queries. We are working on computational geometric and graph theoretic approaches for efficient execution of complex queries.







Chemical Dynamics

1. Collaborative research on controlling vibrational excitations in the Fe-C-O system in carbonmonoxy myoglobin, with Prof Gabriel Balint-Kurti, School of Chemistry, University of Bristol, UK.

2. Paper presented in "Trends of Theoretical Chemistry" conference in Tiruchirapally, 10-11 Dec, 2006: "Optimal Fields to control molecular energetics" by S Sharma and P Kumar.

3. Collaborative research with Prof Biman Bagchi, Solid State and Structural Chemistry Unit, IISc, Bangalore on The quantum mechanics of Fluorescence Resonance Excitation transfer.



Electronic Structure and Properties Calculations

1. Collaborative research with Dr T K Chakraborty, Indian Institute of Chemical Technology, Hyderabad on "Preferential trimerization of amino-methyl furan carboxylic acid" (M Sharma, P Kumar, H Singh and T K Chakraborty, J. Mol Str: THEOCHEM, 764 (2006) 109-115).
