Dikla Dotan-Cohen

Computer Science, BGU


I am a Ph.D student in the Computer Science department of Ben-Gurion University in Israel.


My advisor is Prof. Avraham Melkman.
The title of my thesis is:
Improving the analysis of large scale data-sets by integrating gene annotations

My thesis focuses on three topics:
1. Analysis of networks (graphs), in which objects form the vertices of a graph and their labels (e.g. features) are used to infer connections between communities of objects or to characterize objects that have particular desired properties.
2. Clustering, in which the spatial data of the objects are integrated with the objects' labels.
3. Statistical enrichment analysis, in which a study-set of objects is characterized with respect to the population-set.
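As a rough illustration of the third topic, the sketch below scores each annotation term with a one-sided hypergeometric test, a common choice for enrichment analysis. It asks how likely it is to see at least the observed number of term-annotated genes in the study-set if the study-set were drawn at random from the population-set. The choice of test is an assumption here, not necessarily the method used in the thesis. The function names `hypergeom_sf` and `enrichment`, the term name `term_A`, and the input format (a dictionary mapping each gene to its set of annotation terms) are all hypothetical.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)

def enrichment(study, population, annotations):
    """Return {term: p_value} for every term annotating a study-set gene.

    annotations: hypothetical dict mapping gene -> set of annotation terms.
    """
    population = set(population)
    study = set(study) & population          # study-set must lie in the population
    N, n = len(population), len(study)
    # Only terms that occur in the study-set can be enriched in it.
    terms = set().union(*(annotations.get(g, set()) for g in study))
    result = {}
    for term in terms:
        K = sum(1 for g in population if term in annotations.get(g, set()))
        k = sum(1 for g in study if term in annotations.get(g, set()))
        result[term] = hypergeom_sf(k, N, K, n)
    return result

# Toy example: 10 genes, 4 of them annotated with a hypothetical term.
population = {f"g{i}" for i in range(1, 11)}
annotations = {f"g{i}": {"term_A"} for i in range(1, 5)}
print(enrichment({"g1", "g2", "g3"}, population, annotations))
```

Here all three study genes carry `term_A`, so the p-value is C(4,3)·C(6,0)/C(10,3) = 4/120. No multiple-testing correction is applied in this sketch; when many terms are tested, the p-values would typically be corrected (e.g. Bonferroni or false discovery rate).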


My research is in the field of Bioinformatics.
The complete sequencing of the human and other genomes has ushered in a new era of systems biology. Thanks to high-throughput technologies that can generate massive amounts of biological data relatively easily, whole organelles and pathways can now be studied simultaneously, rather than one gene or protein at a time. However, bioinformatics, statistical, and data-mining methods are required to analyze the raw data and make it valuable. Gene annotations, such as a gene's biological function, its cellular localization, and the regulatory elements that control its expression, can be integrated into the computational analysis to improve the resulting biological conclusions. The purpose of our research is to develop novel techniques for integrating gene annotations into the analysis of large-scale biological data-sets.
The improved analysis techniques were applied to three types of large biological data-sets. The first is mRNA expression data obtained from microarray experiments, in which the mRNA expression levels of thousands of genes are measured simultaneously in a cell or tissue sample under specific conditions. The second lists the known physical protein-protein interactions, and the third lists the known genetic interactions between pairs of genes. These data-sets have several features in common: they are too large for manual analysis, they are often very noisy, and, most important for this thesis, they are all known to be correlated with the functional annotations of the genes. Integrating functional annotations into their analysis should therefore be beneficial.
The gene annotation resource discussed here is the ‘biological process’ domain of the Gene Ontology (GO), a major bioinformatics initiative that aims to unify the representation of gene and gene product attributes across all species. GO is structured as a directed acyclic graph, in which each term has defined relationships to one or more other terms in the same domain. In the studies presented in this thesis, GO annotations are treated as labels assigned to the different genes or proteins. Some of the studies also consider the relationships between the different annotations.
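One common way to take the relationships between GO terms into account is GO's "true path rule": a gene annotated with a term is implicitly annotated with all of that term's ancestors in the directed acyclic graph. The minimal sketch below propagates labels up the graph in this way. It assumes the ontology is given as a hypothetical dictionary mapping each term to its parent terms; the function names and term names (`t_child`, `t_root`, etc.) are illustrative only and are not taken from the thesis.

```python
def ancestors(term, parents):
    """All terms reachable from `term` by following parent links (excluding itself)."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:                      # iterative DFS; `seen` avoids revisiting shared ancestors
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen

def propagate(annotations, parents):
    """Extend each gene's label set with every ancestor of its direct terms."""
    return {
        gene: set(terms).union(*(ancestors(t, parents) for t in terms))
        for gene, terms in annotations.items()
    }

# Toy DAG: t_child -> t_mid -> t_root, and t_other -> t_root.
parents = {"t_child": ["t_mid"], "t_mid": ["t_root"], "t_other": ["t_root"]}
print(propagate({"geneX": {"t_child"}}, parents))
```

Because a term in a DAG may have several parents, the traversal keeps a `seen` set so that ancestors shared by multiple paths are counted only once.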