SURVEY OF BIOMETRIC METHODOLOGIES FOR ANALYSIS OF HEALTH CARE RELATED DATA
Recent advancements in technology have helped us make great progress in understanding genetics. Genomics is one such field, and it has helped answer many of the questions we have about genomes. It has produced a very large amount of data and opened many possibilities for using it. On the other hand, the volume of data generated has confronted us with challenges in storing and processing it. To overcome these challenges there has been an increased focus on “bioinformatics” and “computational biology”. This paper gives an overview of “Bioinformatics” and how it can be used to analyze and explore biological data and to derive meaning from the generated data.
Keywords: Bioinformatics, GeneMark, Healthcare, pregnancy.
A huge amount of “omic” data is now available because of recent technological advances in science. Professionals still face challenges in generating this data and making it available in public databases. A further challenge is how to make sense of the huge amount of structural and sequence data generated from biological systems. This calls for the development of statistical and computational tools that can help us understand biological systems in depth.
This “new biology” era has emerged accompanied by the development of other biological sciences such as Bioinformatics and Computational Biology. The fields of “Genomics” and “Bioinformatics” have developed interdependently and have had an impact on this store of knowledge. In this review we provide an overview of the main areas supporting bioinformatics: “Biological information and databases”, “Molecular Modelling and Sequence Analysis”, “Genomic Analysis” and “Systems Biology”. We look at the key points of these techniques and at the tools used to analyze the data and interpret the results obtained from these technologies.
Healthcare Informatics involves the use of information, with the aid of technology, to improve healthcare and advance biomedical research. It is more about information than about technology. The field combines computing, informatics and the “life-health sciences” to make life better.
“Healthcare Informatics” involves providing the right information about patients at the right time to the right person, enabling them to make the right decision about the treatment to be provided. This requires information exchange between patients, doctors, hospitals and healthcare providers.
“Healthcare informatics is defined as the knowledge, skills and tools which enable information to be collected, managed, used and shared to support the delivery of healthcare and promote health”
Healthcare Informatics has helped transform the traditional healthcare system into an Information Era healthcare system. The term combines Information Management, Electronic Health, Telehealth and Medical Informatics (IM & T).
It brings together Information Technology, medical areas, healthcare administration and healthcare management, and is the application of technology to solve health challenges faced in real life. An expert in Healthcare Informatics understands information technology and how to manage the demands and challenges faced by medical organizations.
Healthcare Informatics provides the essential tools to analyze data and extract knowledge to enable decision making. It uses Information Technology for the collection, management, processing and delivery of data to increase the performance of the organization and to improve service and business plans.
Consider a general use case of the above: devices installed in a patient's home monitor the patient's health condition, alert healthcare professionals to any abnormal condition, and communicate the same to healthcare providers such as ambulance services and hospitals.
Molecular modelling uses the protein sequence to predict the 3D structure of a protein. “Homology Modelling” is a method which uses an existing structure, typically a “crystallized protein”, as a template to predict the protein structure. “MODELLER” is one such software package used for this modelling. It makes use of the “Protein Data Bank”, which contains the 3D structures of proteins. This method can be used to predict the protein structure and to construct an atomic-scale model of a protein from its sequence of amino acids. Like many other modelling techniques, it relies on identifying one or more known protein structures that are similar to the query sequence and on producing an alignment that maps residues of the query sequence to residues of the template sequence. The alignment and the template structure are then used to produce a model. The underlying assumption is that similarity of sequences generally implies similarity of structure. The same idea can be used when searching for genetic sequences.
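The alignment step described above can be illustrated with a minimal sketch. It is not how MODELLER works internally; it is a plain Needleman-Wunsch global alignment with a simple match/mismatch/gap scoring scheme (the scores, function names and toy sequences are assumptions made for illustration), followed by the percent identity that is often used as a first check on whether a template is close enough to the query.

```python
def global_align(query, template, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment. Returns the two gapped strings."""
    n, m = len(query), len(template)
    # score[i][j] = best score aligning query[:i] with template[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if query[i - 1] == template[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = None
        if i > 0 and j > 0:
            sub = match if query[i - 1] == template[j - 1] else mismatch
        if sub is not None and score[i][j] == score[i - 1][j - 1] + sub:
            a.append(query[i - 1]); b.append(template[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a.append(query[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(template[j - 1]); j -= 1
    return "".join(reversed(a)), "".join(reversed(b))


def percent_identity(aligned_a, aligned_b):
    """Identity over the columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    q, t = global_align("ACGT", "AGT")  # toy sequences, not real proteins
    print(q)
    print(t)
    print(percent_identity(q, t))
```

In practice a substitution matrix and affine gap penalties would replace the flat scores used here, and the template would be drawn from a structure database rather than typed in by hand.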
Gene Finding or Gene Prediction is the area of computational biology that deals with algorithmically identifying stretches of genomic DNA that are biologically functional. “Comparative Genomics” is often regarded as one of the most reliable approaches to predicting genes. Here the term gene covers not only protein-coding DNA but also RNA genes and regulatory regions. Gene prediction helps in understanding the genome of a species once it has been sequenced.
In earlier days gene finding relied on experiments with living cells and organisms, followed by statistical analysis to determine the order of genes on a given chromosome. The information collected from many such experiments had to be combined to create a “genetic map”.
Today, with sophisticated computing power and the considerable effort put into producing genome sequences, “Gene Prediction or Finding” has become a computational problem. Bioinformatics research is now making it possible to predict the function of a gene from its sequence.
Many software tools are available today for Gene Finding / Prediction, such as the following:
“GeneMark” : It is recognized as one of the accurate and efficient tools for genome projects. It was the tool used for annotating the first completely sequenced bacterium, “Haemophilus influenzae”, and the first completely sequenced archaeon, “Methanococcus jannaschii”. This software uses species-specific inhomogeneous Markov chain models of “protein coding” DNA sequence and homogeneous Markov chain models of “non coding” DNA.
“Genscan” : It is a “GHMM” (generalized hidden Markov model) based gene finder for human DNA sequences developed by “Chris Burge”, Mathematics Department, Stanford University.
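The idea of scoring a sequence with an inhomogeneous Markov chain for coding DNA against a homogeneous one for non-coding DNA can be sketched as follows. This is a toy illustration only, not the GeneMark algorithm: it assumes first-order chains, a known reading frame starting at position 0, sequences containing only A, C, G and T, and made-up training strings. All function names are hypothetical.

```python
import math

ALPHABET = "ACGT"


def _empty_counts(pseudo):
    # Pseudocounts avoid zero probabilities for unseen transitions.
    return {a: {b: pseudo for b in ALPHABET} for a in ALPHABET}


def _normalize(counts):
    return {a: {b: counts[a][b] / sum(counts[a].values()) for b in ALPHABET}
            for a in ALPHABET}


def train_homogeneous(seqs, pseudo=1.0):
    """One transition table shared by every position (non-coding model)."""
    counts = _empty_counts(pseudo)
    for s in seqs:
        for prev, cur in zip(s, s[1:]):
            counts[prev][cur] += 1
    return _normalize(counts)


def train_inhomogeneous(seqs, pseudo=1.0):
    """Three transition tables, one per codon position (coding model)."""
    counts = [_empty_counts(pseudo) for _ in range(3)]
    for s in seqs:
        for i in range(1, len(s)):
            counts[i % 3][s[i - 1]][s[i]] += 1
    return [_normalize(c) for c in counts]


def log_odds(seq, coding, noncoding):
    """Positive values favour the coding model, negative the non-coding one."""
    total = 0.0
    for i in range(1, len(seq)):
        prev, cur = seq[i - 1], seq[i]
        total += math.log(coding[i % 3][prev][cur] / noncoding[prev][cur])
    return total


if __name__ == "__main__":
    coding = train_inhomogeneous(["GCCGCCGCCGCCGCCGCC"])   # toy "coding" data
    noncoding = train_homogeneous(["ATATATATATATATATAT"])  # toy "non-coding" data
    print(log_odds("GCCGCCGCC", coding, noncoding))
    print(log_odds("ATATATAT", coding, noncoding))
```

A real gene finder would train on annotated genomes, use higher-order chains, and evaluate all six reading frames; the sketch only shows why a position-dependent model can separate coding from non-coding sequence.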
Drug design is the technique of finding the right drug to treat a disease based on the biological target that causes it. Advances in the availability of “proteomic”, “genomic” and “structural” information have helped identify such biological targets and have led to a large number of drug discoveries. The process involves what is called “structure based drug design”, where drugs are designed based on the atomic structure of proteins and complexes. Most of these drugs work by binding to, interacting with and modulating the activity of biological receptors specific to a disease or problem. These receptors are proteins which bind to other molecules and interact with them to perform the functions the body needs to work normally. Examples of such receptors are “hormone receptors”, “neurotransmitter receptors” and “cell signaling receptors”. The functions of some of these receptors can be altered by genetic abnormalities or by stress (physical and physiological), affecting the normal health of a person.
The term “phylogenetics” is a combination of the Greek words “phyle / phylon”, meaning “tribe / race”, and “genetikos”, meaning “relative to birth”. It is the study of the relationships between groups, species or populations of organisms, from which we can build a tree-like branching diagram representing the relationships or inheritance among different molecules, organisms or both. It can be used to predict the genetic or evolutionary relation between organisms. Several software packages are used in this field, such as 1) “MEGA” and 2) “PAUP”. Most of them use statistical methods such as “Maximum Likelihood” or “Maximum Parsimony”.
Phylogenetics is often compared to Taxonomy, which also deals with the classification of organisms based on how similar they are to each other. The two fields overlap in the way they represent groups of related organisms or individuals based on their lineage. Evolution can be described as a branching process in which populations fork into different branches and may also become extinct, in which case the branch terminates. The biggest challenge faced by phylogenetics is that, although genetic data is now easily available, it describes only the present, whereas fossil records of the past are very difficult to obtain. Models of evolution are used to fill these gaps.
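To make “Maximum Parsimony” concrete, the sketch below scores candidate trees with Fitch's small-parsimony algorithm: for each alignment column it counts the minimum number of state changes a given rooted binary tree requires, and the tree with the lowest total is preferred. This is an illustrative sketch, not the implementation used by MEGA or PAUP. The tree encoding (nested tuples of taxon names), the function names and the toy alignment are assumptions.

```python
def fitch(tree, states):
    """Return (possible_states, changes) for one character on a rooted binary tree.

    tree is either a taxon name (leaf) or a pair (left_subtree, right_subtree).
    states maps each taxon name to its character at this alignment column.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state, so no extra change is needed.
        return common, left_changes + right_changes
    # Disjoint state sets force one additional change at this node.
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch change counts over every column of the alignment."""
    length = len(next(iter(alignment.values())))
    total = 0
    for col in range(length):
        column = {name: seq[col] for name, seq in alignment.items()}
        total += fitch(tree, column)[1]
    return total


if __name__ == "__main__":
    # Toy aligned sequences; taxon names are labels only.
    alignment = {"human": "ACGT", "chimp": "ACGT", "mouse": "ACTT", "rat": "ACTA"}
    tree_a = (("human", "chimp"), ("mouse", "rat"))
    tree_b = (("human", "mouse"), ("chimp", "rat"))
    print(parsimony_score(tree_a, alignment))
    print(parsimony_score(tree_b, alignment))
```

A full maximum-parsimony search would also have to enumerate or heuristically explore tree topologies; the sketch only covers the scoring of a fixed tree.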
In the field of healthcare and monitoring there is now a special focus on the health of pregnant women. The status of a pregnant woman and of her foetus has to be tracked and monitored continuously throughout the term of the pregnancy. Many frameworks are being created to track the status of the woman and foetus and to determine any risks, in which case notifications should be sent to the doctor or relatives of the woman, or to healthcare providers such as an ambulance service or hospital so that the patient receives urgent attention. Such a system can use multiple sensors or wearable devices to monitor the patient and use the internet to transfer this information to the right stakeholders so that they can take the required actions. Because it uses low-cost sensors and a mobile phone, it can be made available to all strata of society. In addition, a great deal of information is available today in the form of websites, apps and similar resources, which helps a mother understand her health and alleviates the anxiety faced during pregnancy.
The Government of India has started several schemes such as the “Mother and Child Tracking System (MCTS)”, a web application designed to connect healthcare providers such as “Primary Health Centers (PHC)” and “Community Health Centers (CHC)” in order to support delivery of the required prenatal and postnatal care.
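The monitoring and notification flow described above can be sketched in a few lines. Everything in this sketch is hypothetical: the vital signs chosen, the numeric limits (placeholders, not clinical guidance), the escalation rule and the recipient names are assumptions made purely to show the shape of such a system.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One set of sensor values from a wearable device (hypothetical fields)."""
    systolic_bp: int
    diastolic_bp: int
    heart_rate: int


# HYPOTHETICAL acceptable ranges used only for illustration, NOT clinical guidance.
LIMITS = {
    "systolic_bp": (90, 140),
    "diastolic_bp": (60, 90),
    "heart_rate": (50, 120),
}


def out_of_range(reading):
    """Return the names of all measurements outside their configured range."""
    return [name for name, (low, high) in LIMITS.items()
            if not low <= getattr(reading, name) <= high]


def route_alert(reading):
    """Decide who should be notified for a reading.

    Assumed rule: any abnormal value notifies the doctor and relatives;
    two or more abnormal values also notify the ambulance service.
    """
    flagged = out_of_range(reading)
    if not flagged:
        return []
    recipients = ["doctor", "relatives"]
    if len(flagged) >= 2:
        recipients.append("ambulance")
    return recipients


if __name__ == "__main__":
    print(route_alert(Reading(systolic_bp=120, diastolic_bp=80, heart_rate=75)))
    print(route_alert(Reading(systolic_bp=160, diastolic_bp=100, heart_rate=75)))
```

In a deployed system the limits would come from clinicians, and the routing step would call a messaging service rather than return a list.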
AN OVERVIEW OF DATA INTEGRATION: THE USE OF NETWORKS
Just as each node on a computer network has its own function and interacts with other nodes to perform a higher-level function, we can treat genes as individual nodes which, when they interact with other genes, perform a certain biological function such as coughing or sneezing.
The human body consists of many such networks, at the “gene”, “molecular” and “cellular” levels, that interact and communicate with each other at different planes of interaction.
To identify such biological networks we need methods such as “genetic coexpression networks” and “multistage analysis methods”. Here we assume that all genes are connected in a network and that the strength of the connection between any two genes is determined by the correlation between them. The connectivity of a gene to the other genes in the network defines its importance.
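A coexpression network of this kind can be sketched as follows, under stated assumptions: connection strength is taken to be the absolute Pearson correlation between two genes' expression profiles, edges below a cutoff are dropped, and a gene's connectivity is the sum of its remaining edge strengths. The cutoff of 0.8, the gene names and the expression values are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coexpression_network(expr, threshold=0.8):
    """Return {(gene_a, gene_b): correlation} for pairs with |r| >= threshold."""
    genes = list(expr)
    edges = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            r = pearson(expr[g], expr[h])
            if abs(r) >= threshold:
                edges[(g, h)] = r
    return edges


def connectivity(expr, edges):
    """Sum of absolute edge strengths per gene; higher means more central."""
    conn = {g: 0.0 for g in expr}
    for (g, h), r in edges.items():
        conn[g] += abs(r)
        conn[h] += abs(r)
    return conn


if __name__ == "__main__":
    # Toy expression profiles across four samples (hypothetical values).
    expr = {
        "geneA": [1, 2, 3, 4],
        "geneB": [2, 4, 6, 8],
        "geneC": [4, 3, 2, 1],
        "geneD": [1, 3, 2, 3],
    }
    edges = coexpression_network(expr)
    print(sorted(edges))
    print(connectivity(expr, edges))
```

Weighted approaches that keep every edge and raise the correlation to a power are a common alternative to the hard cutoff used here.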
Bioinformatics, as we have seen, deals with the collection, analysis and interpretation of biological data to provide better insight into the workings of the human body and the effects on it of factors such as infection and genetics.
Although Bioinformatics was initially concerned mainly with the analysis of DNA sequences, its recent development has resulted in the generation and collection of far more data. Bioinformatics can help identify the functions of the genes in our body, but there is still a large number of genes whose sequence or function is unknown, which challenges the approaches currently in use. To overcome these limitations we have to look towards technology such as more powerful processors built using nanofabrication techniques. Another challenge faced by this domain is developing the analysis techniques required to analyze such large amounts of data and provide predictions.
Microarrays are used to compare the relative amounts of RNA between two samples. They can be divided into “complementary DNA (cDNA)” and “oligonucleotide” microarrays. Microarrays provide the means to measure many genes at once, repeatedly.
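Comparing relative RNA amounts between two samples is commonly expressed as a log2 ratio per gene. The following minimal sketch assumes background-corrected intensities are already available as plain numbers. The pseudocount, the two-fold cutoff, the gene names and the intensity values are assumptions for illustration, and no normalization step is shown.

```python
import math


def log2_ratios(sample, reference, pseudocount=1.0):
    """log2(sample / reference) for every gene measured in both samples.

    The pseudocount keeps the ratio defined when an intensity is zero.
    """
    return {gene: math.log2((sample[gene] + pseudocount) /
                            (reference[gene] + pseudocount))
            for gene in sample if gene in reference}


def differentially_expressed(ratios, min_abs_log2=1.0):
    """Label genes whose expression changed at least 2**min_abs_log2 fold."""
    return {gene: ("up" if r > 0 else "down")
            for gene, r in ratios.items() if abs(r) >= min_abs_log2}


if __name__ == "__main__":
    # Hypothetical spot intensities for three genes in two samples.
    treated = {"g1": 799, "g2": 99, "g3": 199}
    control = {"g1": 199, "g2": 399, "g3": 199}
    ratios = log2_ratios(treated, control)
    print(ratios)
    print(differentially_expressed(ratios))
```

Working on the log2 scale makes a two-fold increase and a two-fold decrease symmetric (+1 and -1), which is why it is the usual choice for two-sample comparisons.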
1 Wullianallur R. and Viju R. 2014. “Big data analytics in healthcare: promise and potential”. Health Information Science and Systems, 2:3
2 Knowledgent 2012. Big data and healthcare payers. White Paper by Knowledgent Innovation through Information. Knowledgent White Paper Series
3 Yang C, Li C, Wang Q, Chung D, Zhao H. Implications of pleiotropy: challenges and opportunities for mining big data in biomedicine. Front Genet 2015;6:229.
4 http://salilab.org/modeller/
5 http://genes.mit.edu/GENSCAN.html
6 www.megasoftware.net
8 Friedman, C. 2009. “A fundamental theorem of biomedical informatics”. Journal of the American Medical Informatics Association, 16: 169–170.
9 US Department of Health. 2002. Making information count: a human resources strategy for health informatics professionals. USA: Department.
10 Hersh W. 2009. “A stimulus to define informatics and health information technology”. BMC Med Inform Decision Making 9:24
11 In Y. C., Tae-Min K., Myung S. K., Seong K. M. and Yeun-Jun C. 2013. Perspectives on Clinical Informatics: Integrating Large-Scale Clinical, Genomic, and Health Information for Clinical Care. Genomics and Informatics. Published online by Korea Genome Organization.
12 J. Maniam, C. K. Chin, and K. Chenapiah, Mobile phone based pregnancy support system. eHealth, 2007.
13 B. Amoah, E. A. Anto, and A. Crimi, “Phone-based prenatal care for communities and remote ultrasound imaging,” MobMed Prague, 2014.
14 J. Osma, I. Plaza, E. Crespo, C. Medrano, and R. Serrano, “Proposal of use of smartphones to evaluate and diagnose depression and anxiety symptoms during pregnancy and after birth,” in Biomedical and Health Informatics (BHI), 2014 IEEE-EMBS International Conference on, pp. 547–550, IEEE, 2014.
15 MCTS: An in-depth assessment of India’s Mother Child Tracking System (MCTS) in Rajasthan and Uttar Pradesh: Journal of Health, Population and Nutrition, PMCID: PMC4530478.