The article under review is “Bioinformatics Challenges for Genome-Wide Association Studies” by J.H. Moore, F.W. Asselbergs, and S.M. Williams (2010). It is a literature review that focuses on genome-wide association studies (GWAS). The usefulness of GWAS emerged from the sequencing of the human genome, which enabled researchers to identify more than a million single nucleotide polymorphisms (SNPs). The identification of these SNPs in turn enabled researchers to conduct GWAS, which generated significant amounts of data. Because of the sheer volume of data, Moore, Asselbergs, and Williams (2010) note that new biostatistical methods had to be created to support quality control, imputation, and analysis, particularly with regard to multiple testing.
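To illustrate why multiple testing becomes a concern at this scale, the following is a minimal sketch that is not taken from the article. It assumes a Bonferroni correction (one common approach among several), a hypothetical family-wise significance level of 0.05, and one test per SNP across one million SNPs. The function names are illustrative only.

```python
def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected number of false positives if every null hypothesis is true
    and each test is run at an uncorrected significance level alpha."""
    return alpha * n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold that keeps the family-wise error rate
    at or below alpha under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


if __name__ == "__main__":
    n_snps = 1_000_000  # assumed: one association test per SNP
    alpha = 0.05        # assumed family-wise significance level

    print("Expected false positives without correction:",
          expected_false_positives(alpha, n_snps))
    print("Bonferroni per-SNP threshold:",
          bonferroni_threshold(alpha, n_snps))
```

Under these assumptions, an uncorrected analysis would be expected to flag tens of thousands of SNPs by chance alone, which is why the per-test threshold has to be made far stricter as the number of SNPs grows.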
With the development of these new methods, many new associations have been identified and successfully replicated across several studies. Unfortunately, the SNPs located in the course of GWAS have not produced meaningful data regarding disease susceptibility, which calls into question their usefulness for genetic testing and their ability to enhance health care. The authors attribute this shortfall to the biostatistical analysis paradigms that currently govern testing and experimentation. Furthermore, according to Moore, Asselbergs, and Williams (2010), the linearity of accepted methods means that only one SNP is considered at a time, an approach that fails to take into account factors such as environment and genomic context and that discounts accepted knowledge of disease pathobiology.
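The limitation of considering one SNP at a time can be illustrated with a toy example. This sketch is not drawn from the article: it assumes a hypothetical, perfectly balanced cohort with two binary SNPs (carrier or non-carrier) and a phenotype that appears only when exactly one of the two variants is present. All names and data are invented for illustration.

```python
from itertools import product


def make_toy_cohort(copies_per_cell: int = 25):
    """Build a hypothetical cohort of (snp_a, snp_b, phenotype) rows.
    The phenotype is 1 only when exactly one variant is carried (XOR)."""
    cohort = []
    for a, b in product((0, 1), repeat=2):
        cohort.extend([(a, b, a ^ b)] * copies_per_cell)
    return cohort


def case_rate(rows):
    """Fraction of rows whose phenotype is 1."""
    return sum(row[2] for row in rows) / len(rows)


def marginal_effect(cohort, snp_index: int) -> float:
    """Difference in case rate between carriers and non-carriers of one SNP,
    ignoring the other SNP entirely (a one-SNP-at-a-time view)."""
    carriers = [row for row in cohort if row[snp_index] == 1]
    non_carriers = [row for row in cohort if row[snp_index] == 0]
    return case_rate(carriers) - case_rate(non_carriers)


def joint_case_rates(cohort):
    """Case rate for every combination of the two SNPs considered together."""
    return {
        (a, b): case_rate([r for r in cohort if r[0] == a and r[1] == b])
        for a, b in product((0, 1), repeat=2)
    }


if __name__ == "__main__":
    cohort = make_toy_cohort()
    print("Marginal effect of SNP A:", marginal_effect(cohort, 0))
    print("Marginal effect of SNP B:", marginal_effect(cohort, 1))
    print("Joint case rates:", joint_case_rates(cohort))
```

In this constructed cohort, each SNP examined alone shows no difference in case rate between carriers and non-carriers, while the two SNPs examined together separate cases from controls completely. Real data are far noisier, but the example shows the kind of signal that a strictly one-at-a-time analysis cannot detect.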
Alongside these problems with current methods, there is a trend toward more holistic approaches in place of traditional biostatistical ones. These holistic approaches take genomic and environmental factors into account and capture genotypic and phenotypic dynamics in a way that biostatistical methods fail to acknowledge. The authors argue that, despite the weaknesses noted above, bioinformatics is still relevant to the process of exploring the genetic foundations of human diseases. Accordingly, the authors identify the particular GWAS that call for computational methods as opposed to holistic approaches.
This well-organized article examines both sides of the challenge in bioinformatics: the "bio" part (in this case, SNPs and GWAS) and the "informatics" part (in this case, the computational or analytical aspects of the studies). Moore, Asselbergs, and Williams (2010) draw a clear connection between data mining, machine learning, and genetic studies, especially with regard to modeling and the various ways in which it can be used in GWAS. The article explores algorithms as well, clearly explaining how they are used and how they facilitate analysis. The authors also do not neglect what can be done with the data these methods generate; they cover the creation of biological databases and data sets. The authors critique the problems associated with software in these contexts, noting that the creation of truly effective and useful bioinformatics software will require close collaboration between software designers, biologists, and biostatisticians. Though the point may seem self-evident, the article makes clear that current tools are not necessarily structured well enough to account for the many dimensions of genetic research.
The article concludes with an illustration of a bioinformatics analysis strategy that answers the many challenges identified in the review. The illustration underlines the usefulness of the information the authors cover and simplifies it without losing any of the elements the review has explored. In conclusion, the article is well written, if dense, and it offers solutions to the problems it identifies. It handles the many dimensions of bioinformatics effectively.