How the Bard’s writing helps with cancer diagnoses

Jul 22 2013

Prof Pablo Moscato says the Bard’s writing and biological data can be similarly measured.

The computer science techniques used to analyse the writing style in Shakespeare plays can also be applied to cancer diagnostic procedures, according to the co-leader of HMRI’s Information Based Medicine program, Professor Pablo Moscato.

“It’s all down to measuring the information present in large amounts of data,” Professor Moscato said. “At a mathematical level, both problems have surprisingly similar characteristics.”

In results published recently in the international journal PLoS ONE and featured in The Conversation, the researchers observed that Shakespeare’s work was remarkable both for the probability with which it uses common words and for its closeness to the overall average word usage of the time.

“We wondered if it was possible to find distinctive writing signatures of individual authors by looking at fluctuations in the frequencies of words used,” Professor Moscato said. “This is, in some sense, pretty analogous to our work in biomarker identification in cancer and neurodegeneration.”
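As a rough illustration of what comparing word frequencies can look like in practice, the Python sketch below measures how far one text's word-frequency profile sits from that of a reference text. It is not the scoring method from the published study: the function names, the sum-of-absolute-differences measure and the toy strings are all assumptions made for this example.

```python
from collections import Counter


def relative_frequencies(text):
    """Map each lowercase word in `text` to its share of all words."""
    words = text.lower().split()
    counts = Counter(words)
    total = len(words)
    return {word: n / total for word, n in counts.items()}


def frequency_deviation(author_text, reference_text):
    """Sum of absolute differences between two word-frequency profiles.

    Hypothetical measure chosen for illustration: 0.0 means the profiles
    are identical, 2.0 means the texts share no words at all.
    """
    author = relative_frequencies(author_text)
    reference = relative_frequencies(reference_text)
    vocabulary = set(author) | set(reference)
    return sum(
        abs(author.get(word, 0.0) - reference.get(word, 0.0))
        for word in vocabulary
    )


# Toy strings invented for this example; they are not real corpus data.
reference = "the king and the queen and the fool"
close_author = "the queen and the king and the fool"
distant_author = "sword sword blood blood night night night night"

close_score = frequency_deviation(close_author, reference)
distant_score = frequency_deviation(distant_author, reference)
```

A text whose score is near zero uses words at close to the reference rate, while a larger score marks a more unusual profile. A realistic analysis would need proper tokenisation and a much larger reference corpus.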

HMRI’s Information Based Medicine program is a pioneer in developing new methods to identify biomarkers of disease. Biomarkers are measurable indicators of biological processes, including patient responses to a therapeutic intervention, and are important for cancer diagnostics and early screening.

“Controversies exist about the use of a single biomarker, as in the case of PSA for prostate cancer, so current medical research advocates finding panels of biomarkers,” Professor Moscato said. “To identify panels it is important to find the best combination of biomarkers among the tidal wave of data present.

“Our team uses combinatorial optimisation to subtype different types of cancers at the molecular level by analysing patterns of variations across different samples.”

In this new contribution, the researchers introduced a simple scoring method for the initial screening of a dataset before more sophisticated methods are applied.

“It has helped to identify potentially mislabelled samples and other pitfalls during early processing of data. For more in-depth investigations we use supercomputing-based approaches with our existing infrastructure at the HMRI Building.”

* Professor Moscato is Professor in Computer Science and Software Engineering and Chief Investigator of the ARC Centre of Excellence in Bioinformatics. He founded the Newcastle Bioinformatics Initiative in 2002 and co-founded the Priority Research Centre in Bioinformatics, Biomarker Discovery and Information-based Medicine in 2006. HMRI is a partnership between Hunter New England Health, the University of Newcastle and the community.