Portrait - Thomas Balezeau, data engineer


He writes programs to collect and analyze the large volumes of data held by Institut Curie. The aim is to make this data easier for researchers and physicians to use, through simple applications and tools.


When he arrived at Institut Curie two and a half years ago, this trained bioinformatician dove straight into the world of big data: "I joined Julien Guérin's team to build a data warehouse. It was a great challenge; we embarked on an adventure made possible by the momentum of the ICGEx project, and Institut Curie was a pioneer in the field!" His goal: to place a large mass of data in a single database that users can search easily.

Although the initial database contained only data from biological samples (such as blood and DNA), it very soon became a vast clinical data warehouse that centralizes virtually all medical reports, chemotherapy and radiotherapy data, and more. Thomas Balezeau then set to work developing programs to automate the data flow. "The idea was to put it all in the same place, in a uniform manner, in order to make it accessible and to answer all of the researchers' questions and needs." This data warehouse, BIOMEDICS, is still fed and used today, alongside another project led by UNICANCER in which Thomas Balezeau is involved: ConSoRe, a name that refers to the research-care continuum. This powerful search engine can collect and analyze information scattered in text form across medical records, all in a fraction of a second.

An enormous task: ConSoRe searches millions of patient documents and indexes them with medical terms (and millions of synonyms). Unlike BIOMEDICS, which is intended for advanced data processing, ConSoRe also offers a user-friendly interface, so that users can ask their own questions as part of their research projects.
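To give a feel for what indexing documents with medical terms and their synonyms can mean in practice, here is a minimal Python sketch. It is an illustration only and not ConSoRe's actual implementation: the synonym table, the sample reports, and the function names `build_index` and `search` are all hypothetical.

```python
from collections import defaultdict
import re

# Hypothetical synonym table: each surface form maps to one canonical term.
SYNONYMS = {
    "breast carcinoma": "breast cancer",
    "mammary carcinoma": "breast cancer",
    "breast cancer": "breast cancer",
    "chemo": "chemotherapy",
    "chemotherapy": "chemotherapy",
}

# Hypothetical free-text reports, keyed by document id.
SAMPLE_REPORTS = {
    "r1": "Patient with mammary carcinoma, started chemo in March.",
    "r2": "Breast cancer follow-up, no treatment.",
    "r3": "Chemotherapy for lymphoma.",
}


def build_index(documents):
    """Map each canonical term to the set of document ids that mention it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        lowered = text.lower()
        for surface, canonical in SYNONYMS.items():
            # Whole-word match, so "chemo" does not match inside "chemotherapy".
            if re.search(r"\b" + re.escape(surface) + r"\b", lowered):
                index[canonical].add(doc_id)
    return index


def search(index, *terms):
    """Return the sorted ids of documents matching all terms (synonyms accepted)."""
    results = None
    for term in terms:
        canonical = SYNONYMS.get(term.lower(), term.lower())
        matches = index.get(canonical, set())
        results = matches if results is None else results & matches
    return sorted(results or [])


if __name__ == "__main__":
    idx = build_index(SAMPLE_REPORTS)
    print(search(idx, "breast carcinoma", "chemotherapy"))
```

Because every synonym is reduced to one canonical term when the index is built, a query phrased as "breast carcinoma" also finds reports that say "mammary carcinoma" or "breast cancer". A production engine working on millions of documents would rely on a dedicated search engine and a curated medical terminology rather than a hand-written dictionary.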

The idea is to allow physicians and researchers to ask as many questions as they wish, however complex, via an application suited to their needs and uses, and to provide them with this data rapidly.

Give meaning to the data

Using these very powerful tools, it is now possible to reconstruct a patient's medical history according to precise criteria (cancer, family risk, treatment, etc.). The ultimate goal, explains Thomas, is to translate this data into knowledge and to find correlations within it. He continues: "I recently obtained a research project to look in depth at this clinical data in order to identify correlations, which I hope will prove surprising." The true aim is to reveal facts that were invisible until now, for example by comparing similar patient profiles. The idea is to understand whether certain combinations of treatments and other clinical variables could play a part in the response to treatment.

Organize and plan

This data analysis requires an enormous amount of classification and referencing. "In the future it will be useful to be able to exchange with other search engines." Another challenge is managing the data architecture. In particular, Thomas Balezeau works on international projects that use artificial intelligence to analyze images.

We have terabytes of data, and managing it is difficult in terms of storage and anonymization. Processing it takes time and space, and raises a number of questions: where do we store these images? How do we secure the data? How do we share it?

Another area under discussion is industrializing processes to improve service quality and, why not, to satisfy all needs.

It is not a simple task since the processes are as varied as they are subtle. But it's a fascinating subject!