Dec. 17, 2020

Cholera outbreaks predicted using climate data and AI

Climate data taken from Earth orbiting satellites, combined with machine learning techniqu

Cholera is a waterborne disease caused by the ingestion of water or food contaminated with the bacterium Vibrio cholerae, which can be found in many coastal regions around the world, especially in densely populated tropical areas. The pathogen generally lives in warm water of moderate salinity and turbidity, and can be harboured by plankton and detritus in the water.

Number of cholera outbreaks

Global warming and an increase in extreme weather events are driving outbreaks of cholera – a disease that affects 1.3 to 4 million people each year worldwide and causes up to 143 000 fatalities. A new study shows how cholera outbreaks in coastal regions of India can be predicted with an 89% success rate, in the first demonstration of using sea surface salinity for forecasting cholera.

The research paper, published in the International Journal of Environmental Research and Public Health, focuses on predicting outbreaks of cholera around the northern Indian Ocean, where more than half of global cases of the disease were reported in the 2010-16 period.

The relationships between the environmental drivers of cholera incidence are complex and vary seasonally, with different lagged effects, for example from the monsoon season. Machine learning algorithms can help to overcome these issues by learning to recognise patterns across large datasets in order to make testable predictions.
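To illustrate what a lagged effect means in practice, the short Python sketch below turns one environmental time series into lagged predictor columns, so that conditions from earlier months can sit alongside the current month. It is only an illustration: the rainfall values, the `lagged` and `lag_table` helpers and the choice of lags are all hypothetical and are not taken from the study.

```python
from typing import List, Optional


def lagged(series: List[float], lag: int) -> List[Optional[float]]:
    """Shift a series forward by `lag` steps; the first `lag` entries have no history."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    padded = [None] * lag + list(series)
    return padded[: len(series)]


def lag_table(series: List[float], lags: List[int]) -> List[List[float]]:
    """One row per time step, one column per lag; rows lacking full history are dropped."""
    columns = [lagged(series, lag) for lag in lags]
    rows = [list(row) for row in zip(*columns)]
    return [row for row in rows if None not in row]


# Hypothetical monthly rainfall values (not real data).
rainfall = [10.0, 20.0, 200.0, 300.0, 150.0, 30.0]

# Columns: this month, one month earlier, two months earlier.
rows = lag_table(rainfall, lags=[0, 1, 2])
for row in rows:
    print(row)
```

Each row can then be paired with that month's outbreak label, letting a classifier weigh, say, rainfall two months earlier against rainfall in the current month.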

The study was led by Amy Campbell during a year-long graduate traineeship with the ESA Climate Office. Amy, along with her co-authors at the Plymouth Marine Laboratory (PML), used a machine learning algorithm popular in environmental science applications – the random forest classifier – which can recognise patterns across long datasets and make testable predictions.

The algorithm was trained on disease outbreaks reported in coastal districts of India between 2010 and 2018, and learned their relationships with six satellite-based climate records, which are generated by ESA’s Climate Change Initiative (CCI) and freely available via its Open Data Portal.
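The general shape of such a training setup can be sketched with scikit-learn's `RandomForestClassifier`. This is a minimal illustration on synthetic numbers, not the study's code or data: the four column names are borrowed from variables named in this article, while the values, the made-up outbreak rule and the model settings are assumptions chosen only to make the example run.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Variable names taken from the article; every value below is synthetic.
FEATURES = [
    "land_surface_temperature",
    "sea_surface_salinity",
    "chlorophyll_a",
    "sea_level_anomaly",
]

rng = np.random.default_rng(0)
X = rng.normal(size=(400, len(FEATURES)))  # standardised, made-up predictors
# Made-up rule: outbreaks become more likely as the first two columns increase.
signal = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=400)
y = (signal > 0).astype(int)  # 1 = outbreak reported, 0 = no outbreak

# Hold back a quarter of the rows to check predictions on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)  # fraction of held-out rows classified correctly
importances = dict(zip(FEATURES, model.feature_importances_))

print(f"held-out accuracy: {accuracy:.2f}")
for name, weight in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name:26s} {weight:.2f}")
```

Scoring on held-out rows is what makes the predictions testable: the model is judged on district-months it did not see during training.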

Performance metrics results

By including or removing environmental variables and sub-setting for different seasons, the study identified the key variables for predicting cholera outbreaks: land surface temperature, sea surface salinity, chlorophyll-a concentration and sea level anomaly (the difference in sea level from average).
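One simple way to run this kind of include-or-remove comparison is to retrain the model with each variable left out in turn and see how much the cross-validated accuracy drops. The sketch below does that with scikit-learn on synthetic data; the `drop_one_scores` helper, the data and the settings are hypothetical, and the study's own procedure, including its seasonal subsets, may well have differed.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def drop_one_scores(X, y, names, cv=5):
    """Mean cross-validated accuracy with all variables, then with each one removed."""

    def score(features):
        model = RandomForestClassifier(n_estimators=100, random_state=0)
        return float(cross_val_score(model, features, y, cv=cv).mean())

    scores = {"all variables": score(X)}
    for i, name in enumerate(names):
        # Remove column i and retrain on the remaining variables.
        scores[f"without {name}"] = score(np.delete(X, i, axis=1))
    return scores


# Synthetic example in which only the first column carries any signal.
names = ["land_surface_temperature", "sea_surface_salinity", "chlorophyll_a"]
rng = np.random.default_rng(1)
X = rng.normal(size=(300, len(names)))
y = (X[:, 0] + rng.normal(scale=0.2, size=300) > 0).astype(int)

scores = drop_one_scores(X, y, names)
for label, value in scores.items():
    print(f"{label:40s} {value:.2f}")
```

A variable whose removal causes a clear drop in accuracy is a candidate key variable; one whose removal changes little adds little on its own, at least in that subset of the data.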

Amy Campbell said, “The model showed promising results, and there’s a lot of scope for developing this work using different cholera surveillance datasets or in different locations. In our study, we tested different machine learning techniques and found the random forest classifier to be the best, but there are far more techniques that could be investigated.

“It would be interesting to test the impact of including socio-economic datasets; remote sensing data could be used to develop records to account for human factors that are important for cholera incidence, such as access to water resources.” 

The study and its new insights have contributed to the UKRI-NERC Pathways Of Dispersal for Cholera And Solution Tools (PODCAST) Project led by co-author Marie-Fanny Racault at PML, which is assessing the impact of climate warming and climate extremes on habitats suitable for Vibrio cholerae.

The results from the study will be demonstrated at the UNFCCC’s COP26 meeting in 2021 via a web-based forecasting tool as part of the PODCAST-DEMO project. This is supported by the ESA-Future Earth joint programme and carried out in collaboration with Future Earth’s Health Knowledge-Action network.