Google harnesses historical news data and AI to forecast flash floods

Flash floods rank among the planet's most lethal weather phenomena, claiming over 5,000 lives annually. They are also notoriously difficult to forecast. Google believes it has found a novel way around that problem: analyzing news reports.
Humanity has amassed vast amounts of meteorological data, but flash floods are too brief and too localized to be measured systematically the way temperature or river flow is tracked over time. Deep learning models have grown more adept at general weather prediction, yet this gap in the record leaves them without enough examples to forecast flash floods accurately.
To bridge this gap, Google researchers employed Gemini—the company's large language model—to sift through 5 million global news articles. The process identified reports of 2.6 million distinct flood events, converting them into a geotagged timeline named "Groundsource." According to Gila Loike, a Google Research product manager, this marks the company's first use of language models for such a purpose. The research and data set were made public on Thursday morning.
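As a rough illustration of what converting extracted reports into a geotagged timeline could involve, the sketch below merges flood mentions that fall in the same grid cell on the same day into a single event. It is not Google's pipeline: the `FloodReport` schema, the `cell_key` helper, and the 0.05-degree cell size are all assumptions made for the example, and the language-model extraction step itself is taken as already done.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FloodReport:
    """One flood mention pulled from an article (hypothetical schema)."""
    article_id: str
    lat: float
    lon: float
    day: date


def cell_key(lat, lon, cell_deg=0.05):
    """Snap a coordinate to a coarse grid cell; cell_deg is an assumed size."""
    return (int(lat // cell_deg), int(lon // cell_deg))


def build_timeline(reports, cell_deg=0.05):
    """Merge reports sharing a day and grid cell into one event, ordered by date."""
    events = defaultdict(set)
    for r in reports:
        events[(r.day, cell_key(r.lat, r.lon, cell_deg))].add(r.article_id)
    return [
        {"day": day, "cell": cell, "articles": sorted(ids)}
        for (day, cell), ids in sorted(events.items())
    ]
```

Grouping by cell and day is the simplest way to keep several articles about one flood from counting as several floods; a real system would likely need fuzzier matching on both place and time.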
Using Groundsource as a real-world benchmark, the team trained a model based on a Long Short-Term Memory (LSTM) neural network. This model processes global weather forecasts to generate probability estimates for flash floods in specific locations.
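For readers unfamiliar with LSTMs, the sketch below shows the general shape of such a model: a recurrent cell reads a sequence of forecast features one time step at a time, and a sigmoid on the final hidden state yields a probability. Everything here is illustrative. The feature layout, the gate names, the `random_params` helper, and the untrained random weights are assumptions, not details of Google's model, which would be trained against labels such as those in Groundsource.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, W, b):
    """One LSTM step; W maps a gate name to a matrix over the concatenated [h, x]."""
    z = h + x  # list concatenation: previous hidden state, then current input

    def affine(gate):
        return [sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(W[gate], b[gate])]

    i = [sigmoid(v) for v in affine("input")]    # how much new information to admit
    f = [sigmoid(v) for v in affine("forget")]   # how much old cell state to keep
    o = [sigmoid(v) for v in affine("output")]   # how much cell state to expose
    g = [math.tanh(v) for v in affine("cand")]   # candidate cell update
    c_new = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
    h_new = [oi * math.tanh(ci) for oi, ci in zip(o, c_new)]
    return h_new, c_new


def flood_probability(sequence, W, b, w_out, b_out):
    """Run the cell over a forecast sequence and squash the last state to (0, 1)."""
    hidden = len(w_out)
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in sequence:
        h, c = lstm_step(x, h, c, W, b)
    return sigmoid(sum(w * v for w, v in zip(w_out, h)) + b_out)


def random_params(n_in, hidden, seed=0):
    """Untrained placeholder weights (hypothetical helper for this example only)."""
    rng = random.Random(seed)
    gates = ("input", "forget", "output", "cand")
    W = {g: [[rng.uniform(-0.5, 0.5) for _ in range(hidden + n_in)]
             for _ in range(hidden)] for g in gates}
    b = {g: [0.0] * hidden for g in gates}
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    return W, b, w_out, 0.0
```

Calling `flood_probability` with a list of per-step feature vectors (for instance, assumed values for forecast rainfall and soil moisture) returns a number between 0 and 1. With random weights that number is meaningless; the point is only the data flow from forecast sequence to probability.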
Google's flash flood forecasting model now identifies risks for urban areas across 150 countries on its Flood Hub platform and shares this data with emergency response agencies worldwide. António José Beleza, an emergency response official with the Southern African Development Community who participated in a trial of the model, stated it significantly improved his organization's response speed to flood events.
The model does have current limitations. Its resolution is relatively broad, assessing risk across 20-square-kilometer zones. It is also less precise than systems like the US National Weather Service's flood alerts, partly because it does not integrate local radar data for real-time precipitation tracking.
A key objective of the project, however, was to create a solution for regions where local governments lack the resources for costly weather-sensing infrastructure or detailed historical meteorological records.
"By aggregating millions of reports, the Groundsource data set helps rebalance the global map," explained Juliet Rothenberg, a program manager on Google's Resilience team. "It allows us to extrapolate insights to other regions with far less available information."
Rothenberg added that the team hopes this methodology—using LLMs to create quantitative data sets from qualitative, written sources—can be applied to other ephemeral but critical forecasting challenges, such as heat waves and mudslides.
Marshall Moutenot, CEO of Upstream Tech, a firm using similar deep learning for river flow forecasts for clients like hydropower companies, views Google's work as part of a broader push to compile data for AI-driven weather models. Moutenot also co-founded dynamical.org, a group curating machine learning-ready weather data for researchers and startups.
"Data scarcity remains one of the toughest hurdles in geophysics," Moutenot noted. "Paradoxically, there's an overabundance of Earth data, yet a shortage of validated ground truth for evaluation. This was a highly creative approach to acquiring that essential data."
Comments (2)
So Google's basically using old news articles to predict floods? That's wild. I mean, if it works, it could save thousands of lives, but I wonder how much historical data is actually reliable when weather patterns are changing so fast. 🌊🤔







