Managing Director, Global Insurance & Risk Management Solutions
It may be surprising to learn that U.S. natural catastrophe economic losses totaled $119 billion in 2020, and that 75% of those losses ($89.4 billion) were caused by severe storms and cyclones. In the insurance industry, data is everything. Insurers use data to inform underwriting, rating, pricing, forms, marketing, and even claims handling. When fueled by good data, risk assessments become more accurate and produce better business results. To make this possible, the industry is increasingly turning to predictive analytics, which uses data, statistical algorithms, and machine learning (ML) techniques to predict future outcomes based on historical data. Insurance firms also integrate external data sources with their own existing data to generate more insight into claimants and damages. Google Cloud Public Datasets offers more than 100 high-demand public datasets through BigQuery that help insurers with these sorts of data “mashups.”
One particular dataset that insurers find very useful is Severe Storm Event Details from the U.S. National Oceanic and Atmospheric Administration (NOAA). As part of the Google Cloud Public Datasets program and NOAA Big Data Program, this severe storm data contains various types of storm reports by state, county, and event type—from 1950 to the present—with regular updates. Similar NOAA datasets within the Google Cloud Public Datasets program include the Significant Earthquake Database, Global Hurricane Tracks, and the Global Historical Tsunami Database.
In this post, we’ll explore how to apply storm event data to insurance pricing using two common data science tools, Python notebooks and BigQuery, to drive better insights for insurers.
For property insurers, common determinants of insurance pricing include home condition, assessor and neighborhood data, and cost-to-replace. But macro forces such as natural disasters—like regional hurricanes, flash floods, and thunderstorms—can also significantly contribute to the risk profile of the insured. Insurance companies can leverage severe weather data for dynamic pricing of premiums by analyzing the severity of those events in terms of past damage done to property and crops, for example.
It’s important to set premiums correctly, however, considering the risks involved. Insurance companies now run sophisticated statistical models that take into account many factors, some of which change over time. After all, inaccurate data yields poor predictions, which can lead to business losses, particularly at scale.
The Severe Storm Event Details database includes information about a storm event's location, azimuth (the compass direction from a reference location to the event), distance, impact, and severity, including the cost of damages to property and crops. It documents:

- The occurrence of storms and other significant weather phenomena having sufficient intensity to cause loss of life, injuries, significant property damage, and/or disruption to commerce
- Rare, unusual weather phenomena that generate media attention, such as snow flurries in South Florida or the San Diego coastal area
- Other significant meteorological events, such as record maximum or minimum temperatures or precipitation that occurs in connection with another event
Data about a specific event is added to the dataset within 120 days to allow time for damage assessments and other analysis.
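If you want to see exactly which fields are available before writing queries, you can inspect a table's schema programmatically. Here's a minimal sketch using the google.cloud.bigquery Python client; it assumes the dataset lives at bigquery-public-data.noaa_historic_severe_storms with per-year storms_YYYY tables, which is worth verifying in the BigQuery console since public dataset layouts can change.

```python
# Minimal sketch: list the fields in one year's storm table.
# Assumes the public dataset is bigquery-public-data.noaa_historic_severe_storms
# with per-year tables named storms_YYYY, and that you have Google Cloud
# credentials configured (e.g., via `gcloud auth application-default login`).
from google.cloud import bigquery

client = bigquery.Client()  # uses your default project and credentials

table = client.get_table(
    "bigquery-public-data.noaa_historic_severe_storms.storms_2020"
)
for field in table.schema:
    print(f"{field.name}: {field.field_type}")
```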
Damage caused by storms in the past five years, by state

Google Cloud’s BigQuery provides easy access to this data in multiple ways. For example, you can query directly within BigQuery and perform analysis using SQL.
Another popular option in the data science and analyst community is to access BigQuery from within a notebook environment, interspersing Python code and SQL for ad hoc experimentation. This approach uses BigQuery’s compute to query and process huge amounts of data, rather than performing complex transformations in memory with pandas, for example.
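As a hedged sketch of that workflow, the snippet below runs an aggregation entirely inside BigQuery and pulls only the small summary result into a pandas DataFrame. The table wildcard and the damage_property/damage_crops columns reflect our assumptions about the dataset's current schema, and the same SQL string can be pasted directly into the BigQuery console if you prefer to work there.

```python
# Sketch: aggregate five years of storm damage by state inside BigQuery,
# then bring only the small summary table into pandas for local analysis.
# Assumes per-year tables storms_YYYY with numeric damage_property and
# damage_crops columns (dollar amounts); verify against the live schema.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT
  state,
  COUNT(*) AS storm_events,
  SUM(COALESCE(damage_property, 0) + COALESCE(damage_crops, 0)) AS total_damage
FROM `bigquery-public-data.noaa_historic_severe_storms.storms_*`
WHERE _TABLE_SUFFIX BETWEEN '2016' AND '2020'
GROUP BY state
ORDER BY total_damage DESC
LIMIT 10
"""

# BigQuery executes the heavy aggregation; pandas only ever sees ten rows.
damage_by_state = client.query(sql).to_dataframe()
print(damage_by_state)
```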
In this Python notebook, we have shown how the severe storm data can be used to generate risk profiles of various zip codes based on the severity of those events as measured by the damage incurred. The severe storm dataset is queried to retrieve a smaller dataset into the notebook, which is then explored and visualized using Python. Here’s a look at the risk profiles of the zip codes:
Clusters of zip codes by number of storms and damage cost

Another Google Cloud resource for insurers is BigQuery ML, which allows them to create and execute machine learning models on their data using standard SQL queries. In this notebook, we used BigQuery ML with a K-Means clustering algorithm to generate clusters of zip codes in the top five states impacted by severe storms. These clusters show different levels of storm impact, indicating different risk groups.
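To make that concrete, here's a hedged sketch of what the clustering step can look like. It assumes a BigQuery dataset you own (insurance_demo is a hypothetical name), derives per-zip features by spatially joining storm coordinates against the public geo_us_boundaries.zip_codes polygons (one plausible way to get zip-level aggregates; the notebook may derive them differently), and picks four clusters arbitrarily.

```python
# Hedged sketch: cluster zip codes by storm count and damage with BigQuery ML
# K-Means. `insurance_demo` is a hypothetical dataset you would create in your
# own project; the spatial join against geo_us_boundaries.zip_codes and the
# choice of four clusters are illustrative assumptions, not the notebook's
# exact method.
from google.cloud import bigquery

client = bigquery.Client()

# Per-zip features: number of storm events and total damage, 2016-2020.
features_sql = """
  SELECT
    z.zip_code,
    COUNT(*) AS storm_count,
    SUM(COALESCE(s.damage_property, 0) + COALESCE(s.damage_crops, 0))
      AS total_damage
  FROM `bigquery-public-data.noaa_historic_severe_storms.storms_*` AS s
  JOIN `bigquery-public-data.geo_us_boundaries.zip_codes` AS z
    ON ST_WITHIN(ST_GEOGPOINT(s.event_longitude, s.event_latitude),
                 z.zip_code_geom)
  WHERE s._TABLE_SUFFIX BETWEEN '2016' AND '2020'
  GROUP BY z.zip_code
"""

# Train the K-Means model on the two numeric features only, so the string
# zip_code column is not accidentally encoded as a feature.
client.query(f"""
CREATE OR REPLACE MODEL `insurance_demo.storm_zip_clusters`
OPTIONS (model_type = 'kmeans', num_clusters = 4) AS
SELECT storm_count, total_damage FROM ({features_sql})
""").result()

# Assign each zip code to a cluster; non-feature input columns such as
# zip_code pass through ML.PREDICT alongside the predicted CENTROID_ID.
clusters = client.query(f"""
SELECT zip_code, CENTROID_ID AS risk_group
FROM ML.PREDICT(MODEL `insurance_demo.storm_zip_clusters`, ({features_sql}))
""").to_dataframe()
print(clusters.head())
```

Each risk_group value then acts as a coarse risk band that analysts can inspect and feed into downstream pricing analysis, all without the training data ever leaving BigQuery.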
The example notebook is a reference guide to enable analysts to easily incorporate and leverage public datasets to augment their analysis and streamline the journey to business insights. Instead of having to figure out how to access and use this data yourself, the public datasets, coupled with BigQuery and other solutions, provide a well-lit path to insights, leaving you more time to focus on your own business solutions.
Google Cloud Public Datasets is just one resource within the broader Google Cloud ecosystem that gives data science teams in financial services flexible tools to gather deeper insights for growth. The severe storm dataset is part of our environmental, social, and governance (ESG) efforts to organize information about our planet and make it actionable through technology, helping people make a positive impact together.
To learn more about this public dataset collaboration between Google Cloud and NOAA, attend the Dynamic Pricing in Insurance: Leveraging Datasets To Predict Risk and Price session at the Google Cloud Financial Services Summit on May 27. You can also check out our recent blog and explore more about BigQuery and BigQuery ML.