Data Overview

The datasets presented here have been divided into three categories: Output data, Source data, and Intermediate data. The Berkeley Earth averaging process generates a variety of Output data including a set of gridded temperature fields, regional averages, and bias-corrected station data. Source data consists of the raw temperature reports that form the foundation of our averaging system. Source observations are provided as originally reported and will contain many quality control and redundancy issues. Intermediate data is constructed from the source data by merging redundant records, identifying a variety of quality control problems, and creating monthly averages from daily reports when necessary. The definitive repository for Source and Intermediate data is located in the SVN, which is built nightly.

Berkeley Earth’s data is licensed under Creative Commons BY-NC 4.0 International for non-commercial use only. For licensing in commercial applications, please contact



Output Files:

Time Series Data

Land Only (1750 – Recent)

Berkeley Earth’s primary product is an analysis of summary air temperatures over land. The following files and links provide time series that summarize those results for various regions.

Land + Ocean (1850 – Recent)

Berkeley Earth combines our land data with a modified version of the HadSST ocean temperature data set. The result is a global average temperature data set.

Daily Land (Experimental; 1880 – Recent)

This data set is an experimental temperature time series with daily resolution.

Raw Land Only Monthly (1750 – Recent)

Berkeley Earth’s main analysis uses a variety of techniques to detect and compensate for systematic problems with the source data. For example, measurement discontinuities often occur when observing stations are moved or instrumentation is changed. However, for diagnostic and pedagogical reasons it can be useful to look at averages computed with many of the quality control and homogenization features of the Berkeley Earth algorithm disabled. The following files provide time series that summarize the raw data, omitting most of the bias and error detection steps.

Gridded Data

Datasets are also provided in a gridded NetCDF format. Two types of grids are provided: a grid that divides the Earth into 15984 equal-area cells, and a latitude-longitude grid. The equal-area grid is the primary data format used in most of our analyses and generally produces smaller files; however, that format may be less convenient for many users.
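As a rough illustration of why the two grid types behave differently when averaged, the sketch below contrasts a plain mean over equal-area cells with a cosine-of-latitude weighted mean over a regular latitude-longitude grid. The function names, the Earth radius value, and the in-memory list layout are assumptions made for this example; they do not describe the layout of the NetCDF files.

```python
import math

EARTH_RADIUS_KM = 6371.0    # assumed mean Earth radius
N_EQUAL_AREA_CELLS = 15984  # cell count stated in the text


def equal_area_cell_km2(n_cells=N_EQUAL_AREA_CELLS, radius_km=EARTH_RADIUS_KM):
    """Surface area of one cell when a sphere is split into n equal-area cells."""
    return 4.0 * math.pi * radius_km ** 2 / n_cells


def equal_area_mean(values):
    """On an equal-area grid every cell carries the same weight."""
    return sum(values) / len(values)


def latlon_mean(field, lats_deg):
    """Area-weighted mean of field[i][j] on a regular latitude-longitude grid.

    Each row i is a latitude band centred at lats_deg[i]; cells in that band
    are weighted by cos(latitude) because they shrink toward the poles.
    """
    num = 0.0
    den = 0.0
    for row, lat in zip(field, lats_deg):
        weight = math.cos(math.radians(lat))
        for value in row:
            num += weight * value
            den += weight
    return num / den
```

Under these assumptions each equal-area cell covers roughly 31,900 square kilometers, and a global mean is a simple average of the cell values. On the latitude-longitude grid, the same mean needs the explicit weighting shown in latlon_mean.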

Datasets marked as “Experimental” below are new products in a late stage of development, and are included here so that potential users can give us feedback.

For inquiries about this data please contact

Breakpoint Adjusted Monthly Station data

During the Berkeley Earth averaging process we compare each station to other stations in its local neighborhood, which allows us to identify discontinuities and other heterogeneities in the time series from individual weather stations. The averaging process is then designed to automatically compensate for various biases that appear to be present. After the average field is constructed, it is possible to create a set of estimated bias corrections that suggest what the weather station might have reported had apparent biasing events not occurred. This breakpoint-adjusted data set provides a collection of adjusted, homogeneous station data that is recommended for users who want to avoid heterogeneities in station temperature data.
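The idea of comparing a station with its neighbors can be sketched in a few lines. This is a deliberately simplified illustration, not the Berkeley Earth algorithm: it assumes a single step change, complete and aligned monthly series, and hypothetical function names (difference_series, find_breakpoint, adjust).

```python
from statistics import mean


def difference_series(station, neighbors):
    """Station value minus the mean of its neighbors, month by month.

    neighbors is a list of series, each the same length as station.
    """
    return [s - mean(month) for s, month in zip(station, zip(*neighbors))]


def find_breakpoint(diff, min_segment=2):
    """Return (index, shift) for the split with the largest mean shift.

    index is the first month of the later segment; shift is the mean of the
    later segment minus the mean of the earlier one.
    """
    best_idx, best_shift = None, 0.0
    for i in range(min_segment, len(diff) - min_segment + 1):
        shift = mean(diff[i:]) - mean(diff[:i])
        if abs(shift) > abs(best_shift):
            best_idx, best_shift = i, shift
    return best_idx, best_shift


def adjust(station, idx, shift):
    """Subtract the shift from the later segment to align it with the earlier one."""
    return [v - shift if k >= idx else v for k, v in enumerate(station)]


# Usage: a station that jumps by 1 degree after month 4 while neighbors stay flat.
station = [10.0] * 4 + [11.0] * 4
neighbors = [[10.0] * 8, [10.0] * 8]
idx, shift = find_breakpoint(difference_series(station, neighbors))
adjusted = adjust(station, idx, shift)
```

Working on the difference series rather than the raw station values is the key design choice: regional climate signals shared with the neighbors cancel out, so what remains is mostly station-specific behavior such as a move or an instrument change.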

Source Data

The source files we used to create the Berkeley Earth database are available in a common format.

Intermediate Data


This includes all time series from the originating datasets. Because the same data are often reported by multiple agencies, there are on average three to four time series for each site. Only limited quality control flagging has been performed at this stage.

Single Valued

Data have been collapsed so that there is only one time series per location. Quality control procedures have been completed and their output is reported via a series of quality “flags”. Users of this data set will have to consider these flags and remove any data they don’t want to use. Seasonality is preserved in this data set.
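Filtering on the flags might look like the following minimal sketch. The record layout (dictionaries with "date", "value", and "flags" keys) and the flag names are invented for this example and are not the format or flag codes used in the data files; consult the file documentation for the real ones.

```python
# Hypothetical flag names, invented for this example only.
BAD_FLAGS = {"outlier", "duplicate_value"}


def drop_flagged(records, bad_flags=BAD_FLAGS):
    """Keep only the records that carry none of the flags in bad_flags."""
    return [r for r in records if not (set(r["flags"]) & set(bad_flags))]


records = [
    {"date": "1950-01", "value": -3.2, "flags": []},
    {"date": "1950-02", "value": 41.0, "flags": ["outlier"]},
    {"date": "1950-03", "value": 2.1, "flags": ["estimated"]},
]
clean = drop_flagged(records)
```

Passing a different bad_flags set lets each user decide which flags are disqualifying for their own analysis, which is the choice this data set leaves open.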

Quality Controlled

Same as “Single Valued” except that all values flagged as bad by the quality control process have been removed. This dataset is recommended for users who require relatively clean data. However, no adjustments have been made for heterogeneities and other biasing events. Please consider the breakpoint-adjusted station data above if you wish to avoid heterogeneities.
