The datasets presented here have been divided into three categories: Output data, Source data, and Intermediate data. The Berkeley Earth averaging process generates a variety of Output data including a set of gridded temperature fields, regional averages, and bias-corrected station data. Source data consists of the raw temperature reports that form the foundation of our averaging system. Source observations are provided as originally reported and will contain many quality control and redundancy issues. Intermediate data is constructed from the source data by merging redundant records, identifying a variety of quality control problems, and creating monthly averages from daily reports when necessary. The definitive repository for Source and Intermediate data is located in the SVN, which is built nightly.
Output Files:
Time Series Data
Land Only (1750 – Recent)
Berkeley Earth’s primary product is an analysis of summary air temperatures over land. The following files and links provide time series that summarize those results for various regions.
- All Land:
- Regional Average Temperature:
Land + Ocean (1850 – Recent)
Berkeley Earth combines our land data with a modified version of the HadSST ocean temperature data set. The result is a global average temperature data set.
Daily Land (Experimental; 1880 – Recent)
This data set is an experimental temperature time series with daily resolution.
Raw Land Only Monthly (1750 – Recent)
Berkeley Earth’s main analysis uses a variety of techniques to detect and compensate for systematic problems with the source data. For example, measurement discontinuities often occur when observing stations are moved or instrumentation is changed. However, for diagnostic and pedagogical reasons it can be useful to look at averages computed with many of the quality control and homogenization features of the Berkeley Earth algorithm disabled. The following files provide time series that summarize the raw data, omitting most of the bias and error detection steps.
- Raw Monthly Average Temperature (raw annual summary)
- Raw Monthly Average High Temperature (raw annual summary)
- Raw Monthly Average Low Temperature (raw annual summary)
Gridded Data
Datasets are also provided in a gridded NetCDF format. Two types of grids are provided, a grid based on dividing the Earth into 15984 equal-area grid cells and a latitude-longitude grid. The equal area grid is the primary data format used in most of our analyses and provides generally smaller files; however, that format may be less convenient for many users.
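One practical consequence of the two grid types is that cells on a latitude-longitude grid do not cover equal areas, so a plain mean over those cells over-weights high latitudes. The following is a minimal sketch of an area-weighted mean, assuming the field has already been read into nested Python lists with one row per latitude band. The function name, the toy values, and the use of `None` for missing cells are illustrative assumptions and are not taken from the Berkeley Earth files; the weight is the usual approximation that cell area is proportional to the cosine of the cell-center latitude.

```python
import math


def area_weighted_mean(grid, latitudes):
    """Average a latitude-longitude field, weighting each cell by cos(latitude).

    grid      -- list of rows, one row per latitude band; each row is a list of
                 values, with None (or NaN) marking a missing cell
    latitudes -- cell-center latitude in degrees for each row of `grid`
    """
    total = 0.0
    weight_sum = 0.0
    for row, lat in zip(grid, latitudes):
        # Approximate relative cell area for a regular lat-lon grid.
        weight = math.cos(math.radians(lat))
        for value in row:
            if value is None or math.isnan(value):
                continue  # skip missing cells entirely
            total += weight * value
            weight_sum += weight
    if weight_sum == 0.0:
        raise ValueError("grid contains no valid cells")
    return total / weight_sum


# Toy field (hypothetical values): three latitude bands, two longitudes each.
latitudes = [-60.0, 0.0, 60.0]
grid = [
    [1.0, 1.0],
    [0.0, 0.0],
    [1.0, None],
]
toy_mean = area_weighted_mean(grid, latitudes)
unweighted_mean = (1.0 + 1.0 + 0.0 + 0.0 + 1.0) / 5
```

In this toy case the weighted mean is lower than the unweighted one because the non-zero values sit at high latitudes, where cells are smaller. On the equal-area grid no such weighting is needed, since every cell represents the same area.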
Datasets marked as “Experimental” below are new products in a late stage of development, and are included here so that potential users can give us feedback.
For inquiries about this data please contact admin@berkeleyearth.org.
- Monthly Land
- Average Temperature (TAVG; 1753 – Recent)
- Video of Temperature Field
- Equal Area (~45 MB)
- 1º x 1º Latitude-Longitude Grid (~200 MB)
- High Temperature (TMAX; 1833 – Recent)
- Equal Area (~35 MB)
- 1º x 1º Latitude-Longitude Grid (~140 MB)
- Low Temperature (TMIN; 1833 – Recent)
- Equal Area (~35 MB)
- 1º x 1º Latitude-Longitude Grid (~140 MB)
- README
- Monthly Land + Ocean
- Average Temperature with Air Temperatures at Sea Ice (Recommended; 1850 – Recent)
- Video of Temperature Field
- Equal Area (~100 MB)
- 1º x 1º Latitude-Longitude Grid (~400 MB)
- Average Temperature with Water Temperatures at Sea Ice (1850 – Recent)
- Equal Area (~100 MB)
- 1º x 1º Latitude-Longitude Grid (~400 MB)
- README
- Higher Resolution Land Data Sets (experimental, not recently updated)
- Contiguous United States Average Temperature (TAVG; 1850 – Recent)
- Europe Average Temperature (TAVG; 1850 – Recent)
- Daily Land (Experimental; 1880 – Recent)
- Average Temperature (TAVG)
- Average High Temperature (TMAX)
- Average Low Temperature (TMIN)
- README
- Raw Monthly Land (Diagnostic)
- Average Temperature (TAVG; 1753 – Recent)
- Equal Area (~45 MB)
- 1º x 1º Latitude-Longitude Grid (~200 MB)
- High Temperature (TMAX; 1833 – Recent)
- Equal Area (~35 MB)
- 1º x 1º Latitude-Longitude Grid (~140 MB)
- Low Temperature (TMIN; 1833 – Recent)
- Equal Area (~35 MB)
- 1º x 1º Latitude-Longitude Grid (~140 MB)
Breakpoint-Adjusted Monthly Station Data
During the Berkeley Earth averaging process we compare each station to other stations in its local neighborhood, which allows us to identify discontinuities and other heterogeneities in the time series from individual weather stations. The averaging process is then designed to automatically compensate for various biases that appear to be present. After the average field is constructed, it is possible to create a set of estimated bias corrections that suggest what the weather station might have reported had apparent biasing events not occurred. This breakpoint-adjusted data set provides a collection of adjusted, homogeneous station data that is recommended for users who want to avoid heterogeneities in station temperature data.
- Individual Station Data
- Complete Station Archives:
Source Data
The source files we used to create the Berkeley Earth database are available in a common format.
Intermediate Data
Multi-valued
This includes all time series from the originating datasets. Because the same data are often reported by multiple agencies, there will be on average 3-4 time series for each site. Only limited quality control flagging has been performed at this stage.
Single-valued
Data have been collapsed so that there is only one time series per location. Quality control procedures have been completed and their output is reported via a series of quality “flags”. Users of this data set will have to consider these flags and remove any data they don’t want to use. Seasonality is preserved in this data set.
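As an illustration of what considering the flags can look like in practice, here is a minimal sketch that drops any record carrying a flag the user has chosen to reject. The record layout and the flag names (`OUTLIER`, `ESTIMATED_DAY`) are hypothetical placeholders, not the actual Berkeley Earth file format or flag codes; consult the data set's own documentation for the real flag definitions.

```python
def drop_flagged(records, rejected_flags):
    """Return only the records that carry none of the rejected flags.

    records        -- list of dicts, each with a "flags" list (hypothetical layout)
    rejected_flags -- set of flag names the user does not want to keep
    """
    return [
        record
        for record in records
        if not (set(record["flags"]) & rejected_flags)
    ]


# Hypothetical monthly records for one station.
records = [
    {"month": "1950-01", "temp_c": -3.2, "flags": []},
    {"month": "1950-02", "temp_c": 41.0, "flags": ["OUTLIER"]},
    {"month": "1950-03", "temp_c": 2.1, "flags": ["ESTIMATED_DAY"]},
]

# Reject only outliers; keep values that were merely estimated.
clean = drop_flagged(records, {"OUTLIER"})
kept_months = [record["month"] for record in clean]
```

Passing the set of rejected flags as a parameter keeps the choice with the user, which matches the intent of this data set: which flags matter depends on the analysis.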
Quality Controlled
Same as “Single-valued” except that all values flagged as bad via the quality control processes have been removed. This dataset is recommended for users who require relatively clean data. However, no adjustments have been made for heterogeneities and other biasing events. Please consider the breakpoint-adjusted station data above if you wish to avoid heterogeneity.