How to get the historical weather for any city with BigQuery?

Date: 2016-01-15 04:57:58

Tags: sql google-bigquery weather opendata

BigQuery has NOAA's GSOD data loaded as a public dataset, going back to 1929: https://www.reddit.com/r/bigquery/comments/2ts9wo/noaa_gsod_weather_data_loaded_into_bigquery/

How can I retrieve the historical data for any city?

4 answers:

Answer 0: (score: 9)

2017 update: standard SQL and the latest tables:

SELECT TIMESTAMP(CONCAT(year,'-',mo,'-',da)) day, AVG(min) min, AVG(max) max, AVG(IF(prcp=99.99,0,prcp)) prcp
FROM `bigquery-public-data.noaa_gsod.gsod2016`
WHERE stn='722540' AND wban='13904'
GROUP BY 1
ORDER BY day

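Another example, listing the January and February daily minimum temperatures at Chicago O'Hare for each year of this decade, to show its coldest days: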

#standardSQL
SELECT year, FORMAT('%s%s',mo,da) day ,min
FROM `fh-bigquery.weather_gsod.stations` a
JOIN `bigquery-public-data.noaa_gsod.gsod201*` b
ON a.usaf=b.stn AND a.wban=b.wban
WHERE name='CHICAGO/O HARE ARPT'
AND min!=9999.9
AND mo<'03'
ORDER BY 1,2

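To retrieve the historical weather for any city, we first need to find which stations report from that city. The table [fh-bigquery:weather_gsod.stations] contains the names of the known stations, their state (if in the US), country, and other details.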

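So to find all the stations in Austin, TX, we would use a query like this (legacy SQL, as are the remaining queries in this answer):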

SELECT state, name, lat, lon
FROM [fh-bigquery:weather_gsod.stations] 
WHERE country='US' AND state='TX' AND name CONTAINS 'AUST'
LIMIT 10

This approach has two problems that need to be solved:

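  • Not every known station is present in that table; I need to get an updated version of this file. So don't give up if you can't find the station you are looking for.
  • Not every station found in this file has been operating every year, so we need to find stations that have data for the year we are looking for.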

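To solve the second problem, we need to join the stations table with the actual data we are looking for. The following query looks for stations around Austin, and the column c shows how many days during 2015 have actual data: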

SELECT state, name, FIRST(a.wban) wban, FIRST(a.stn) stn, COUNT(*) c, INTEGER(SUM(IF(prcp=99.99,0,prcp))) rain, FIRST(lat) lat, FIRST(lon) long
FROM [fh-bigquery:weather_gsod.gsod2015] a
JOIN [fh-bigquery:weather_gsod.stations] b 
ON a.wban=b.wban
AND a.stn=b.usaf
WHERE country='US' AND state='TX' AND name CONTAINS 'AUST'
GROUP BY 1,2
LIMIT 10

That's good! We found 4 stations with data for Austin during 2015.

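Note that we had to treat "rain" in a special way: when a station doesn't monitor rain, it marks the value as 99.99 instead of null. Our query counts those values as 0.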
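The queries in this answer use AVG(IF(prcp=99.99,0,prcp)), which replaces the 99.99 marker with 0 before averaging. A minimal Python sketch of that choice next to the alternative of dropping unreported days; the function names and the sample readings are made up for illustration and are not part of the dataset or of BigQuery:

```python
MISSING_PRCP = 99.99  # GSOD marker for "precipitation not reported"


def mean_prcp_as_zero(values):
    """Mirror AVG(IF(prcp=99.99,0,prcp)): count unreported days as 0."""
    if not values:
        return None
    cleaned = [0.0 if v == MISSING_PRCP else v for v in values]
    return sum(cleaned) / len(cleaned)


def mean_prcp_excluding(values):
    """Alternative: drop unreported days before averaging."""
    reported = [v for v in values if v != MISSING_PRCP]
    return sum(reported) / len(reported) if reported else None


# Hypothetical daily readings; the second day was not reported.
readings = [0.5, 99.99, 0.0, 1.0]
print(mean_prcp_as_zero(readings))    # 0.375
print(mean_prcp_excluding(readings))  # 0.5
```

Counting unreported days as 0 pulls the average down, so for a station with many 99.99 days the second form may be the better fit.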

Now that we know the stn and wban numbers for these stations, we can pick any of them and visualize the results:

SELECT TIMESTAMP('2015'+mo+da) day, AVG(min) min, AVG(max) max, AVG(IF(prcp=99.99,0,prcp)) prcp
FROM [fh-bigquery:weather_gsod.gsod2015]
WHERE stn='722540' AND wban='13904'
GROUP BY 1
ORDER BY day

Answer 1: (score: 1)

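Thanks for pulling in the data and making it available as a public table. Here is a BigQuery query that returns the total rainfall for each station in Texas in 2014: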

SELECT FIRST(name) AS station_name, stn, SUM(prcp) AS annual_precip
FROM [fh-bigquery:weather_gsod.gsod2014] gsod
JOIN [fh-bigquery:weather_gsod.stations] stations
ON gsod.wban=stations.wban AND gsod.stn=stations.usaf
WHERE state='TX' AND prcp != 99.99
GROUP BY stn

Which returns: table of results

To pull in the number of rainy days at each location and sort the results by that count:

SELECT FIRST(name) AS station_name, stn, SUM(prcp) AS annual_precip, COUNT(prcp) AS rainy_days
FROM [fh-bigquery:weather_gsod.gsod2014] gsod
JOIN [fh-bigquery:weather_gsod.stations] stations
ON gsod.wban=stations.wban AND gsod.stn=stations.usaf
WHERE state='TX' AND prcp != 99.99 AND prcp > 0
GROUP BY stn
ORDER BY rainy_days DESC

Which produces these results
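For readers without BigQuery access, the same aggregation can be tried locally. This is only a sketch using Python's built-in sqlite3 module: the SQL is in SQLite's dialect (MIN(name) stands in for the legacy FIRST(name)), and the two-table layout and all rows are made up for illustration.

```python
import sqlite3


def rainy_day_summary(gsod_rows, station_rows, state="TX"):
    """Total precipitation and rainy-day count per station, most rainy days first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gsod (stn TEXT, wban TEXT, prcp REAL)")
    conn.execute("CREATE TABLE stations (usaf TEXT, wban TEXT, name TEXT, state TEXT)")
    conn.executemany("INSERT INTO gsod VALUES (?, ?, ?)", gsod_rows)
    conn.executemany("INSERT INTO stations VALUES (?, ?, ?, ?)", station_rows)
    query = """
        SELECT MIN(s.name) AS station_name,
               g.stn,
               SUM(g.prcp) AS annual_precip,
               COUNT(g.prcp) AS rainy_days
        FROM gsod AS g
        JOIN stations AS s
          ON g.wban = s.wban AND g.stn = s.usaf
        WHERE s.state = ?
          AND g.prcp != 99.99   -- marker: precipitation not reported
          AND g.prcp > 0        -- keep only days with measurable rain
        GROUP BY g.stn
        ORDER BY rainy_days DESC
    """
    rows = conn.execute(query, (state,)).fetchall()
    conn.close()
    return rows


# Hypothetical stations and daily readings.
STATIONS = [
    ("000001", "00001", "STATION A", "TX"),
    ("000002", "00002", "STATION B", "TX"),
    ("000003", "00003", "STATION C", "OK"),
]
GSOD = [
    ("000001", "00001", 0.5),
    ("000001", "00001", 0.25),
    ("000001", "00001", 0.0),
    ("000001", "00001", 99.99),
    ("000002", "00002", 1.0),
    ("000003", "00003", 2.0),
]

for row in rainy_day_summary(GSOD, STATIONS):
    print(row)
```

Because the WHERE clause already removes dry days and unreported days, COUNT(prcp) is the number of rainy days rather than the number of days with data.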

Answer 2: (score: 1)

In addition to Felipe's "official" public dataset, there is now an official set of the NOAA data on BigQuery. There is a blog post describing it.

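An example that gets the minimum temperatures for August 15, 2016 (GHCN-D stores temperatures in tenths of a degree Celsius, hence value/10):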

SELECT
  name, 
  value/10 AS min_temperature,
  latitude,
  longitude
FROM
  [bigquery-public-data:ghcn_d.ghcnd_stations] AS stn
JOIN
  [bigquery-public-data:ghcn_d.ghcnd_2016] AS wx
ON
  wx.id = stn.id
WHERE
  wx.element = 'TMIN'
  AND wx.qflag IS NULL
  AND STRING(wx.date) = '2016-08-15'

Answer 3: (score: 0)

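Using station names is unreliable. It is also hard to use geospatial queries with the new BigQuery features, because a city's boundary is not a clean shape (such as a circle or a rectangle).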

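So the best solution I have found for your problem is reverse geocoding: asking the Google Maps API to generate an address, state, city, and county for each station from its latitude/longitude coordinates.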

Here is the resulting CSV (StationNumber,Lat,Lon,Address,State,City,County,Zip) for the US (you will notice that 98% of the stations are there): https://gist.github.com/orcaman/a3e23c47489705dff93aace2e35f57d3

Here is the code (golang) in case you want to re-run it for locations outside the US: https://gist.github.com/orcaman/8de55f14f1c70ef5b0c124cf2fb7d9d1
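Once such a CSV is available, looking up the station numbers for a city is a simple filter. A minimal Python sketch, assuming the file has no header row, uses the column order listed above, and stores the state as a two-letter code; none of this has been checked against the actual gist, and the sample rows are made up:

```python
import csv
import io

# Column order as listed for the CSV; the absence of a header row is an assumption.
FIELDS = ["StationNumber", "Lat", "Lon", "Address", "State", "City", "County", "Zip"]


def stations_in_city(csv_text, city, state):
    """Return the station numbers whose reverse-geocoded city and state match."""
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=FIELDS)
    wanted_city = city.strip().lower()
    wanted_state = state.strip().lower()
    matches = []
    for row in reader:
        row_city = (row["City"] or "").strip().lower()
        row_state = (row["State"] or "").strip().lower()
        if row_city == wanted_city and row_state == wanted_state:
            matches.append(row["StationNumber"])
    return matches


# Hypothetical rows, not taken from the real file.
SAMPLE = '''\
000001,30.30,-97.70,"1 Example Rd, Austin, TX",TX,Austin,Travis,00000
000002,30.18,-97.68,"2 Example Rd, Austin, TX",TX,Austin,Travis,00000
000003,32.90,-97.04,"3 Example Rd, Dallas, TX",TX,Dallas,Dallas,00000
'''

print(stations_in_city(SAMPLE, "austin", "TX"))
```

The returned station numbers can then be used in the stn filter of the weather queries, instead of matching on station names.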