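I want to load weather data into BigQuery. I'd like to correlate weather patterns with my own datasets.
Answer 0 (score: 1)
I have this script that downloads NOAA's global daily GSOD data and loads it into BigQuery: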
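#!/bin/bash
# Stop on the first failed step instead of loading partial data.
set -euo pipefail
# The year to load is the first argument, e.g. 2015.
year=${1:?usage: $0 YEAR}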
# Folder for each year.
mkdir -p "$year"
# Get yearly data from NOAA.
wget "ftp://ftp.ncdc.noaa.gov/pub/data/gsod/$year/gsod_$year.tar" -O "$year/gsod_$year.tar"
# Untar: the archive holds one gzipped file per station.
tar -xvf "$year/gsod_$year.tar" -C "$year/"
# Archive not needed anymore.
rm "$year/gsod_$year.tar"
# Unzip each file.
find "$year" -name "*.gz" -print0 | xargs -0 gunzip
# Merge all files, dropping each file's header line (it contains STN).
find "$year" -name "*.op" -print0 | xargs -0 grep -h -v STN > "$year.op"
# Transform NOAA's fixed-width format to CSV.
# in2csv from https://csvkit.readthedocs.org/en/0.9.0/
# gsod_schema.csv from https://github.com/tothebeat/noaa-gsod-data-munging/
in2csv -s gsod_schema.csv "$year.op" > "$year.csv"
# Load into BigQuery.
bq load --max_bad_records 10 --replace weather_gsod.gsod$year $year.csv stn,wban,year,mo,da,temp:float,count_temp:integer,dewp:float,count_dewp:integer,slp:float,count_slp:integer,stp:float,count_stp:integer,visib:float,count_visib:integer,wdsp,count_wdsp,mxpsd,gust:float,max:float,flag_max,min:float,flag_min,prcp:float,flag_prcp,sndp:float,fog,rain_drizzle,snow_ice_pellets,hail,thunder,tornado_funnel_cloud
It downloads the yearly NOAA archive, untars it, gunzips each file, and then converts NOAA's fixed-width format into a CSV that BigQuery can read.
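Since the script handles one year per run, loading several years means calling it in a loop. Here is a minimal sketch of such a wrapper. It is not part of the original script: `load_gsod.sh` is only an assumed file name for the script above, and `LOADER` and `load_years` are hypothetical names used for illustration.

```shell
#!/bin/bash
# Hypothetical wrapper around the loader script above.
# LOADER is an assumed path; point it at wherever you saved the script.
LOADER="${LOADER:-./load_gsod.sh}"

# Run the loader once per year, from the first year to the last (inclusive).
# Stops at the first year that fails so the problem is easy to spot.
load_years() {
  local first=$1 last=$2 year
  for year in $(seq "$first" "$last"); do
    echo "Loading GSOD $year"
    "$LOADER" "$year" || { echo "Failed for $year" >&2; return 1; }
  done
}
```

For example, `load_years 2010 2015` would call the loader six times and, given the `bq load` line above, should leave you with one table per year (`weather_gsod.gsod2010` through `weather_gsod.gsod2015`).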