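I'm scraping a table from a website and writing it to a csv file. The file name comes out right, but I need the worksheet in the workbook to be named "Raw_Data" instead of the file name. Here's what I have so far: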
import urllib.request
import json
import re
import datetime
html = urllib.request.urlopen("https://www.wunderground.com/personal-weather-station/dashboard?ID=KNYSENEC1#history/tdata/s20171104/e20171104/mdaily").read().decode('utf8')
json_data = re.findall(r'pws_bootstrap:(.*?)\s+,\s+country\:', html, re.S)
data = json.loads(json_data[0])
nnow = datetime.datetime.now().date()
Filenamee = "seneca_weather_" + str(nnow)
filename = ('%s.csv' % Filenamee)
f = open(filename, "w")
for days in data['history']['days']:
    for obs in days['observations']:
        f.write(str(obs['date']['iso8601']) + "," + str(obs['temperature']) + "," + str(obs['pressure']) + "," + str(obs['wind_dir']) + "," + str(obs['wind_speed']) + "," + str(obs['precip_today']) + "\n")
f.close()  # flush the buffered rows to disk
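I'm new to both Python and web scraping, so sorry for the very broad question. Thanks.
Answer 0 (score: 0)
Good to see you're trying out Python. Once you're working with tables, I recommend the pandas library; its documentation is linked further down in this answer.
Here is an answer that uses pandas and json_normalize.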
import urllib.request
import json
import re
import datetime
from pandas import json_normalize  # pandas >= 1.0; on older versions use: from pandas.io.json import json_normalize
html = urllib.request.urlopen("https://www.wunderground.com/personal-weather-station/dashboard?ID=KNYSENEC1#history/tdata/s20171104/e20171104/mdaily").read().decode('utf8')
json_data = re.findall(r'pws_bootstrap:(.*?)\s+,\s+country\:', html, re.S)
data = json.loads(json_data[0])
nnow = datetime.datetime.now().date()
filename = "seneca_weather_{}.xlsx".format(nnow)
df = json_normalize(data['history']['days'])
cols = ["summary.date.iso8601","summary.temperature",
"summary.pressure","summary.wind_dir",
"summary.wind_speed","summary.precip_today"]
# Name the worksheet explicitly; the workbook file keeps the dated file name.
df[cols].to_excel(filename, index=False, sheet_name="Raw_Data")
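Whatever string ends up in sheet_name has to be a valid Excel worksheet name: at most 31 characters, and none of the characters [ ] : * ? / \. A minimal sketch of a check for that follows; safe_sheet_name and its fallback argument are hypothetical helpers written for this illustration, not part of pandas.

```python
import re

# Excel limits worksheet names to 31 characters and rejects these characters.
_INVALID_SHEET_CHARS = re.compile(r'[\[\]:*?/\\]')
MAX_SHEET_NAME_LEN = 31


def safe_sheet_name(name, fallback="Sheet1"):
    """Return a variant of `name` that Excel accepts as a worksheet name.

    Hypothetical helper: replaces forbidden characters with "_",
    truncates to 31 characters, and trims spaces and apostrophes
    from both ends.
    """
    cleaned = _INVALID_SHEET_CHARS.sub("_", name)
    cleaned = cleaned[:MAX_SHEET_NAME_LEN].strip().strip("'")
    return cleaned or fallback


print(safe_sheet_name("Raw_Data"))
print(safe_sheet_name("seneca_weather/raw:data"))
```

The result can then be passed as the sheet_name argument of to_excel.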
pandas documentation: http://pandas.pydata.org/pandas-docs/stable/
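If you want a *.csv file instead, do the following. Note that a CSV file has no worksheets, so there is no sheet name to set; when Excel opens a CSV it labels the tab with the file name: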
df[cols].to_csv("output.csv",index=False)
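The pandas code above writes one row per day (the summary fields), while the loop in the question writes one row per observation. If the per-observation rows are what you need in the CSV, here is a rough sketch using only the standard library csv module. The sample_days values are made up for illustration and only mimic the shape of data['history']['days']; observation_rows and write_observations are hypothetical helper names.

```python
import csv
import io

# Made-up sample shaped like data['history']['days'] in the question.
sample_days = [
    {
        "observations": [
            {"date": {"iso8601": "2017-11-04T00:04:00-0400"}, "temperature": 51.2,
             "pressure": 30.11, "wind_dir": "NNW", "wind_speed": 2.0,
             "precip_today": 0.0},
            {"date": {"iso8601": "2017-11-04T00:09:00-0400"}, "temperature": 51.0,
             "pressure": 30.12, "wind_dir": "N", "wind_speed": 1.5,
             "precip_today": 0.0},
        ]
    }
]

FIELDS = ["temperature", "pressure", "wind_dir", "wind_speed", "precip_today"]


def observation_rows(days):
    """Yield one flat list per observation: timestamp first, then FIELDS."""
    for day in days:
        for obs in day.get("observations", []):
            yield [obs["date"]["iso8601"]] + [obs.get(key) for key in FIELDS]


def write_observations(days, fileobj):
    """Write a header plus one CSV row per observation; return the row count."""
    writer = csv.writer(fileobj)
    writer.writerow(["date"] + FIELDS)
    count = 0
    for row in observation_rows(days):
        writer.writerow(row)
        count += 1
    return count


# Write to an in-memory buffer here; a real file works the same way.
buffer = io.StringIO()
n = write_observations(sample_days, buffer)
print(buffer.getvalue())
```

To write to disk, open the target with open(filename, "w", newline="") and pass that file object instead of the buffer. Compared with joining strings by hand, csv.writer also quotes any value that happens to contain a comma.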