Writing a worksheet name different from the file name when web scraping

Posted: 2017-11-04 20:18:46

Tags: python

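I'm scraping a table from a website and writing it to a csv file. The file name comes out right, but I need the worksheet name in the workbook to be "Raw_Data" rather than the file name. Here's what I have so far: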

import urllib.request
import json
import re
import datetime


# download the dashboard page; the station history is embedded in it as JSON
html = urllib.request.urlopen("https://www.wunderground.com/personal-weather-station/dashboard?ID=KNYSENEC1#history/tdata/s20171104/e20171104/mdaily").read().decode('utf8')
# pull out the JSON object that follows "pws_bootstrap:" and parse it
json_data = re.findall(r'pws_bootstrap:(.*?)\s+,\s+country\:', html, re.S)
data = json.loads(json_data[0])

nnow = datetime.datetime.now().date()
filename = "seneca_weather_{}.csv".format(nnow)

# "with" guarantees the file is closed (and flushed) when the loop finishes
with open(filename, "w") as f:
    for days in data['history']['days']:
        for obs in days['observations']:
            # one CSV line per observation
            row = [obs['date']['iso8601'], obs['temperature'], obs['pressure'],
                   obs['wind_dir'], obs['wind_speed'], obs['precip_today']]
            f.write(",".join(str(v) for v in row) + "\n")

I'm new to both Python and web scraping, so sorry for the very broad question. Thanks!

1 answer:

Answer 0 (score: 0)

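Good to see you're giving Python a try. Once you're working with tables, I recommend the pandas library. Check out the documentation here: http://pandas.pydata.org/pandas-docs/stable/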

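Here is an answer that uses pandas and json_normalize. One thing to know first: a CSV file is plain text and has no worksheets, so there is nothing in it to rename; when Excel opens a CSV it simply names the sheet after the file. To get a sheet called "Raw_Data" you have to write a real Excel workbook (.xlsx) and set the sheet name when saving it.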
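A minimal offline sketch of what `json_normalize` does, using a small hand-written record list shaped like the `observations` the question loops over (the field names come from the question's code; the values are made up). With `record_path="observations"` it yields one row per observation, mirroring the question's nested loops, and nested dicts become dotted column names. The script below instead normalizes the per-day records and selects their `summary.*` columns.

```python
import pandas as pd

# Hypothetical sample shaped like data['history']['days'] in the question:
# each day holds a list of observations with a nested 'date' dict.
# Field names mirror the question's code; the values are made up.
days = [
    {
        "observations": [
            {"date": {"iso8601": "2017-11-04T00:00:00-04:00"},
             "temperature": 45.1, "pressure": 30.05,
             "wind_dir": "NW", "wind_speed": 3.0, "precip_today": 0.0},
            {"date": {"iso8601": "2017-11-04T00:05:00-04:00"},
             "temperature": 44.9, "pressure": 30.05,
             "wind_dir": "N", "wind_speed": 2.0, "precip_today": 0.0},
        ]
    }
]

# record_path="observations" gives one row per observation; the nested
# 'date' dict is flattened into a column named "date.iso8601".
df = pd.json_normalize(days, record_path="observations")

# select (and order) the same fields the question writes out
cols = ["date.iso8601", "temperature", "pressure",
        "wind_dir", "wind_speed", "precip_today"]
print(df[cols])
```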

import urllib.request
import json
import re
import datetime
import pandas as pd  # provides pd.json_normalize; the old pandas.io.json import path is deprecated

html = urllib.request.urlopen("https://www.wunderground.com/personal-weather-station/dashboard?ID=KNYSENEC1#history/tdata/s20171104/e20171104/mdaily").read().decode('utf8')
json_data = re.findall(r'pws_bootstrap:(.*?)\s+,\s+country\:', html, re.S)
data = json.loads(json_data[0])

nnow = datetime.datetime.now().date()
filename = "seneca_weather_{}.xlsx".format(nnow)

# flatten the nested daily records into columns such as "summary.temperature"
df = pd.json_normalize(data['history']['days'])

cols = ["summary.date.iso8601","summary.temperature", 
        "summary.pressure","summary.wind_dir",
        "summary.wind_speed","summary.precip_today"]

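# the sheet name is set independently of the file name;
# writing .xlsx requires an Excel engine such as openpyxl to be installed
df[cols].to_excel(filename, index=False, sheet_name="Raw_Data")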
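To check that the sheet name really is independent of the file name, here is a small sketch that writes a made-up DataFrame to an in-memory workbook and reads the sheet names back. It assumes the `openpyxl` package is installed (pandas uses it as the .xlsx engine); the column values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for df[cols] from the answer (values are made up).
df = pd.DataFrame({
    "summary.date.iso8601": ["2017-11-04T00:00:00-04:00"],
    "summary.temperature": [45.1],
})

# An in-memory buffer stands in for the .xlsx file so nothing touches disk;
# a filename such as "seneca_weather_2017-11-04.xlsx" works the same way.
buf = io.BytesIO()

# ExcelWriter holds one workbook; each to_excel call names its own sheet.
with pd.ExcelWriter(buf, engine="openpyxl") as writer:
    df.to_excel(writer, sheet_name="Raw_Data", index=False)

# Rewind and read the workbook back to list the sheet names it contains.
buf.seek(0)
sheet_names = pd.ExcelFile(buf, engine="openpyxl").sheet_names
print(sheet_names)
```

Each additional `to_excel(writer, sheet_name=...)` call inside the same `with` block adds another sheet to the same workbook.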


If you want a *.csv instead, do the following:

df[cols].to_csv("output.csv",index=False)
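If you'd rather not pull in pandas just for the CSV, the standard-library `csv` module is a safer replacement for the question's hand-built lines: it quotes any field containing a comma or a quote, which joining with `","` does not. This is a sketch with hypothetical data; `write_observations`, `FIELDS` and the header row are illustrative additions, and in the real script you would pass `open(filename, "w", newline="")` instead of the `StringIO` buffer.

```python
import csv
import io

# Hypothetical observation shaped like the question's obs dicts (values made up).
observations = [
    {"date": {"iso8601": "2017-11-04T00:00:00-04:00"}, "temperature": 45.1,
     "pressure": 30.05, "wind_dir": "NW", "wind_speed": 3.0, "precip_today": 0.0},
]

# Illustrative list of the flat fields written after the date.
FIELDS = ["temperature", "pressure", "wind_dir", "wind_speed", "precip_today"]


def write_observations(rows, out):
    """Write a header plus one CSV row per observation to a text stream."""
    writer = csv.writer(out)
    writer.writerow(["date"] + FIELDS)  # header row
    for obs in rows:
        writer.writerow([obs["date"]["iso8601"]] + [obs[k] for k in FIELDS])


buf = io.StringIO()
write_observations(observations, buf)
print(buf.getvalue())
```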