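Concatenating URL pages into a single data frame

Time: 2015-06-24 15:01:59

Tags: python pandas

I am trying to download historical weather data for a specific location. I adapted the example given by flowingdata, but I am stuck on the last step: how to concatenate multiple DataFrames.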

MWE:

import pandas as pd

frames = pd.DataFrame(columns=['TimeEET', 'TemperatureC', 'Dew PointC', 'Humidity','Sea Level PressurehPa', 
       'VisibilityKm', 'Wind Direction', 'Wind SpeedKm/h','Gust SpeedKm/h','Precipitationmm', 
       'Events','Conditions', 'WindDirDegrees', 'DateUTC<br />'])

# Iterate through year, month, and day
for y in range(2006, 2007):
    for m in range(1, 13):
        for d in range(1, 32):

            # Check if leap year
            if y%400 == 0:
                leap = True
            elif y%100 == 0:
                leap = False
            elif y%4 == 0:
                leap = True
            else:
                leap = False

            # Check if already gone through month
            if (m == 2 and leap and d > 29):
                continue
            elif (m == 2 and d > 28):
                continue
            elif (m in [4, 6, 9, 10] and d > 30):
                continue

            # Open wunderground.com url
            url = "http://www.wunderground.com/history/airport/EFHK/"+str(y)+ "/" + str(m) + "/" + str(d) + "/DailyHistory.html?req_city=Vantaa&req_state=&req_statename=Finlandia&reqdb.zip=00000&reqdb.magic=4&reqdb.wmo=02974&format=1"
            df=pd.read_csv(url, sep=',',skiprows=2)
            frames=pd.concat(df)

This raises an error:

 first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"

The desired output is a single data frame containing the data for every day, month, and year.
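
The error does not depend on the download step. As a minimal sketch, assuming a tiny hand-made frame in place of the weather data (the variable names here are illustrative only), passing a single DataFrame to pd.concat reproduces it without any network access:

```python
import pandas as pd

# Hand-made stand-in for one downloaded day (an assumption, not real weather data)
df = pd.DataFrame({"TemperatureC": [1.0, 2.0]})

caught = False
message = ""
try:
    # Same call shape as in the loop above: a bare DataFrame, not a list of them
    pd.concat(df)
except TypeError as exc:
    caught = True
    message = str(exc)

print(caught)
print(message)
```

pd.concat treats its first argument as a collection of frames to stack, which is why a bare DataFrame is rejected.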

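1 answer:

Answer 0 (score: 3)

Declare a list before the loop, append each day's DataFrame to it inside the loop, and then, after the loop has finished, concatenate all of the DataFrames into a single one: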

import pandas as pd

frames = pd.DataFrame(columns=['TimeEET', 'TemperatureC', 'Dew PointC', 'Humidity','Sea Level PressurehPa', 
       'VisibilityKm', 'Wind Direction', 'Wind SpeedKm/h','Gust SpeedKm/h','Precipitationmm', 
       'Events','Conditions', 'WindDirDegrees', 'DateUTC<br />'])

# Collect one DataFrame per day here, then concatenate once after the loop
df_list = []

# Iterate through year, month, and day
for y in range(2006, 2007):
    for m in range(1, 13):
        for d in range(1, 32):

            # Check if leap year
            if y % 400 == 0:
                leap = True
            elif y % 100 == 0:
                leap = False
            elif y % 4 == 0:
                leap = True
            else:
                leap = False

            # Skip day numbers that do not exist in this month
            if (m == 2 and leap and d > 29):
                continue
            elif (m == 2 and not leap and d > 28):
                continue
            # April, June, September, and November have 30 days
            elif (m in [4, 6, 9, 11] and d > 30):
                continue

            # Open wunderground.com url
            url = "http://www.wunderground.com/history/airport/EFHK/" + str(y) + "/" + str(m) + "/" + str(d) + "/DailyHistory.html?req_city=Vantaa&req_state=&req_statename=Finlandia&reqdb.zip=00000&reqdb.magic=4&reqdb.wmo=02974&format=1"
            df = pd.read_csv(url, sep=',', skiprows=2)
            df_list.append(df)

# pd.concat takes the whole list; ignore_index=True renumbers the rows from 0
frames = pd.concat(df_list, ignore_index=True)
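
The same append-then-concatenate pattern can be written more compactly by letting pd.date_range produce only real calendar dates, which removes the manual leap-year and month-length checks. The following is a sketch under stated assumptions: build_url, collect, and fake_loader are hypothetical helper names introduced for illustration, and fake_loader stands in for pd.read_csv so that the example runs without contacting the site.

```python
import pandas as pd


def build_url(day):
    # Same URL pattern as in the question; `day` is a pandas Timestamp
    return ("http://www.wunderground.com/history/airport/EFHK/"
            f"{day.year}/{day.month}/{day.day}/DailyHistory.html"
            "?req_city=Vantaa&req_state=&req_statename=Finlandia"
            "&reqdb.zip=00000&reqdb.magic=4&reqdb.wmo=02974&format=1")


def collect(days, loader):
    # `loader` is any callable that maps a URL to a DataFrame
    df_list = [loader(build_url(day)) for day in days]
    # Concatenate once, after every frame has been collected
    return pd.concat(df_list, ignore_index=True)


def fake_loader(url):
    # Hypothetical stand-in for pd.read_csv(url, sep=',', skiprows=2)
    return pd.DataFrame({"url": [url], "TemperatureC": [0.0]})


# date_range only yields dates that exist, so no 30 February or 31 April
days = pd.date_range("2006-01-01", "2006-12-31", freq="D")

# Use the first three days to keep the demonstration small
frames = collect(days[:3], fake_loader)
print(frames)
```

To run it against real data, one would pass something like lambda url: pd.read_csv(url, sep=',', skiprows=2) as the loader, assuming the URL still serves CSV in the format the question relied on.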