Question

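I am retrieving several data sets in CSV format from a website. I read them one by one and save the resulting DataFrames in a list that starts out empty. I cannot append them into a single DataFrame because they have different column names and column orders. So I have the following questions: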

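Can I create a DataFrame with a different name inside the loop that reads the files, so that instead of saving them to a list I create a new DataFrame for each file retrieved? If this is not possible or not recommended, is there a way to iterate over my list to extract the DataFrames? At the moment I extract one DataFrame at a time, but I would like to automate this so that it creates names like data_1, data_2, and so on. Right now my code is not very time-consuming because I only have 4 DataFrames, but it could become burdensome with more data. Here is my code: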

import pandas as pd
from urllib.request import urlopen

# List the file names so we can iterate over them to download each file
periods = ['2012-1st-quarter', '2012-2nd-quarter', '2012-3rd-quarter', '2012-4th-quarter']
general = []
# Loop over the periods and read each file from the Capital Bikeshare website
for i in periods:
    url = 'https://www.capitalbikeshare.com/assets/files/trip-history-data/' + i + '.csv'
    response = urlopen(url)
    x = pd.read_csv(response)
    general.append(x)
# Extract the first DataFrame by hand
q1 = pd.DataFrame(general[0])

Thanks!

Answer 1

It would be better to use a dictionary, and you can also pass the URL directly to pandas.read_csv. The simplified code looks like this:

import pandas as pd

periods = ['2012-1st-quarter','2012-2nd-quarter', '2012-3rd-quarter', '2012-4th-quarter']
url = 'https://www.capitalbikeshare.com/assets/files/trip-history-data/{}.csv'
d = {period: pd.read_csv(url.format(period)) for period in periods}

You can then access a specific DataFrame like this:

 d['2012-4th-quarter']
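The question asks for names like data_1, data_2; with a dictionary, those names can simply be the keys. Below is a minimal sketch, assuming the frames have already been collected in a list as in the question. The small in-memory CSV strings and their column names are made up for illustration and stand in for the downloaded files.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the downloaded CSV files. They deliberately use
# different column names and a different column order, as in the question.
csv_texts = [
    "Duration,Start date\n10,2012-01-01\n",
    "Start time,Duration (sec)\n2012-04-01,20\n",
]

# Same pattern as the question: read each file and collect the frames in a list.
general = [pd.read_csv(io.StringIO(text)) for text in csv_texts]

# Use data_1, data_2, ... as dictionary keys instead of separate variables.
frames = {"data_{}".format(i): df for i, df in enumerate(general, start=1)}

print(sorted(frames))
print(frames["data_2"].columns.tolist())
```

Each frame keeps its own columns, and frames["data_2"] plays the role that a variable named data_2 would have played.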

To iterate over all the DataFrames:

for period, df in d.items():
    print(period)
    print(df)
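If the eventual goal is a single DataFrame, one possible follow-up (not part of the answer above) is to rename the columns to a common scheme while iterating and then concatenate. pd.concat aligns columns by name, so only the differing names need handling, not the column order. This is a sketch under assumptions: the column names and the rename_map below are hypothetical, not the real Capital Bikeshare headers.

```python
import io

import pandas as pd

# Hypothetical dictionary of frames, shaped like `d` in the answer but built
# from in-memory CSV text so the example runs without network access.
d = {
    "2012-1st-quarter": pd.read_csv(io.StringIO("Duration,Start date\n10,2012-01-01\n")),
    "2012-2nd-quarter": pd.read_csv(io.StringIO("Start time,Duration (sec)\n2012-04-01,20\n")),
}

# Assumed mapping from each variant column name to a common name.
rename_map = {"Duration (sec)": "Duration", "Start time": "Start date"}

# Rename while iterating; names missing from a frame are simply ignored.
normalized = {period: df.rename(columns=rename_map) for period, df in d.items()}

# Passing a dict to pd.concat uses its keys as the outer index level,
# so every row remains traceable to the period it came from.
combined = pd.concat(normalized, names=["period", "row"])
print(combined)
```

Keeping the period as an index level means the per-file frames can still be recovered later with combined.loc['2012-2nd-quarter'].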

Generating multiple pandas DataFrames
