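I have 88 different DataFrames of different lengths that I need to concatenate. They are all located in one directory, and I used the following Python script to generate such a DataFrame.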
Here is what I tried.
Because each of these DataFrames has a different length or shape, it raises this error:
ValueError: Shape of passed values is (88, 57914), indices imply (88, 57905)
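My goal is to concatenate column-wise into a single DataFrame with 88 columns, since my input is 88 separate DataFrames and I need to use the 7th column of each in my script. Any solution or suggestion for concatenating the DataFrames in this case would be great. Thanks.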
Answer 0 (score: 2)
The key is to make a list of the different DataFrames and then concatenate the list, instead of concatenating them one at a time.
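As a minimal in-memory sketch of that idea (the three small frames and their column names are made up for illustration), pd.concat with axis=1 aligns the frames on their index and pads the shorter ones with NaN, rather than raising a shape error:

```python
import pandas as pd

# Three hypothetical frames of different lengths (illustrative data only)
frames = [
    pd.DataFrame({"a": [1, 2, 3]}),
    pd.DataFrame({"b": [10, 20]}),
    pd.DataFrame({"c": [100, 200, 300, 400]}),
]

# axis=1 places the frames side by side; rows are aligned on the index,
# and positions that a shorter frame lacks are filled with NaN
result = pd.concat(frames, axis=1)
print(result)
```

The result has as many rows as the longest frame and one column per input frame.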
I created 10 df, each filled with a single column of random-length data, and saved them to csv files to simulate your data.
import pandas as pd
import numpy as np
from random import randint

# generate 10 df and save to separate csv files
for i in range(1, 11):
    # one column 'a' with 2 to 11 rows (randint includes both endpoints)
    dfi = pd.DataFrame({'a': np.arange(randint(2, 11))})
    csv_file = "file{0}.csv".format(i)
    dfi.to_csv(csv_file, sep='\t')
    print("saving file", csv_file)
Then we read these csv files into separate DataFrames and store them in a list.
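# read previously saved csv files into separate df
# and add to list
frames = []
# range(1, 10) reads file1.csv through file9.csv
for x in range(1, 10):
    csv_file = "file{0}.csv".format(x)
    # read_csv with index_col=0 stands in for DataFrame.from_csv,
    # which was removed from newer pandas releases
    newdf = pd.read_csv(csv_file, sep='\t', index_col=0)
    frames.append(newdf)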
Finally, we concatenate the list.

# concatenate frames list column-wise
result = pd.concat(frames, axis=1)
print(result)
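The result is the variable-length frames concatenated column-wise into a single df, with NaN filling the rows that the shorter frames lack. Nine columns appear rather than ten because the read loop uses range(1, 10) and so stops at file9.csv.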
saving file file1.csv
saving file file2.csv
saving file file3.csv
saving file file4.csv
saving file file5.csv
saving file file6.csv
saving file file7.csv
saving file file8.csv
saving file file9.csv
saving file file10.csv
      a    a    a    a    a    a    a   a    a
0   0.0  0.0  0.0  0.0  0.0  0.0  0.0   0  0.0
1   1.0  1.0  1.0  1.0  1.0  1.0  1.0   1  1.0
2   2.0  2.0  2.0  2.0  2.0  2.0  2.0   2  2.0
3   3.0  3.0  3.0  3.0  3.0  NaN  3.0   3  NaN
4   4.0  4.0  4.0  4.0  4.0  NaN  NaN   4  NaN
5   5.0  5.0  5.0  5.0  5.0  NaN  NaN   5  NaN
6   6.0  6.0  6.0  6.0  6.0  NaN  NaN   6  NaN
7   NaN  7.0  7.0  7.0  7.0  NaN  NaN   7  NaN
8   NaN  8.0  NaN  NaN  8.0  NaN  NaN   8  NaN
9   NaN  NaN  NaN  NaN  9.0  NaN  NaN   9  NaN
10  NaN  NaN  NaN  NaN  NaN  NaN  NaN  10  NaN
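For the situation in the question (many files in one directory, one column wanted from each), the same pattern might look like the sketch below. The function name concat_column, the "*.csv" pattern, the tab separator, and the assumption that every file has a header row are illustrative guesses, not details taken from the question. Position 6 is the 7th column counting from zero.

```python
import glob
import os

import pandas as pd


def concat_column(directory, position=6, pattern="*.csv", sep="\t"):
    """Read one column (zero-based position) from every matching file
    and place those columns side by side in a single DataFrame."""
    columns = []
    names = []
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        # usecols with a list of integers selects columns by position
        df = pd.read_csv(path, sep=sep, usecols=[position])
        columns.append(df.iloc[:, 0])
        names.append(os.path.basename(path))
    # keys labels each output column with the file it came from,
    # so the columns do not all share one name
    return pd.concat(columns, axis=1, keys=names)
```

Calling concat_column("path/to/dir") on a directory of 88 such files would return a DataFrame with 88 columns, as long as the longest file, with NaN where a shorter file has no row.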
Hope this is what you were looking for. A good set of examples on merge, join, and concatenate can be found here.