Question

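I have 88 different DataFrames of different lengths that I need to concatenate. They are all located in one directory, and I am using a Python script to build a single DataFrame from them.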

Here is what I tried. Because each of these DataFrames has a different length (shape), it raises the following error:

ValueError: Shape of passed values is (88, 57914), indices imply (88, 57905)

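My goal is to concatenate them column-wise into a single DataFrame with 88 columns, since my input is 88 separate DataFrames and I need to use the 7th column of each one in my script. Any solution or suggestion for concatenating the DataFrames would be great. Thanks.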

Answer 1

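The key is to build a list of the different DataFrames and then concatenate the list, rather than concatenating them one at a time.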
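As a minimal sketch of why this works, the snippet below concatenates three small in-memory frames of different lengths. The frames and their column names are made up for illustration. With axis=1, pd.concat aligns rows on the index and fills positions that a shorter frame lacks with NaN, so unequal lengths do not raise an error.

```python
import pandas as pd

# Three hypothetical frames of different lengths (names are made up for illustration)
frames = [
    pd.DataFrame({"a": [1, 2, 3]}),
    pd.DataFrame({"b": [10, 20]}),
    pd.DataFrame({"c": [100, 200, 300, 400]}),
]

# axis=1 places the frames side by side; rows are aligned on the index,
# and positions missing from a shorter frame become NaN
combined = pd.concat(frames, axis=1)
print(combined)
```

The result has as many rows as the longest frame and one column per input frame.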

I created 10 DataFrames, each filled with a single column of random-length data, and saved them to CSV files to simulate your data.

import pandas as pd
import numpy as np
from random import randint


# generate 10 DataFrames and save them to separate CSV files
for i in range(1, 11):
    dfi = pd.DataFrame({'a': np.arange(randint(2, 11))})
    csv_file = "file{0}.csv".format(i)
    dfi.to_csv(csv_file, sep='\t')
    print("saving file", csv_file)

Then we read these 10 CSV files into separate DataFrames and collect them in a list.

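# read the previously saved CSV files into 10 separate DataFrames
# and add them to a list
frames = []
for x in range(1, 11):
    csv_file = "file{0}.csv".format(x)
    # pd.DataFrame.from_csv has been removed from pandas;
    # read_csv with index_col=0 is its replacement
    newdf = pd.read_csv(csv_file, sep='\t', index_col=0)
    frames.append(newdf)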

Finally, we concatenate the list.

# concatenate the list of frames column-wise
result = pd.concat(frames, axis=1)
print(result)

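The result is a single DataFrame whose columns are the variable-length frames, with the shorter columns padded with NaN. The output below has nine columns because the run that produced it read file1.csv through file9.csv.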

saving file file1.csv
saving file file2.csv
saving file file3.csv
saving file file4.csv
saving file file5.csv
saving file file6.csv
saving file file7.csv
saving file file8.csv
saving file file9.csv
saving file file10.csv
      a    a    a    a    a    a    a   a    a
0   0.0  0.0  0.0  0.0  0.0  0.0  0.0   0  0.0
1   1.0  1.0  1.0  1.0  1.0  1.0  1.0   1  1.0
2   2.0  2.0  2.0  2.0  2.0  2.0  2.0   2  2.0
3   3.0  3.0  3.0  3.0  3.0  NaN  3.0   3  NaN
4   4.0  4.0  4.0  4.0  4.0  NaN  NaN   4  NaN
5   5.0  5.0  5.0  5.0  5.0  NaN  NaN   5  NaN
6   6.0  6.0  6.0  6.0  6.0  NaN  NaN   6  NaN
7   NaN  7.0  7.0  7.0  7.0  NaN  NaN   7  NaN
8   NaN  8.0  NaN  NaN  8.0  NaN  NaN   8  NaN
9   NaN  NaN  NaN  NaN  9.0  NaN  NaN   9  NaN
10  NaN  NaN  NaN  NaN  NaN  NaN  NaN  10  NaN
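In the output above every column is named a, which makes the columns hard to tell apart. One possible way around this, sketched here with made-up frames and labels, is to pass keys= to pd.concat so each input frame gets its own label.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the frames read from file1.csv, file2.csv, ...
frames = [pd.DataFrame({"a": np.arange(n)}) for n in (3, 5, 2)]
labels = ["file{0}".format(i) for i in range(1, len(frames) + 1)]

# keys= adds an outer column level, so the columns become
# (file1, a), (file2, a), (file3, a)
labeled = pd.concat(frames, axis=1, keys=labels)

# Drop the inner level to keep only the per-file labels
labeled.columns = labeled.columns.droplevel(1)
print(labeled.columns.tolist())
```

Dropping the inner level only makes sense when each frame contributes a single column, as it does here.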

Hope this is what you were looking for. Good examples of merging, joining, and concatenating can be found in the pandas documentation.
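Applied to the case in the question, the same idea might look like the sketch below. It rests on several assumptions that are not stated in the question: the files match *.csv, they are tab-separated, and they have no header row. The function name concat_seventh_column and its parameters are hypothetical and should be adjusted to the real files.

```python
import glob
import os

import pandas as pd


def concat_seventh_column(directory, pattern="*.csv", sep="\t"):
    """Read the 7th column (position 6) of each matching file and join them side by side."""
    paths = sorted(glob.glob(os.path.join(directory, pattern)))
    columns = []
    for path in paths:
        # header=None is an assumption: the files are treated as having no header row
        col = pd.read_csv(path, sep=sep, header=None, usecols=[6]).iloc[:, 0]
        # Name each column after its file so the columns stay distinguishable
        columns.append(col.rename(os.path.basename(path)))
    # Shorter columns are padded with NaN up to the length of the longest one
    return pd.concat(columns, axis=1)
```

Calling concat_seventh_column on a directory holding the 88 files should return one DataFrame with 88 columns. pd.concat raises a ValueError if no file matches the pattern.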
