"ValueError: cannot reindex from a duplicate axis" when concatenating a list of pandas Series

Time: 2019-07-09 16:34:26

Tags: python pandas

I have a list of numpy arrays that I am trying to merge into a 2D matrix, like this:

[arr1, arr2, arr3....] 

arr1 = [0.24, 0.24, 0.56, 0.77]
arr2 = [0.1, 0.24]
arr3 = [0.6, 0.7, 0.72, 0.88]

This is what the output should look like:

NaN, 0.24, 0.24, 0.56, NaN, NaN,  NaN, 0.77,  NaN
0.1, 0.24,  NaN,  NaN, NaN, NaN,  NaN,  NaN,  NaN
NaN,  NaN,  NaN,  NaN, 0.6, 0.7, 0.72,  NaN, 0.88

I merge them with the following script:

import pandas as pd

# convert each array to a Series indexed by its own values
series = [pd.Series(arr, index=arr) for arr in arrs]

# concat with reindex
pd.concat(series, axis=1)

But I get the following error:

raise ValueError("cannot reindex from a duplicate axis")

ValueError: cannot reindex from a duplicate axis

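Note that the input arrays contain duplicates, and I want to keep them.

How can I fix this?

Edit:

Given the discussion in the comments, the error is most likely caused by the duplicates, and I would like to find a way around it.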
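One way to confirm that the duplicates are the cause is to check whether each Series has a unique index before concatenating. The following is a minimal sketch that rebuilds the question's data; the names `unique_flags` and `duplicate_labels` are illustrative and not part of the original script.

```python
import pandas as pd

# The arrays from the question.
arrs = [
    [0.24, 0.24, 0.56, 0.77],
    [0.1, 0.24],
    [0.6, 0.7, 0.72, 0.88],
]

# Same construction as in the question: each Series is indexed by its own values.
series = [pd.Series(arr, index=arr) for arr in arrs]

# Index.is_unique is False when a label occurs more than once.
unique_flags = [s.index.is_unique for s in series]

# Index.duplicated() marks the second and later occurrences of each label.
duplicate_labels = [s.index[s.index.duplicated()].tolist() for s in series]

print(unique_flags)      # [False, True, True]
print(duplicate_labels)  # [[0.24], [], []]
```

Only the first Series has a repeated label (0.24), which is enough to make the column-wise alignment ambiguous.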

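1 Answer:

Answer 0: (score: 1)

Since your data contains duplicates, one workaround is to index each Series by both the value and its occurrence number, so that repeated values get distinct labels: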

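import pandas as pd

new_arrs = []
for a in arrs:
    a = pd.Series(a)
    # number each repeat of a value: 0 for the first occurrence, 1 for the second, ...
    occurrences = a.groupby(a).cumcount()
    # (value, occurrence) pairs are unique within a Series, even when values repeat
    a.index = pd.MultiIndex.from_tuples(list(zip(a, occurrences)))
    new_arrs.append(a)

pd.concat(new_arrs, axis=1)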

Output:

           0     1     2
0.10 0   NaN  0.10   NaN
0.24 0  0.24  0.24   NaN
     1  0.24   NaN   NaN
0.56 0  0.56   NaN   NaN
0.60 0   NaN   NaN  0.60
0.70 0   NaN   NaN  0.70
0.72 0   NaN   NaN  0.72
0.77 0  0.77   NaN   NaN
0.88 0   NaN   NaN  0.88
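The result above has one column per input array, whereas the question asks for one row per array. A possible follow-up, sketched here under the assumption that the occurrence level is no longer needed once the data is aligned, is to sort, transpose, and keep only the value level as column labels. The helper `to_keyed_series` and the variable names are hypothetical, introduced only for this illustration.

```python
import pandas as pd

# The arrays from the question.
arrs = [
    [0.24, 0.24, 0.56, 0.77],
    [0.1, 0.24],
    [0.6, 0.7, 0.72, 0.88],
]


def to_keyed_series(values):
    """Index a Series by (value, occurrence number) so every label is unique."""
    s = pd.Series(values, dtype=float)
    occurrence = s.groupby(s).cumcount()
    s.index = pd.MultiIndex.from_arrays([s.to_numpy(), occurrence.to_numpy()])
    return s


# Align on the unique (value, occurrence) labels, then sort them explicitly
# rather than relying on the order that concat happens to produce.
combined = pd.concat([to_keyed_series(a) for a in arrs], axis=1).sort_index()

# One row per input array; columns are labelled by value only,
# so the repeated 0.24 appears as two adjacent columns.
matrix = combined.T
matrix.columns = matrix.columns.get_level_values(0)

print(matrix)
```

The column labels of `matrix` are no longer unique, so this layout is best treated as a final presentation step; further alignment operations on it could raise the same duplicate-axis error.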