I have a list of numpy arrays that I am trying to combine into a 2D matrix, as follows:
[arr1, arr2, arr3....]
arr1 = [0.24, 0.24, 0.56, 0.77]
arr2 = [0.1, 0.24]
arr3 = [0.6, 0.7, 0.72, 0.88]
This is what the output should look like:
NaN, 0.24, 0.24, 0.56, NaN, NaN, NaN, 0.77, NaN
0.1, 0.24, NaN, NaN, NaN, NaN, NaN, NaN, NaN
NaN, NaN, NaN, NaN, 0.6, 0.7, 0.72, NaN, 0.88
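I am using the following script to combine them:
import pandas as pd

# convert each array to a Series that uses its own values as the index
series = [pd.Series(arr, index=arr) for arr in arrs]
# concatenate column-wise, aligning on the index
pd.concat(series, axis=1)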
But I get the following error:
raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis
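Note that the input arrays contain duplicates, and I want to keep them.
How can I fix this?
Edit:
Following the discussion in the comments, the error is most likely caused by the duplicates; I would like to find a way to work around it.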
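To check that the duplicates are indeed the culprit, one option is to inspect the index of each Series before concatenating. The snippet below is only a sketch: the `arrs` list is rebuilt from the sample values in the question, and the names `unique_flags` and `dupes` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
arrs = [
    np.array([0.24, 0.24, 0.56, 0.77]),
    np.array([0.1, 0.24]),
    np.array([0.6, 0.7, 0.72, 0.88]),
]

# Same construction as in the question: each value is also its own label.
series = [pd.Series(arr, index=arr) for arr in arrs]

# An index with repeated labels is not unique, so pandas cannot
# align it unambiguously against the other indexes.
unique_flags = [s.index.is_unique for s in series]
print(unique_flags)

# List the labels that appear more than once in each index.
dupes = [s.index[s.index.duplicated()].tolist() for s in series]
print(dupes)
```

Only the first array reports a non-unique index here, because it holds 0.24 twice.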
Answer 0 (score: 1)
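When your data contains duplicates, one workaround is to index each series by both the value and its order of occurrence:
import pandas as pd

new_arrs = []
for a in arrs:
    a = pd.Series(a)
    # number each repeated value: 0 for its first appearance, 1 for the second, ...
    occurrences = a.groupby(a).cumcount()
    # label every element with the unique pair (value, occurrence number)
    idx = pd.MultiIndex.from_tuples(list(zip(a, occurrences)))
    a.index = idx
    new_arrs.append(a)

# the labels are now unique, so the column-wise concat can align them
pd.concat(new_arrs, axis=1)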
Output:
           0     1     2
0.10 0   NaN  0.10   NaN
0.24 0  0.24  0.24   NaN
     1  0.24   NaN   NaN
0.56 0  0.56   NaN   NaN
0.60 0   NaN   NaN  0.60
0.70 0   NaN   NaN  0.70
0.72 0   NaN   NaN  0.72
0.77 0  0.77   NaN   NaN
0.88 0   NaN   NaN  0.88
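This frame has one column per input array, while the question asks for one row per array. A possible way to get that layout is to sort the index, transpose, and drop the labels. The sketch below assumes the sample data from the question; the helper name `align_with_duplicates` is hypothetical, and it uses `MultiIndex.from_arrays` as an alternative to `from_tuples`.

```python
import numpy as np
import pandas as pd

def align_with_duplicates(arrs):
    """Align 1-D arrays by (value, occurrence number); one row per input array."""
    columns = []
    for arr in arrs:
        s = pd.Series(arr, dtype=float)
        # Number repeated values 0, 1, 2, ... so that every label is unique.
        occurrence = s.groupby(s).cumcount()
        s.index = pd.MultiIndex.from_arrays([s.to_numpy(), occurrence.to_numpy()])
        columns.append(s)
    # Outer-join on the (value, occurrence) labels, then order rows by value.
    aligned = pd.concat(columns, axis=1).sort_index()
    # Transpose so that each input array becomes one row of the matrix.
    return aligned.T.to_numpy()

# Sample data copied from the question.
arrs = [
    np.array([0.24, 0.24, 0.56, 0.77]),
    np.array([0.1, 0.24]),
    np.array([0.6, 0.7, 0.72, 0.88]),
]
matrix = align_with_duplicates(arrs)
print(matrix)
```

The explicit `sort_index()` is there so that the column order of the final matrix does not depend on how `pd.concat` happens to order the joined index.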