Question

I have a data frame that looks like this:

   y
0, 0.3234
0, 0.5234
1, 0.3234
1, 0.7854
1, 0.1863
2, 0.0021

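As you can see, the index values repeat. When I access by index, it returns all the values for that index. So I decided to split the rows by occurrence: y1 holds the first value for index 0, the first value for index 1, the first value for index 2, and so on. Similarly, y2 holds the second value for index 0, the second value for index 1, the second value for index 2, and so on. Since the number of times each index occurs is not constant, I want new data frames y1, y2, y3, y4, etc., up to the maximum number of times any index occurs in y.index, with the remaining positions filled with NaN. The separated data frames would look like this: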

y1:

   y1
0, 0.3234
1, 0.3234
2, 0.0021

y2:

   y2
0, 0.5234
1, 0.7854

y3:

   y3
1, 0.1863

I tried accessing by index with a list comprehension, [i[0] for i in y.iterrows()], but that did not work either. Any help?
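For anyone who wants to reproduce this, here is a minimal sketch of the setup. The question does not show how the frame was built, so the constructor call and the name df are assumptions. It illustrates why label access returns several rows, and why the list comprehension, as far as I can tell, only collects the index labels rather than splitting anything.

```python
import pandas as pd

# Assumed reconstruction of the frame from the question:
# one column "y" and an index whose labels 0, 1, 2 repeat.
df = pd.DataFrame(
    {"y": [0.3234, 0.5234, 0.3234, 0.7854, 0.1863, 0.0021]},
    index=[0, 0, 1, 1, 1, 2],
)

# Label-based access on a duplicated label returns every matching row.
rows_for_1 = df.loc[1]
print(rows_for_1)

# iterrows() yields (index_label, row) pairs, so i[0] is just the label.
labels = [i[0] for i in df.iterrows()]
print(labels)

# cumcount() numbers the rows within each index label: 0 for the first
# occurrence, 1 for the second, and so on. This is the piece both answers use.
occurrence = df.groupby(level=0).cumcount()
print(occurrence.tolist())
```

The occurrence number is what distinguishes the rows that share a label, so it can serve as the key for the split.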

Answer 1

IIUC, you need cumcount plus string concatenation and a groupby. We can return the result as a dictionary of data frames.

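# Label each row by its occurrence number within its index value (y1, y2, ...),
# then split on that label into a dict of data frames.
key = 'y' + (df.groupby(level=0).cumcount() + 1).astype(str)
dfs = {k: v.drop(columns='key') for k, v in df.assign(key=key).groupby('key')}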

print(dfs['y1'])

        y
0  0.3234
1  0.3234
2  0.0021

print(dfs['y2'])

        y
0  0.5234
1  0.7854

print(dfs['y3'])

        y
1  0.1863
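The question also asks for the missing positions to be filled with NaN, which the dictionary above does not do. One possible way to get that, sketched here as an alternative and not as part of the original answer, is to move the occurrence number into the columns with unstack. The sample frame and the names occurrence and wide are assumptions made for illustration.

```python
import pandas as pd

# Assumed reconstruction of the frame from the question.
df = pd.DataFrame(
    {"y": [0.3234, 0.5234, 0.3234, 0.7854, 0.1863, 0.0021]},
    index=[0, 0, 1, 1, 1, 2],
)

# Occurrence number (1-based) of each row within its index label.
occurrence = df.groupby(level=0).cumcount() + 1

# Append the occurrence number as a second index level, then pivot it into
# the columns: one row per original label, one column per occurrence, and
# NaN wherever a label has no such occurrence.
wide = df.set_index(occurrence, append=True)["y"].unstack()
wide.columns = ["y" + str(c) for c in wide.columns]
print(wide)
```

Each column of wide (wide['y1'], wide['y2'], ...) then plays the role of one of the requested frames, aligned on the full set of index labels.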

Answer 2

Another way:

df['z'] = df.groupby(df.index).cumcount()  # occurrence number within each index value
y = df[df.z == 0]   # first occurrence of each index value
y1 = df[df.z == 1]  # second occurrence
y2 = df[df.z == 2]  # third occurrence

print(y.iloc[:, :1])  # slice out the helper column z
         y
0,  0.3234
1,  0.3234
2,  0.0021

print(y1.iloc[:, :1])  # slice out the helper column z
         y
0,  0.5234
1,  0.7854

print(y2.iloc[:, :1])  # slice out the helper column z
      y
1,  0.1863
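Since the number of occurrences is not known in advance, the hand-written variables can be replaced by a loop. This is only a sketch of that idea under the same assumptions as before (a frame df with a single column y); the list name frames is made up for illustration.

```python
import pandas as pd

# Assumed reconstruction of the frame from the question.
df = pd.DataFrame(
    {"y": [0.3234, 0.5234, 0.3234, 0.7854, 0.1863, 0.0021]},
    index=[0, 0, 1, 1, 1, 2],
)

# Occurrence number (0-based) of each row within its index label,
# kept as a separate Series so no helper column has to be sliced out later.
occurrence = df.groupby(df.index).cumcount()

# One frame per occurrence number, up to the largest count found.
# The mask is converted to a plain array so selection is purely positional.
frames = [
    df[(occurrence == n).to_numpy()]
    for n in range(occurrence.max() + 1)
]

for n, frame in enumerate(frames, start=1):
    print(f"y{n}")
    print(frame)
```

Here frames[0] corresponds to y1 in the question, frames[1] to y2, and so on.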

How to separate each index from a data frame with duplicate indices?
