Reindexing the second level of a multi-level DataFrame

Time: 2015-08-09 07:13:59

Tags: python pandas

I need to reindex the second level of a pandas DataFrame so that the second level becomes the same list of values for each first-level index. I tried to follow this approach, but unfortunately it only creates an index with as many rows as already existed:

In [79]:
df = pd.DataFrame({
    'first': ['one', 'one', 'one', 'two', 'two', 'three'],
    'second': [0, 1, 2, 0, 1, 1],
    'value': [1, 2, 3, 4, 5, 6]
})
print(df)

   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
5  three       1      6

In [80]:
df['second'] = df.reset_index().groupby(['first']).cumcount()
print(df)

   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
5  three       0      6

What I want is for new rows (with NaN values) to be inserted for each new index.

The result I want is:

   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
4    two       2      nan
5  three       0      6
5  three       1      nan
5  three       2      nan

1 answer:

Answer 0: (score: 3)

I think you can first set the columns first and second as a MultiIndex, and then reindex.

# your data
# ==========================
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'first': ['one', 'one', 'one', 'two', 'two', 'three'],
    'second': [0, 1, 2, 0, 1, 1],
    'value': [1, 2, 3, 4, 5, 6]
})

df

   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
5  three       1      6

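# processing
# ============================
# build every (first, second) combination; 3 is the number of second-level values wanted
multi_index = pd.MultiIndex.from_product([df['first'].unique(), np.arange(3)], names=['first', 'second'])

# combinations missing from the original data come back as rows filled with NaN
df.set_index(['first', 'second']).reindex(multi_index).reset_index()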

   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
5    two       2    NaN
6  three       0    NaN
7  three       1      6
8  three       2    NaN
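
The answer hard-codes np.arange(3) for the second level. As a minimal sketch, and assuming the second level should run from 0 up to the largest value seen in the data (the question does not state this), the length could be derived from the DataFrame instead. The names n_second, full_index and result are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'first': ['one', 'one', 'one', 'two', 'two', 'three'],
    'second': [0, 1, 2, 0, 1, 1],
    'value': [1, 2, 3, 4, 5, 6],
})

# Assumption: the target second level is 0 .. max(second), taken from the data.
n_second = df['second'].max() + 1

# Every (first, second) pair, keeping the first-level order of appearance.
full_index = pd.MultiIndex.from_product(
    [df['first'].unique(), np.arange(n_second)],
    names=['first', 'second'],
)

# Reindexing against the full product inserts NaN rows for missing pairs.
result = (
    df.set_index(['first', 'second'])
      .reindex(full_index)
      .reset_index()
)
print(result)
```

Because the inserted rows hold NaN, the value column is converted to a floating-point dtype. If a different filler is preferred, reindex also accepts a fill_value argument.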