Advanced pandas: creating a MultiIndex pandas DataFrame from certain row indices

Date: 2018-08-31 12:43:02

Tags: python pandas multi-index

I have a dataset like this:

    m   n   o  
0   2   22  42  
1   3   23  43  
2   4   24  44  
3   5   25  45  
4   6   26  46  
5   7   27  47  
6   8   28  48  
7   9   29  49  
8   10  30  50  
9   11  31  51  

How can I convert it to a MultiIndex DataFrame like this:

Index   m   n   o
A
    0   2   22  42
    1   3   23  43
    2   4   24  44
B
    4   6   26  46
C
    6   8   28  48
    7   9   29  49
    8   10  30  50

**My attempt**

import numpy as np 
import pandas as pd

df = pd.DataFrame({'m': np.arange(2,12),
                   'n': np.arange(22,32),
                   'o': np.arange(42,52)})

df

**Groupby approach**

# Rows to drop (each one ends a group) and the group names
idx = [3,5,9]  # A is rows 0-2, B is row 4, C is rows 6-8
idx_orig = idx.copy()
idx_names = ['A','B','C']

# Attempt
idx_diff = np.diff(idx)
idx_diff = np.hstack((idx[0]+1,idx_diff)) # Prepend idx[0]+1 so the first group is counted too
idx_diff = idx_diff - 1 # Subtract the dropped row to get each group's size
idx_names = np.repeat(idx_names,idx_diff)

# Drop rows with given indices
df = df.drop(df.index[idx_orig])

# Assign new col
df['Names'] = idx_names
#df.groupby('Names').count()
df

Output

    m   n   o   Names
0   2   22  42  A
1   3   23  43  A
2   4   24  44  A
4   6   26  46  B
6   8   28  48  C
7   9   29  49  C
8   10  30  50  C

Here I want a MultiIndex DataFrame that keeps all of these rows, but groupby only gives me counts.
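To illustrate the difference: `groupby('Names').count()` aggregates each group down to a single row, whereas moving `Names` into the index keeps every row. A minimal sketch, assuming the labelled frame from the output above (rebuilt by hand here so the block runs on its own; `labelled`, `counts`, and `multi` are illustrative names, not from the original post):

```python
import numpy as np
import pandas as pd

# Rebuild the labelled frame shown in the output above.
df = pd.DataFrame({'m': np.arange(2, 12),
                   'n': np.arange(22, 32),
                   'o': np.arange(42, 52)})
labelled = df.drop(index=[3, 5, 9])
labelled['Names'] = ['A', 'A', 'A', 'B', 'C', 'C', 'C']

# groupby + count aggregates: one row per group, values are row counts.
counts = labelled.groupby('Names').count()

# set_index keeps all rows; swaplevel makes 'Names' the outer index level.
multi = labelled.set_index('Names', append=True).swaplevel(0, 1)

print(counts)
print(multi)
```

`counts` has three rows (one per name), while `multi` still has all seven data rows under a two-level index.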

1 answer:

Answer 0 (score: 1)

Use:

idx = [3,5,9] 

idx_names = ['A','B','C']
d = dict(enumerate(idx_names))

# Boolean mask of the idx rows; its cumulative sum numbers the groups
mask = df.index.isin(idx)
df['g'] = mask.cumsum()
# Map the group numbers to names via the dictionary
df['g'] = df['g'].map(d)
# Filter out the idx rows, append 'g' to the index, and make it the outer level
df = df[~mask].set_index('g', append=True).swaplevel(0,1)
print(df)
      m   n   o
g              
A 0   2  22  42
  1   3  23  43
  2   4  24  44
B 4   6  26  46
C 6   8  28  48
  7   9  29  49
  8  10  30  50
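The steps above can be collected into one self-contained, runnable sketch. The helper name `to_group_multiindex` is hypothetical (it is not a pandas function), and using `assign` instead of writing `df['g']` in place is a choice made for this sketch so that the caller's frame is left unchanged; the logic is otherwise the same mask, cumulative sum, and map approach.

```python
import numpy as np
import pandas as pd

def to_group_multiindex(df, idx, idx_names):
    """Drop the rows labelled `idx` and tag the rows before each one with a name.

    Hypothetical helper for illustration only.
    """
    # True for the separator rows that will be dropped.
    mask = df.index.isin(idx)
    # Each separator row bumps the running group number: 0, 0, 0, 1, 1, 2, ...
    groups = pd.Series(mask.cumsum(), index=df.index).map(dict(enumerate(idx_names)))
    return (df[~mask]
            .assign(g=groups[~mask])        # add the group label as a column
            .set_index('g', append=True)    # index becomes (row, g)
            .swaplevel(0, 1))               # reorder to (g, row)

df = pd.DataFrame({'m': np.arange(2, 12),
                   'n': np.arange(22, 32),
                   'o': np.arange(42, 52)})

result = to_group_multiindex(df, [3, 5, 9], ['A', 'B', 'C'])
print(result)

# Selecting on the outer level returns just that group's rows.
print(result.loc['C'])
```

With the outer level in place, `result.loc['C']` selects rows 6, 7, and 8 without any further filtering.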