我有一个这样的数据集:
m n o
0 2 22 42
1 3 23 43
2 4 24 44
3 5 25 45
4 6 26 46
5 7 27 47
6 8 28 48
7 9 29 49
8 10 30 50
9 11 31 51
我们如何将其转换为多索引数据框:
Index m n o
A
0 2 22 42
1 3 23 43
2 4 24 44
B
4 6 26 46
C
6 8 28 48
7 9 29 49
8 10 30 50
**我的尝试**
import numpy as np
import pandas as pd
df = pd.DataFrame({'m': np.arange(2,12),
'n': np.arange(22,32),
'o': np.arange(42,52)})
df
**分组方法**
# Required index and their names
idx = [3,5,9] # A is 0,1,2 B is 4 and C is 6,7,8
idx_orig = idx.copy()
idx_names = ['A','B','C']
# Attempt
idx_diff = np.diff(idx)
idx_diff = np.hstack((idx[0]+1,idx_diff)) # Add the first index value
idx_diff = idx_diff - 1 # Decrease index number
idx_names = np.repeat(idx_names,idx_diff)
# Drop rows with given indices
df = df.drop(df.index[idx_orig])
# Assign new col
df['Names'] = idx_names
#df.groupby('Names').count()
df
输出
m n o Names
0 2 22 42 A
1 3 23 43 A
2 4 24 44 A
4 6 26 46 B
6 8 28 48 C
7 9 29 49 C
8 10 30 50 C
在这里,我想获得一个包含所有行的多索引数据框,但是,groupby仅给出计数。
答案 0 :(得分:1)
使用:
idx = [3,5,9]
idx_names = ['A','B','C']
d = dict(enumerate(idx_names))
#get boolean mask for create cumulative sum and filter out idx rows
mask = df.index.isin(idx)
df['g'] = mask.cumsum()
#map by dictioanry
df['g'] = df['g'].map(d)
#create MultiIndex and change order of levels
df = df[~mask].set_index('g', append=True).swaplevel(0,1)
print (df)
m n o
g
A 0 2 22 42
1 3 23 43
2 4 24 44
B 4 6 26 46
C 6 8 28 48
7 9 29 49
8 10 30 50