I generated a pandas pivot table like the one below:
                                                   Encounters
Code                                               132  133  145
Record Number  Start_date  End_date  Service_Date
2322           1/1/2017    1/3/2017  1/1/2017        0    1    1
                                     1/2/2017        1    0    0
                                     1/3/2017        0    1    1
I want to combine and sum some of the pivot table columns based on their code. Expected output:
                                                   Encounters
Code                                               132  133-145
Record Number  Start_date  End_date  Service_Date
2322           1/1/2017    1/3/2017  1/1/2017        0        2
                                     1/2/2017        1        0
                                     1/3/2017        0        2
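Answer 0 (score: 1)
A pivot table creates hierarchical columns (i.e., columns with multiple levels). So consider assigning the new sum column with a tuple that names each level: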
df[('Encounters', '133-145')] = df[('Encounters', '133')] + df[('Encounters', '145')]
del df[('Encounters', '133')]
del df[('Encounters', '145')]
# sortlevel() was removed in pandas 1.0; sort_index() is its replacement
df = df.sort_index(axis=1)
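To try the tuple assignment above without the original data, here is a minimal sketch that rebuilds the pivot table from the question by hand. It assumes the `Code` labels are strings (as the snippet above implies) and that the index levels are plain strings rather than parsed dates; adjust both if your pivot differs.

```python
import pandas as pd

# Hand-built stand-in for the pivot table in the question (labels assumed to be strings)
index = pd.MultiIndex.from_tuples(
    [(2322, '1/1/2017', '1/3/2017', d) for d in ('1/1/2017', '1/2/2017', '1/3/2017')],
    names=['Record Number', 'Start_date', 'End_date', 'Service_Date'])
columns = pd.MultiIndex.from_product(
    [['Encounters'], ['132', '133', '145']], names=[None, 'Code'])
df = pd.DataFrame([[0, 1, 1], [1, 0, 0], [0, 1, 1]], index=index, columns=columns)

# Each column is addressed by a tuple with one entry per column level
df[('Encounters', '133-145')] = df[('Encounters', '133')] + df[('Encounters', '145')]
df = df.drop(columns=[('Encounters', '133'), ('Encounters', '145')])
df = df.sort_index(axis=1)  # keep the column MultiIndex sorted

print(df)
```

If the codes in your pivot are integers, the tuples would need `133` and `145` instead of the quoted strings.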
A demonstration with random data:
Data (seeded data, then pivoted)
import numpy as np
import pandas as pd
import datetime as dt
import time
LETTERS = list('ABCDEFGHIJKLMNOPQRSTUVWXYZ')
epoch_time = int(time.time())
np.random.seed(555)
df = pd.DataFrame({'ID': [np.random.randint(15) for _ in range(50)],
                   'GROUP': ["".join(np.random.choice(LETTERS[0:3], 1)) for _ in range(50)],
                   # NOTE: this is a single scalar broadcast to every row,
                   # which is why all non-zero cells in the output are identical
                   'NUM': np.random.uniform(50)/100,
                   'DATE': [dt.datetime.fromtimestamp(np.random.randint(low=1400270738,
                                                                        high=epoch_time)) for _ in range(50)]})
df['YEAR'] = df['DATE'].dt.year
pvtdf = df.pivot_table(index = ['ID'], columns = ['YEAR', 'GROUP'], values = ['NUM']).fillna(0)
print(pvtdf)
#            NUM
# YEAR      2014                       2015                       2016                       2017
# GROUP        A        B        C        A        B        C        A        B        C        A        B        C
# ID
# 0     0.000000 0.000000 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.411258 0.411258
# 1     0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.411258 0.411258 0.000000 0.000000 0.411258 0.411258
# 3     0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.411258 0.411258 0.411258 0.000000 0.411258 0.000000
# 4     0.411258 0.411258 0.000000 0.000000 0.411258 0.411258 0.000000 0.000000 0.411258 0.411258 0.000000 0.000000
# 5     0.411258 0.000000 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.411258
# 6     0.000000 0.411258 0.000000 0.000000 0.411258 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
# 7     0.000000 0.000000 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
# 8     0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.411258
# 9     0.000000 0.000000 0.411258 0.411258 0.000000 0.411258 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000
# 10    0.000000 0.000000 0.000000 0.411258 0.411258 0.000000 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000
# 11    0.000000 0.000000 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.411258 0.000000 0.000000 0.000000
# 12    0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
# 13    0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.411258 0.000000 0.411258 0.000000 0.411258 0.000000
# 14    0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.411258 0.000000
Process (all 2017 A, B, C columns are summed into D and then dropped)
pvtdf[('NUM', 2017, 'D')] = pvtdf[('NUM', 2017, 'A')] + pvtdf[('NUM', 2017, 'B')] + pvtdf[('NUM', 2017, 'C')]
pvtdf = pvtdf.drop([('NUM', 2017, 'A'), ('NUM', 2017, 'B'), ('NUM', 2017, 'C')], axis=1)
# sortlevel() was removed in pandas 1.0; sort_index() is its replacement
pvtdf = pvtdf.sort_index(axis=1)
print(pvtdf)
#            NUM
# YEAR      2014                       2015                       2016                       2017
# GROUP        A        B        C        A        B        C        A        B        C        D
# ID
# 0     0.000000 0.000000 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.822515
# 1     0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.411258 0.411258 0.000000 0.822515
# 3     0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.411258 0.411258 0.411258 0.411258
# 4     0.411258 0.411258 0.000000 0.000000 0.411258 0.411258 0.000000 0.000000 0.411258 0.411258
# 5     0.411258 0.000000 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.411258
# 6     0.000000 0.411258 0.000000 0.000000 0.411258 0.411258 0.000000 0.000000 0.000000 0.000000
# 7     0.000000 0.000000 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
# 8     0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.411258
# 9     0.000000 0.000000 0.411258 0.411258 0.000000 0.411258 0.411258 0.000000 0.000000 0.000000
# 10    0.000000 0.000000 0.000000 0.411258 0.411258 0.000000 0.000000 0.411258 0.000000 0.000000
# 11    0.000000 0.000000 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.411258 0.000000
# 12    0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
# 13    0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.411258 0.000000 0.411258 0.411258
# 14    0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.411258
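If several groups of columns need merging, the add-then-drop steps can be wrapped in a small function. The sketch below is only one way to do it: `merge_columns` is a hypothetical helper name (not part of pandas), and the toy frame is hand-built to mimic the three-level column layout of the demonstration rather than taken from it.

```python
import pandas as pd

def merge_columns(pvt, keys, new_key):
    """Sum the columns named by `keys` into `new_key`, then drop the originals.

    Hypothetical helper; `keys` and `new_key` are tuples with one entry per column level.
    """
    out = pvt.copy()
    out[new_key] = out[list(keys)].sum(axis=1)  # row-wise sum of the selected columns
    out = out.drop(columns=list(keys))
    return out.sort_index(axis=1)               # keep the column MultiIndex sorted

# Toy pivot with the same (value, YEAR, GROUP) column levels as the demonstration
cols = pd.MultiIndex.from_product(
    [['NUM'], [2016, 2017], ['A', 'B', 'C']], names=[None, 'YEAR', 'GROUP'])
toy = pd.DataFrame([[1, 2, 3, 4, 5, 6], [0, 0, 1, 1, 0, 1]],
                   index=pd.Index([0, 1], name='ID'), columns=cols)

merged = merge_columns(toy, [('NUM', 2017, g) for g in 'ABC'], ('NUM', 2017, 'D'))
print(merged)
```

The helper works on a copy, so the original pivot stays untouched and the call can be repeated for other years or code groups.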