I have a DataFrame with the columns (period, spn, cpt, payer). I need to concatenate the payer values across 2 periods only (a rolling window of two months). Sample DF:
period spn cpt payer
7/1/2018 a 23 UNITED, HEALTH
7/1/2018 a 24 CARE, MEDI
7/1/2018 b 23 ASSIGN
8/1/2018 a 23 ASSIGN
8/1/2018 a 24 CARE, MEDI
8/1/2018 b 23 ASSIGN, MEDI
9/1/2018 a 23 ASSIGN
9/1/2018 a 24 MEDI
9/1/2018 b 23 ASSIGN, MEDI
I tried:
df.groupby(['spn', 'cpt'])['payer'].transform(lambda x: x.rolling(2, min_periods = 1).apply(', '.join, raw=False))
I get the error: "cannot handle this type -> object".
So I converted the payer column to string type and ran the same code again, but I got the same error. Please help me solve this.
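Rolling windows in pandas only operate on numeric data: Rolling.apply hands each window to the function as float64 values, so an object column (including one cast with astype(str), which is still object dtype) raises this error. There is also a direction mismatch: rolling(2) is a trailing window (previous row plus current row), whereas the expected result below pairs each period with the next one. A minimal numeric sketch of the difference, using a reverse, roll, reverse-back trick to get a forward-looking window:

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# Trailing window: each row combines the PREVIOUS row with itself.
trailing = s.rolling(2, min_periods=1).sum()

# Forward window: reverse, roll, then reverse back,
# so each row combines itself with the NEXT row.
forward = s[::-1].rolling(2, min_periods=1).sum()[::-1]

print(trailing.tolist())  # [1.0, 3.0, 5.0]
print(forward.tolist())   # [3.0, 5.0, 3.0]
```

Because strings cannot go through a rolling window at all, the answers below emulate the forward window with shift(-1) inside each group instead.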
Expected result:
period spn cpt payer payer_concate
7/1/2018 a 23 UNITED, HEALTH UNITED, HEALTH, ASSIGN
7/1/2018 a 24 CARE, MEDI CARE, MEDI, CARE, MEDI
7/1/2018 b 23 ASSIGN ASSIGN, ASSIGN, MEDI
8/1/2018 a 23 ASSIGN ASSIGN, ASSIGN
8/1/2018 a 24 CARE, MEDI CARE, MEDI, MEDI
8/1/2018 b 23 ASSIGN, MEDI ASSIGN, MEDI, ASSIGN, MEDI
9/1/2018 a 23 ASSIGN ASSIGN
9/1/2018 a 24 MEDI MEDI
9/1/2018 b 23 ASSIGN, MEDI ASSIGN, MEDI
Thanks in advance.
Answer 0 (score: 2)
This should do it:
import pandas as pd

data = pd.DataFrame({'period': ['7/1/2018', '7/1/2018', '7/1/2018', '8/1/2018', '8/1/2018', '8/1/2018', '9/1/2018', '9/1/2018', '9/1/2018'],
                     'span': ['a', 'a', 'b', 'a', 'a', 'b', 'a', 'a', 'b'],
                     'cpt': [23, 24, 23, 23, 24, 23, 23, 24, 23],
                     'payer': ['UNITED, HEALTH', 'CARE, MEDI', 'ASSIGN', 'ASSIGN', 'CARE, MEDI', 'ASSIGN, MEDI', 'ASSIGN', 'MEDI', 'ASSIGN, MEDI']})

# Row labels of each (span, cpt) group; rows are assumed to be in period order
groups = data.groupby(['span', 'cpt']).groups
for idx in groups.values():
    # Pull the next period's payer into the current row (NaN for the last period)
    data.loc[idx, 'payer_1'] = data.loc[idx, 'payer'].shift(-1).values

def get_cols(row):
    # No next period: keep the payer on its own
    if pd.isna(row['payer_1']):
        return row['payer']
    return row['payer'] + ", " + row['payer_1']

data['final'] = data.apply(get_cols, axis=1)
Resulting data:
period span cpt payer payer_1 final
0 7/1/2018 a 23 UNITED, HEALTH ASSIGN UNITED, HEALTH, ASSIGN
1 7/1/2018 a 24 CARE, MEDI CARE, MEDI CARE, MEDI, CARE, MEDI
2 7/1/2018 b 23 ASSIGN ASSIGN, MEDI ASSIGN, ASSIGN, MEDI
3 8/1/2018 a 23 ASSIGN ASSIGN ASSIGN, ASSIGN
4 8/1/2018 a 24 CARE, MEDI MEDI CARE, MEDI, MEDI
5 8/1/2018 b 23 ASSIGN, MEDI ASSIGN, MEDI ASSIGN, MEDI, ASSIGN, MEDI
6 9/1/2018 a 23 ASSIGN NaN ASSIGN
7 9/1/2018 a 24 MEDI NaN MEDI
8 9/1/2018 b 23 ASSIGN, MEDI NaN ASSIGN, MEDI
Gunnvant (on the puzzle)
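The explicit loop plus row-wise apply can also be written as a vectorized groupby shift. A minimal sketch with the same sample data, again assuming each group's rows are already in period order:

```python
import pandas as pd

data = pd.DataFrame({
    'period': ['7/1/2018'] * 3 + ['8/1/2018'] * 3 + ['9/1/2018'] * 3,
    'span': ['a', 'a', 'b'] * 3,
    'cpt': [23, 24, 23] * 3,
    'payer': ['UNITED, HEALTH', 'CARE, MEDI', 'ASSIGN',
              'ASSIGN', 'CARE, MEDI', 'ASSIGN, MEDI',
              'ASSIGN', 'MEDI', 'ASSIGN, MEDI'],
})

# Next period's payer within each (span, cpt) group; NaN for the last period.
nxt = data.groupby(['span', 'cpt'])['payer'].shift(-1)

# Concatenate where a next value exists, otherwise keep the payer alone.
data['final'] = (data['payer'] + ', ' + nxt).where(nxt.notna(), data['payer'])
print(data)
```

Here the NaN test is done once on the whole column with notna() instead of once per row.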
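Answer 1 (score: 2)
First sort the values by period. Then group by spn and cpt and use transform to concatenate each payer with the one from the following period. The last period has no successor, so shift(-1) yields NaN there and the concatenated value is NaN as well; combine_first fills those rows with the original payer values from df.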
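# Sort chronologically: parse the dates, because plain string sorting puts 10/1 before 7/1
s = df.sort_values('period', key=lambda p: pd.to_datetime(p, format='%m/%d/%Y')) \
    .groupby(['spn','cpt']).payer \
    .transform(lambda x: x + ',' + x.shift(-1)).combine_first(df.payer)
df["payer_concatenate"] = s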
Result:
period spn cpt payer payer_concatenate
0 7/1/2018 a 23 UNITED,HEALTH UNITED,HEALTH,ASSIGN
1 7/1/2018 a 24 CARE,MEDI CARE,MEDI,CARE,MEDI
2 7/1/2018 b 23 ASSIGN ASSIGN,ASSIGN,MEDI
3 8/1/2018 a 23 ASSIGN ASSIGN,ASSIGN
4 8/1/2018 a 24 CARE,MEDI CARE,MEDI,MEDI
5 8/1/2018 b 23 ASSIGN,MEDI ASSIGN,MEDI,ASSIGN,MEDI
6 9/1/2018 a 23 ASSIGN ASSIGN
7 9/1/2018 a 24 MEDI MEDI
8 9/1/2018 b 23 ASSIGN,MEDI ASSIGN,MEDI
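Sorting the raw period strings is lexicographic. That happens to be correct for the sample (all single-digit months), but it puts October before July. A minimal check, assuming the periods are month/day/year strings as in the sample:

```python
import pandas as pd

periods = pd.Series(['7/1/2018', '10/1/2018', '8/1/2018', '9/1/2018'])

# Plain string sort compares character by character, so '10/...' < '7/...'.
as_text = periods.sort_values().tolist()

# Sort on the parsed dates instead (format assumed to be month/day/year).
as_dates = periods.sort_values(
    key=lambda s: pd.to_datetime(s, format='%m/%d/%Y')
).tolist()

print(as_text)   # ['10/1/2018', '7/1/2018', '8/1/2018', '9/1/2018']
print(as_dates)  # ['7/1/2018', '8/1/2018', '9/1/2018', '10/1/2018']
```

The key argument only changes the sort order; the period column itself keeps its original string values.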
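Answer 2 (score: 0)
This only answers part of your question: it joins all payers of each (spn, cpt) group instead of using a rolling two-period window.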
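data['period'] = pd.to_datetime(data.period)
data['month'] = data.period.dt.month
data.set_index(['spn', 'cpt', 'period'], inplace=True)
# Join ALL payers of each (spn, cpt) group (only the string column can be joined)
df = data.groupby(['spn', 'cpt'])['payer'].agg(','.join)
# Broadcast the joined string back onto every row of its group
df = data.merge(df, on=['spn', 'cpt'])
df = df.rename(columns={'payer_x': 'payer', 'payer_y': 'payer_concate'})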
Output:
spn cpt payer month payer_concate
a 23 UNITED, HEALTH 7 UNITED, HEALTH,ASSIGN,ASSIGN
23 ASSIGN 8 UNITED, HEALTH,ASSIGN,ASSIGN
23 ASSIGN 9 UNITED, HEALTH,ASSIGN,ASSIGN
24 CARE, MEDI 7 CARE, MEDI,CARE, MEDI,MEDI
24 CARE, MEDI 8 CARE, MEDI,CARE, MEDI,MEDI
24 MEDI 9 CARE, MEDI,CARE, MEDI,MEDI
b 23 ASSIGN 7 ASSIGN,ASSIGN, MEDI,ASSIGN, MEDI
23 ASSIGN, MEDI 8 ASSIGN,ASSIGN, MEDI,ASSIGN, MEDI
23 ASSIGN, MEDI 9 ASSIGN,ASSIGN, MEDI,ASSIGN, MEDI
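To limit the join to a window, a hypothetical helper (forward_join is illustrative, not a pandas function) can append the payers of the next k-1 periods within each group; k=2 gives the two-period result the question asks for. For the whole-group join this answer builds with agg plus merge, groupby(...).transform(', '.join) broadcasts the joined string back to every row directly. Both assume rows are already in chronological order within each group:

```python
import pandas as pd

def forward_join(df, keys, col, k=2, sep=', '):
    """Join `col` over the current row and up to k-1 following rows per group.

    Hypothetical helper for illustration; assumes rows are already sorted
    chronologically within each group.
    """
    grouped = df.groupby(keys)[col]
    out = df[col].copy()
    for step in range(1, k):
        nxt = grouped.shift(-step)
        # Append the later payer only where that later period exists.
        out = (out + sep + nxt).where(nxt.notna(), out)
    return out

df = pd.DataFrame({
    'period': ['7/1/2018'] * 3 + ['8/1/2018'] * 3 + ['9/1/2018'] * 3,
    'spn': ['a', 'a', 'b'] * 3,
    'cpt': [23, 24, 23] * 3,
    'payer': ['UNITED, HEALTH', 'CARE, MEDI', 'ASSIGN',
              'ASSIGN', 'CARE, MEDI', 'ASSIGN, MEDI',
              'ASSIGN', 'MEDI', 'ASSIGN, MEDI'],
})

# Two-period (current + next) concatenation.
df['payer_concate'] = forward_join(df, ['spn', 'cpt'], 'payer', k=2)

# Whole-group join (what this answer computes), broadcast back to every row.
df['payer_all'] = df.groupby(['spn', 'cpt'])['payer'].transform(', '.join)
print(df)
```

With k equal to the number of periods, the first row of each group matches the whole-group join.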