Groupby on two columns, then apply transform (partition), rolling, and concatenation to a specific column with pandas - Python

Date: 2019-07-12 12:25:02

Tags: lambda transform apply pandas-groupby string-concatenation

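I have a DataFrame with the columns (period, spn, cpt, payer). I need to concatenate the values of the payer column across only two periods (a rolling window of two months). Sample DF: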

period     spn  cpt payer
7/1/2018    a   23  UNITED, HEALTH
7/1/2018    a   24  CARE, MEDI
7/1/2018    b   23  ASSIGN
8/1/2018    a   23  ASSIGN
8/1/2018    a   24  CARE, MEDI
8/1/2018    b   23  ASSIGN, MEDI
9/1/2018    a   23  ASSIGN
9/1/2018    a   24  MEDI
9/1/2018    b   23  ASSIGN, MEDI

I tried: df.groupby(['spn', 'cpt'])['payer'].transform(lambda x: x.rolling(2, min_periods = 1).apply(', '.join, raw=False))

I get an error: cannot handle this type -> object

So I converted the payer column to string type and ran the same code again, but I got the same error. Please help me solve this.
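A minimal reproduction of why the attempt fails, assuming a reasonably recent pandas: rolling().apply() works on float64 data, so pandas tries to convert the window values to floats before calling the function, and a column of strings cannot be converted. astype(str) does not change that, because the column is still non-numeric. The exception type and message differ between pandas versions, so this sketch catches any exception.

```python
import pandas as pd

# One (spn, cpt) group from the sample data (spn='a', cpt=23)
payer = pd.Series(['UNITED, HEALTH', 'ASSIGN', 'ASSIGN'])

try:
    # Same rolling call as in the question, after converting to str
    payer.astype(str).rolling(2, min_periods=1).apply(', '.join, raw=False)
    raised = False
except Exception as exc:  # exact type/message depends on the pandas version
    raised = True
    print(type(exc).__name__, exc)
```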

Expected result

period    spn   cpt  payer             payer_concate
7/1/2018    a   23   UNITED, HEALTH    UNITED, HEALTH, ASSIGN
7/1/2018    a   24   CARE, MEDI        CARE, MEDI, CARE, MEDI
7/1/2018    b   23   ASSIGN            ASSIGN, ASSIGN, MEDI
8/1/2018    a   23   ASSIGN            ASSIGN, ASSIGN
8/1/2018    a   24   CARE, MEDI        CARE, MEDI, MEDI
8/1/2018    b   23   ASSIGN, MEDI      ASSIGN, MEDI, ASSIGN, MEDI
9/1/2018    a   23   ASSIGN            ASSIGN
9/1/2018    a   24   MEDI              MEDI
9/1/2018    b   23   ASSIGN, MEDI      ASSIGN, MEDI
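Since rolling() cannot produce strings, one workaround is to express the two-period window with groupby().shift(-1): within each (spn, cpt) group, take the payer of the next period and append it to the current one. This sketch rebuilds the sample data and assumes at most one row per (spn, cpt, period) and dates in month/day/year format.

```python
import pandas as pd

df = pd.DataFrame({
    'period': ['7/1/2018'] * 3 + ['8/1/2018'] * 3 + ['9/1/2018'] * 3,
    'spn': ['a', 'a', 'b'] * 3,
    'cpt': [23, 24, 23] * 3,
    'payer': ['UNITED, HEALTH', 'CARE, MEDI', 'ASSIGN',
              'ASSIGN', 'CARE, MEDI', 'ASSIGN, MEDI',
              'ASSIGN', 'MEDI', 'ASSIGN, MEDI'],
})

# Order rows chronologically so "next" means "next period" within each group
df = df.sort_values('period', key=lambda p: pd.to_datetime(p, format='%m/%d/%Y'),
                    kind='stable')

# Payer of the following period for the same (spn, cpt); NaN for the last period
next_payer = df.groupby(['spn', 'cpt'])['payer'].shift(-1)

# Two-period window: current + next payer; the NaN sum in the last period
# is replaced by the current payer alone
df['payer_concate'] = (df['payer'] + ', ' + next_payer).fillna(df['payer'])
print(df)
```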

Thanks in advance.

3 answers:

Answer 0 (score: 2)

This should do it:

import pandas as pd

data=pd.DataFrame({'period':['7/1/2018','7/1/2018','7/1/2018','8/1/2018','8/1/2018','8/1/2018','9/1/2018','9/1/2018','9/1/2018'],'span':['a','a','b','a','a','b','a','a','b'],'cpt':[23,24,23,23,24,23,23,24,23],'payer':['UNITED, HEALTH','CARE, MEDI','ASSIGN','ASSIGN','CARE, MEDI','ASSIGN, MEDI','ASSIGN','MEDI','ASSIGN, MEDI']})

# Payer of the next period within the same (span, cpt) group; NaN for the last period.
# Assumes the rows are already in chronological order within each group.
data['payer_1'] = data.groupby(['span', 'cpt'])['payer'].shift(-1)

def get_cols(row):
    # pd.isna reliably detects missing values; an identity check ("is np.nan") can miss them
    if pd.isna(row['payer_1']):
        return row['payer']
    else:
        return row['payer']+", "+row['payer_1']
data['final']=data.apply(get_cols,axis=1)

Resulting data:

         period span  cpt           payer       payer_1                       final
0  7/1/2018    a   23  UNITED, HEALTH        ASSIGN      UNITED, HEALTH, ASSIGN
1  7/1/2018    a   24      CARE, MEDI    CARE, MEDI      CARE, MEDI, CARE, MEDI
2  7/1/2018    b   23          ASSIGN  ASSIGN, MEDI        ASSIGN, ASSIGN, MEDI
3  8/1/2018    a   23          ASSIGN        ASSIGN              ASSIGN, ASSIGN
4  8/1/2018    a   24      CARE, MEDI          MEDI            CARE, MEDI, MEDI
5  8/1/2018    b   23    ASSIGN, MEDI  ASSIGN, MEDI  ASSIGN, MEDI, ASSIGN, MEDI
6  9/1/2018    a   23          ASSIGN           NaN                      ASSIGN
7  9/1/2018    a   24            MEDI           NaN                        MEDI
8  9/1/2018    b   23    ASSIGN, MEDI           NaN                ASSIGN, MEDI
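A short illustration of why a missing-value test should use pd.isna rather than an identity check such as `x is np.nan`: a NaN read back out of a pandas column is not guaranteed to be the np.nan object itself. The float column below is an illustrative stand-in, not the answer's data.

```python
import numpy as np
import pandas as pd

col = pd.Series([1.0, np.nan])
value = col[1]          # a NaN read back out of a float64 column (numpy.float64 scalar)

print(value is np.nan)  # False: identity only matches the np.nan object itself
print(pd.isna(value))   # True
print(pd.isna(None))    # True: None is treated as missing as well
```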

Regards, Gunnvant

Answer 1 (score: 2)

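First sort the values by period. Then group with groupby and use transform to concatenate each payer with the next one in its group. The last period has no successor, so its value is NaN; combine_first fills those rows with the original payer values from df.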

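import pandas as pd
# Sort by parsed date (string order would put 10/1/2018 before 9/1/2018), append the next payer per (spn, cpt)
s = (df.sort_values('period', key=lambda p: pd.to_datetime(p, format='%m/%d/%Y'))
       .groupby(['spn','cpt']).payer
       .transform(lambda x: x + ',' + x.shift(-1)).combine_first(df.payer))
df["payer_concatenate"] = s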

Result:

    period      spn cpt payer           payer_concatenate
0   7/1/2018    a   23  UNITED,HEALTH   UNITED,HEALTH,ASSIGN
1   7/1/2018    a   24  CARE,MEDI       CARE,MEDI,CARE,MEDI
2   7/1/2018    b   23  ASSIGN          ASSIGN,ASSIGN,MEDI
3   8/1/2018    a   23  ASSIGN          ASSIGN,ASSIGN
4   8/1/2018    a   24  CARE,MEDI       CARE,MEDI,MEDI
5   8/1/2018    b   23  ASSIGN,MEDI     ASSIGN,MEDI,ASSIGN,MEDI
6   9/1/2018    a   23  ASSIGN          ASSIGN
7   9/1/2018    a   24  MEDI            MEDI
8   9/1/2018    b   23  ASSIGN,MEDI     ASSIGN,MEDI
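The same shift idea extends to a window of k periods if "two months" ever needs to become three or more. This is a sketch with a hypothetical helper, concat_next_periods, assuming at most one row per (spn, cpt, period) and month/day/year dates; k=2 reproduces the expected result from the question (with ", " as separator).

```python
import pandas as pd

def concat_next_periods(df, k, sep=', '):
    """Hypothetical helper: join each payer with the payers of the next k-1
    periods of the same (spn, cpt) group."""
    ordered = df.sort_values('period', kind='stable',
                             key=lambda p: pd.to_datetime(p, format='%m/%d/%Y'))
    g = ordered.groupby(['spn', 'cpt'])['payer']
    # Column j holds the payer j periods ahead (NaN past the end of the group)
    ahead = pd.concat([g.shift(-j) for j in range(k)], axis=1, keys=range(k))
    # Drop the missing entries in each row and join what is left
    joined = ahead.apply(lambda row: sep.join(row.dropna()), axis=1)
    return joined.reindex(df.index)   # back to the caller's row order

df = pd.DataFrame({
    'period': ['7/1/2018'] * 3 + ['8/1/2018'] * 3 + ['9/1/2018'] * 3,
    'spn': ['a', 'a', 'b'] * 3,
    'cpt': [23, 24, 23] * 3,
    'payer': ['UNITED, HEALTH', 'CARE, MEDI', 'ASSIGN',
              'ASSIGN', 'CARE, MEDI', 'ASSIGN, MEDI',
              'ASSIGN', 'MEDI', 'ASSIGN, MEDI'],
})
print(concat_next_periods(df, 2))
```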

Answer 2 (score: 0)

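I can answer part of your question (this joins the payers of all periods in each (spn, cpt) group, not just two consecutive periods):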

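import pandas as pd

data['period']=pd.to_datetime(data.period)
data['month']= data.period.dt.month
data.set_index(['spn', 'cpt','period'], inplace=True)
# Join every payer string of a (spn, cpt) group; select 'payer' so the integer month column is not joined
df = data.groupby(['spn', 'cpt'])['payer'].agg(','.join)
# Attach the joined string to every row of its group
df = data.merge(df, on=['spn','cpt'])
# rename needs columns= (the default renames index labels) and returns a new frame
df = df.rename(columns={'payer_x':'payer','payer_y':'payer_concate'})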

Output:

spn cpt           payer  month                     payer_concate

a   23   UNITED, HEALTH      7      UNITED, HEALTH,ASSIGN,ASSIGN
    23           ASSIGN      8      UNITED, HEALTH,ASSIGN,ASSIGN
    23           ASSIGN      9      UNITED, HEALTH,ASSIGN,ASSIGN
    24       CARE, MEDI      7        CARE, MEDI,CARE, MEDI,MEDI
    24       CARE, MEDI      8        CARE, MEDI,CARE, MEDI,MEDI
    24             MEDI      9        CARE, MEDI,CARE, MEDI,MEDI
b   23           ASSIGN      7  ASSIGN,ASSIGN, MEDI,ASSIGN, MEDI
    23     ASSIGN, MEDI      8  ASSIGN,ASSIGN, MEDI,ASSIGN, MEDI
    23     ASSIGN, MEDI      9  ASSIGN,ASSIGN, MEDI,ASSIGN, MEDI
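The agg-then-merge step can also be written as a single groupby().transform(): when the function returns one value per group, transform broadcasts it back to every row of that group, so no merge or index juggling is needed. This sketch rebuilds the question's sample data and, like the answer above, joins all periods rather than a two-period window.

```python
import pandas as pd

data = pd.DataFrame({
    'period': ['7/1/2018'] * 3 + ['8/1/2018'] * 3 + ['9/1/2018'] * 3,
    'spn': ['a', 'a', 'b'] * 3,
    'cpt': [23, 24, 23] * 3,
    'payer': ['UNITED, HEALTH', 'CARE, MEDI', 'ASSIGN',
              'ASSIGN', 'CARE, MEDI', 'ASSIGN, MEDI',
              'ASSIGN', 'MEDI', 'ASSIGN, MEDI'],
})

# ','.join returns one string per (spn, cpt) group; transform repeats it on each row
data['payer_concate'] = data.groupby(['spn', 'cpt'])['payer'].transform(','.join)
print(data)
```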