Populate a new column based on specific conditions, using shift and groupby

Time: 2019-12-02 02:43:14

Tags: python pandas

I have a dataframe that looks like this:

LastName Date ObjectCol1 ObjectCol2 NumCol1 NumCol2 CurrentState ExpectedState
ABC      March                                            A1              A2     
ABC      June                                             A1              A2
XYZ      March                                            A2              A2
XYZ      June                                             A2              A2
XYZ      July                                             A2              A2
AAA      March                                            D3              D2
AAA      June                                             D2              D1  
DEF      March                                            C1              C1
DEF      June                                             C2              C3
DEF      July                                             C3              C3

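I want to create a new column such that, for each LastName, if the Date is not the latest date for that LastName, the value is "Hit" when ExpectedState (Intermediate 2) equals the CurrentState (Intermediate 1) of the next date for that LastName, and "Miss" otherwise. If the Date is the latest date for that LastName, the value should be "Yet to be seen".

So the result would look like: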

LastName Date ObjectCol1 ObjectCol2 NumCol1 NumCol2 CurrentState ExpectedState   Result
ABC      March                                            A1              A2      Miss (because A2 here != the CurrentState value in the next row)
ABC      June                                             A1              A2      Yet to be seen
XYZ      March                                            A2              A2      Hit
XYZ      June                                             A2              A2      Hit
XYZ      July                                             A2              A2      Yet to be seen
AAA      March                                            D3              D2      Hit
AAA      June                                             D2              D1      Yet to be seen
DEF      March                                            C1              C1      Miss
DEF      June                                             C2              C3      Hit
DEF      July                                             C3              C3      Yet to be seen

1 answer:

Answer 0 (score: 0)

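import numpy as np

# Next row's CurrentState within each LastName (NaN on the last row of a group).
# Assumes the rows are already in date order within each LastName.
df['Pre-result'] = df.groupby(['LastName'])['CurrentState'].shift(-1)

df['Result'] = np.where(df['Pre-result'] == df['ExpectedState'], "Hit", "Miss")

# The last date per LastName has no next row to compare against.
df['Result'] = np.where(df['Pre-result'].isna(), "Yet to be seen", df['Result'])

del df['Pre-result']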


LastName Date ObjectCol1 ObjectCol2 NumCol1 NumCol2 CurrentState ExpectedState   Result
ABC      March                                            A1              A2      Miss (because A2 here != the CurrentState value in the next row)
ABC      June                                             A1              A2      Yet to be seen
XYZ      March                                            A2              A2      Hit
XYZ      June                                             A2              A2      Hit
XYZ      July                                             A2              A2      Yet to be seen
AAA      March                                            D3              D2      Hit
AAA      June                                             D2              D1      Yet to be seen
DEF      March                                            C1              C1      Miss
DEF      June                                             C2              C3      Hit
DEF      July                                             C3              C3      Yet to be seen
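
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds only the four columns the logic needs from the sample in the question. It also makes two assumptions that are not in the original post: that Date holds English month names, and that the rows may not already be in chronological order, so the "next" row is determined from a hypothetical `month_order` lookup rather than from the existing row order. `np.select` is used here in place of the two chained `np.where` calls; the outcome should be the same.

```python
import numpy as np
import pandas as pd

# Sample data from the question (only the columns the logic needs).
df = pd.DataFrame({
    "LastName": ["ABC", "ABC", "XYZ", "XYZ", "XYZ", "AAA", "AAA", "DEF", "DEF", "DEF"],
    "Date": ["March", "June", "March", "June", "July",
             "March", "June", "March", "June", "July"],
    "CurrentState": ["A1", "A1", "A2", "A2", "A2", "D3", "D2", "C1", "C2", "C3"],
    "ExpectedState": ["A2", "A2", "A2", "A2", "A2", "D2", "D1", "C1", "C3", "C3"],
})

# Assumption: Date contains English month names. month_order is a
# hypothetical helper that maps each name to its month number.
months = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
month_order = {name: number for number, name in enumerate(months, start=1)}

# Row labels in chronological order (stable, so ties keep their original order).
chronological = df["Date"].map(month_order).sort_values(kind="stable").index

# CurrentState of the next date within each LastName, realigned to the
# original row order. The latest date in each group gets NaN.
next_state = (
    df.loc[chronological]
      .groupby("LastName")["CurrentState"]
      .shift(-1)
      .reindex(df.index)
)

# Conditions are checked in order: missing next row first, then equality.
df["Result"] = np.select(
    [next_state.isna().to_numpy(),
     (next_state == df["ExpectedState"]).to_numpy()],
    ["Yet to be seen", "Hit"],
    default="Miss",
)

print(df)
```

Because the shifted values are reindexed back onto `df.index`, the original row order of the dataframe is left unchanged; only the lookup of the "next" row depends on the month ordering.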