Pandas: compare across multiple "group by" keys and compare values in different columns

Time: 2019-03-08 11:54:47

Tags: python pandas

I have a dataset:

In:
import pandas as pd

df = pd.DataFrame({'id': [23, 23, 23, 43, 43],
                   'data_1': ['20170503', '20170503', '20170503', '20170602',
                              '20170602'],
                   'units': [10, 10, 10, 5, 5],
                   'data_2': ['20170104', '20170503', '20170503', '20170605',
                              '20170602'],
                   'code': ["s", "r", "s", "s", "r"],
                   'units_2': [20, 10, 10, 8, 5]})

print(df)

Out:

   id    data_1     units    data_2    code  units_2
0  23  20170503     10     20170104       s       20
1  23  20170503     10     20170503       r       10
2  23  20170503     10     20170503       s       10
3  43  20170602      5     20170605       s        8
4  43  20170602      5     20170602       r        5

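I need to group by "id" and check whether there is a row where the date in data_2 matches the date in data_1 and the code is "s". A column can be added to flag those matches, so the final output would look like this: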

   id    data_1     units    data_2    code  units_2     new_column
0  23  20170503     10     20170104       s       20              0
1  23  20170503     10     20170503       r       10              0
2  23  20170503     10     20170503       s       10              1
3  43  20170602      5     20170605       s        8              0
4  43  20170602      5     20170602       r        5              0

Thanks for your help.

1 answer:

Answer 0: (score: 3)

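groupby is not necessary here, because no values are changed or counted per group: each row's data_1 is compared only with that same row's data_2 and code.

Use: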

df['new_column'] = (df.data_1.eq(df.data_2) & df.code.eq('s')).astype(int)
# or: df['new_column'] = (df.data_1.eq(df.data_2) & df.code.eq('s')).map({True: 1, False: 0})
# or (needs `import numpy as np`): df['new_column'] = np.where(df.data_1.eq(df.data_2) & df.code.eq('s'), 1, 0)
print(df)

   id    data_1  units    data_2 code  units_2  new_column
0  23  20170503     10  20170104    s       20           0
1  23  20170503     10  20170503    r       10           0
2  23  20170503     10  20170503    s       10           1
3  43  20170602      5  20170605    s        8           0
4  43  20170602      5  20170602    r        5           0
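
For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the question's DataFrame and computes the flag with each of the three variants above. The names mask, via_astype, via_map and via_where are just illustrative and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'id': [23, 23, 23, 43, 43],
    'data_1': ['20170503', '20170503', '20170503', '20170602', '20170602'],
    'units': [10, 10, 10, 5, 5],
    'data_2': ['20170104', '20170503', '20170503', '20170605', '20170602'],
    'code': ['s', 'r', 's', 's', 'r'],
    'units_2': [20, 10, 10, 8, 5],
})

# Row-wise boolean mask: the two dates match AND the code is 's'.
mask = df['data_1'].eq(df['data_2']) & df['code'].eq('s')

# Three ways to turn the boolean mask into a 0/1 column.
via_astype = mask.astype(int)
via_map = mask.map({True: 1, False: 0})
via_where = pd.Series(np.where(mask, 1, 0), index=df.index)

df['new_column'] = via_astype
print(df)
```

Building the mask once and reusing it keeps the condition in a single place. All three conversions should give the same column, so picking one is mostly a matter of taste.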