I have a dataset:
In:
import pandas as pd
df = pd.DataFrame({'id': [23, 23, 23, 43, 43],
                   'data_1': ['20170503', '20170503', '20170503', '20170602',
                              '20170602'],
                   'units': [10, 10, 10, 5, 5],
                   'data_2': ['20170104', '20170503', '20170503', '20170605',
                              '20170602'],
                   'code': ["s", "r", "s", "s", "r"],
                   'units_2': [20, 10, 10, 8, 5]})
print(df)
Out:
id data_1 units data_2 code units_2
0 23 20170503 10 20170104 s 20
1 23 20170503 10 20170503 r 10
2 23 20170503 10 20170503 s 10
3 43 20170602 5 20170605 s 8
4 43 20170602 5 20170602 r 5
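I need to group by "id" and check whether there is a date in data_2, with code "s", that matches the date in data_1. A column could be added to flag those matches, so the final output would look like this: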
id data_1 units data_2 code units_2 new_column
0 23 20170503 10 20170104 s 20 0
1 23 20170503 10 20170503 r 10 0
2 23 20170503 10 20170503 s 10 1
3 43 20170602 5 20170605 s 8 0
4 43 20170602 5 20170602 r 5 0
Thanks for your help.
Answer 0: (score: 3)
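groupby is not necessary here, because the values are neither changed nor counted per group; each row can be checked on its own.
Use: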
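# Flag rows where the two dates match and code is 's'; cast True/False to 1/0
df['new_column'] = (df.data_1.eq(df.data_2) & df.code.eq('s')).astype(int)
# or df['new_column'] = (df.data_1.eq(df.data_2) & df.code.eq('s')).map({True: 1, False: 0})
# or (requires `import numpy as np`) df['new_column'] = np.where(df.data_1.eq(df.data_2) & df.code.eq('s'), 1, 0)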
print(df)
id data_1 units data_2 code units_2 new_column
0 23 20170503 10 20170104 s 20 0
1 23 20170503 10 20170503 r 10 0
2 23 20170503 10 20170503 s 10 1
3 43 20170602 5 20170605 s 8 0
4 43 20170602 5 20170602 r 5 0
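For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same row-wise mask, split into named steps. The variable names same_date and is_s, and the extra id_has_match column, are my own illustrations and not part of the question; the last step only shows where groupby would come in if a per-id flag were actually wanted.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [23, 23, 23, 43, 43],
    'data_1': ['20170503', '20170503', '20170503', '20170602', '20170602'],
    'units': [10, 10, 10, 5, 5],
    'data_2': ['20170104', '20170503', '20170503', '20170605', '20170602'],
    'code': ['s', 'r', 's', 's', 'r'],
    'units_2': [20, 10, 10, 8, 5],
})

# Each condition is evaluated row by row, so no grouping is involved.
same_date = df['data_1'].eq(df['data_2'])  # dates are compared as strings
is_s = df['code'].eq('s')

# Both conditions must hold; astype(int) turns True/False into 1/0.
df['new_column'] = (same_date & is_s).astype(int)

# Hypothetical extension (not asked for in the question): mark every row
# of an id that has at least one match. Here groupby is needed.
df['id_has_match'] = df.groupby('id')['new_column'].transform('max')

print(df)
```

Only the row at index 2 satisfies both conditions, so new_column matches the desired output above, while id_has_match is 1 for all three rows of id 23.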