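I'm trying to create a new column in an existing DataFrame based on whether a value in one column also appears in another column.
Suppose the following is part of a medium-sized dataset (30 million data points):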
DATE |ID |3_DAY_FUTURE
2016-12-14|Bob123|2016-12-17
2016-12-15|Bob123|2016-12-18
2016-12-16|Bob123|2016-12-19
2016-12-17|Bob123|2016-12-20
2016-12-18|Bob123|2016-12-21
2016-12-19|Bob123|2016-12-22
2016-12-20|Bob123|2016-12-23
2017-01-14|Jim123|2017-01-17
2017-01-15|Jim123|2017-01-18
2017-01-16|Jim123|2017-01-19
2017-01-17|Jim123|2017-01-20
2017-01-18|Jim123|2017-01-21
2017-01-19|Jim123|2017-01-22
2017-01-20|Jim123|2017-01-23
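I want to create a column that evaluates, for each ID (Bob and Jim in this example), whether that ID has a DATE value matching its date 3 days in the future. For example, Bob123 appears on both 2016-12-14 and 2016-12-17, so the 3_DAY_FUTURE value of the first row is also one of his DATE values. The first row would therefore get a new column value of YES or something similar. Here is an example of the output I'm hoping for, with the new 3_DAY_STATUS column: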
DATE |ID |3_DAY_FUTURE|3_DAY_STATUS
2016-12-14|Bob123|2016-12-17|YES
2016-12-15|Bob123|2016-12-18|YES
2016-12-16|Bob123|2016-12-19|YES
2016-12-17|Bob123|2016-12-20|YES
2016-12-18|Bob123|2016-12-21|NO
2016-12-19|Bob123|2016-12-22|NO
2016-12-20|Bob123|2016-12-23|NO
2017-01-14|Jim123|2017-01-17|YES
2017-01-15|Jim123|2017-01-18|YES
2017-01-16|Jim123|2017-01-19|YES
2017-01-17|Jim123|2017-01-20|YES
2017-01-18|Jim123|2017-01-21|NO
2017-01-19|Jim123|2017-01-22|NO
2017-01-20|Jim123|2017-01-23|NO
Any suggestions are greatly appreciated.
Answer 0 (score: 2)
Use groupby on ID with isin to create a mask, then add the new values with numpy.where:
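import numpy as np
import pandas as pd

df.DATE = pd.to_datetime(df.DATE)
df['3_DAY_FUTURE'] = pd.to_datetime(df['3_DAY_FUTURE'])
# Compare each ID's future dates with that same ID's dates (x['DATE'], not df.DATE).
# .values drops the index, so this assumes df is already sorted by ID.
mask = df.groupby('ID').apply(lambda x: x['3_DAY_FUTURE'].isin(x['DATE'])).values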
print (mask)
[ True True True True False False False True True True True False
 False False]
df['3_DAY_STATUS'] = np.where(mask, 'YES', 'NO')
print (df)
DATE ID 3_DAY_FUTURE 3_DAY_STATUS
0 2016-12-14 Bob123 2016-12-17 YES
1 2016-12-15 Bob123 2016-12-18 YES
2 2016-12-16 Bob123 2016-12-19 YES
3 2016-12-17 Bob123 2016-12-20 YES
4 2016-12-18 Bob123 2016-12-21 NO
5 2016-12-19 Bob123 2016-12-22 NO
6 2016-12-20 Bob123 2016-12-23 NO
7 2017-01-14 Jim123 2017-01-17 YES
8 2017-01-15 Jim123 2017-01-18 YES
9 2017-01-16 Jim123 2017-01-19 YES
10 2017-01-17 Jim123 2017-01-20 YES
11 2017-01-18 Jim123 2017-01-21 NO
12 2017-01-19 Jim123 2017-01-22 NO
13 2017-01-20 Jim123 2017-01-23 NO
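If the rows are not guaranteed to be sorted by ID, a lookup that does not rely on row order may be safer. The following is only a minimal sketch under that assumption, not part of the answer above: it rebuilds the sample rows from the question and uses a left merge with indicator=True to test whether each (ID, 3_DAY_FUTURE) pair also occurs as an (ID, DATE) pair. The helper names build_sample and add_status are hypothetical.

```python
import numpy as np
import pandas as pd


def build_sample():
    """Rebuild the 14 sample rows shown in the question."""
    dates = list(pd.date_range("2016-12-14", periods=7, freq="D")) + list(
        pd.date_range("2017-01-14", periods=7, freq="D")
    )
    df = pd.DataFrame({"DATE": dates, "ID": ["Bob123"] * 7 + ["Jim123"] * 7})
    df["3_DAY_FUTURE"] = df["DATE"] + pd.Timedelta(days=3)
    return df


def add_status(df):
    """Return a copy of df with a 3_DAY_STATUS column.

    A row is YES when the same ID also has a row whose DATE
    equals this row's 3_DAY_FUTURE.
    """
    out = df.copy()
    # Unique (ID, DATE) pairs, renamed so they can be matched against 3_DAY_FUTURE.
    keys = (
        out[["ID", "DATE"]]
        .drop_duplicates()
        .rename(columns={"DATE": "3_DAY_FUTURE"})
    )
    # A left merge keeps the left row order; the right keys are unique,
    # so the row count is unchanged.
    merged = out.merge(keys, on=["ID", "3_DAY_FUTURE"], how="left", indicator=True)
    found = (merged["_merge"] == "both").to_numpy()
    out["3_DAY_STATUS"] = np.where(found, "YES", "NO")
    return out


if __name__ == "__main__":
    print(add_status(build_sample()))
```

Because the match is done on the key pair rather than on position, the result should not depend on how the rows are ordered, at the cost of building an intermediate merged frame.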
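Answer 1 (score: 1)
Use shift(-3) and np.where:
# Assumes rows are sorted by ID and DATE with exactly one row per day,
# so the row three positions down is the date three days later.
df['3_DAY_STATUS'] = np.where(df.DATE.shift(-3) == df['3_DAY_FUTURE'], 'YES', 'NO')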
print(df)
DATE ID 3_DAY_FUTURE 3_DAY_STATUS
0 2016-12-14 Bob123 2016-12-17 YES
1 2016-12-15 Bob123 2016-12-18 YES
2 2016-12-16 Bob123 2016-12-19 YES
3 2016-12-17 Bob123 2016-12-20 YES
4 2016-12-18 Bob123 2016-12-21 NO
5 2016-12-19 Bob123 2016-12-22 NO
6 2016-12-20 Bob123 2016-12-23 NO
7 2017-01-14 Jim123 2017-01-17 YES
8 2017-01-15 Jim123 2017-01-18 YES
9 2017-01-16 Jim123 2017-01-19 YES
10 2017-01-17 Jim123 2017-01-20 YES
11 2017-01-18 Jim123 2017-01-21 NO
12 2017-01-19 Jim123 2017-01-22 NO
13 2017-01-20 Jim123 2017-01-23 NO
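One caveat with a plain shift(-3) is that it looks three rows down regardless of ID, so it can match a date that belongs to the next ID when one ID's dates run straight into another's. The sketch below uses a small made-up frame (not the question's data) to show the difference between a plain shift and a per-ID shift via groupby('ID'). The function names are hypothetical, and both variants still assume one row per day, sorted by date within each ID.

```python
import numpy as np
import pandas as pd


def make_frame():
    """Made-up data: Jim's dates start the day after Bob's end."""
    df = pd.DataFrame({
        "DATE": pd.date_range("2017-01-01", periods=8, freq="D"),
        "ID": ["Bob123"] * 4 + ["Jim123"] * 4,
    })
    df["3_DAY_FUTURE"] = df["DATE"] + pd.Timedelta(days=3)
    return df


def plain_shift_status(df):
    """Compare against the DATE three rows down, ignoring ID boundaries."""
    return np.where(df["DATE"].shift(-3) == df["3_DAY_FUTURE"], "YES", "NO")


def grouped_shift_status(df):
    """Compare against the DATE three rows down within the same ID only."""
    shifted = df.groupby("ID")["DATE"].shift(-3)
    # Rows without a same-ID row three positions down get NaT, which never equals a date.
    return np.where(shifted == df["3_DAY_FUTURE"], "YES", "NO")


if __name__ == "__main__":
    frame = make_frame()
    frame["PLAIN"] = plain_shift_status(frame)
    frame["GROUPED"] = grouped_shift_status(frame)
    print(frame)
```

In this made-up frame the plain shift marks Bob's last three rows as YES because it matches them against Jim's dates, while the grouped shift marks them NO.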