I have two dataframes:
DF1:
DATE INTERVALL
0 2017-01-01 00:30:00.000
1 2017-01-01 01:00:00.000
2 2017-01-01 01:30:00.000
3 2017-01-01 02:00:00.000
4 2017-01-01 02:30:00.000
....
200 2017-05-09 00:30:00.000
201 2017-05-09 01:00:00.000
202 2017-05-09 01:30:00.000
203 2017-05-09 02:00:00.000
....
DF2:
Name Date
Neujahr 2017-01-01
Karfreitag 2017-04-14
Ostersonntag 2017-04-16
Ostermontag 2017-04-17
1. Mai 2017-05-01
Christi Himmelfahrt 2017-05-25
Pfingstsonntag 2017-06-04
I want to add a new column 'HOLIDAY' to df1. If the value in the 'DATE' column is contained in df2, 'HOLIDAY' should be 1, otherwise 0.
Example:
DATE INTERVALL HOLIDAY
0 2017-01-01 00:30:00.000 1
1 2017-01-01 01:00:00.000 1
2 2017-01-01 01:30:00.000 1
3 2017-01-01 02:00:00.000 1
4 2017-01-01 02:30:00.000 1
....
200 2017-05-09 00:30:00.000 0
201 2017-05-09 01:00:00.000 0
202 2017-05-09 01:30:00.000 0
203 2017-05-09 02:00:00.000 0
...
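I tried an if statement, but it is slow. I think there is a better solution:
for index, row in df1.iterrows():
    if row['DATE'] == "2017-01-01":
        df1.at[index, 'HOLIDAY'] = 1  # set_value() was deprecated and later removed; .at replaces it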
Answer 0 (score: 2)
Use isin to build a boolean mask, then convert True to 1 and False to 0 with astype:
import pandas as pd

# convert to datetimes if necessary
df1['DATE'] = pd.to_datetime(df1['DATE'])
df2['Date'] = pd.to_datetime(df2['Date'])
df1['HOLIDAY'] = df1['DATE'].isin(df2['Date']).astype(int)
print (df1)
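For a version you can paste and run as-is, here is a minimal sketch that rebuilds a small subset of the question's data (the sample rows are chosen for illustration only) and applies the same isin/astype approach:

```python
import pandas as pd

# Small illustrative subset of the question's df1 (two holiday rows, two normal rows)
df1 = pd.DataFrame({
    'DATE': ['2017-01-01', '2017-01-01', '2017-05-09', '2017-05-09'],
    'INTERVALL': ['00:30:00.000', '01:00:00.000', '00:30:00.000', '01:00:00.000'],
})
df2 = pd.DataFrame({
    'Name': ['Neujahr', 'Karfreitag', 'Ostersonntag', 'Ostermontag',
             '1. Mai', 'Christi Himmelfahrt', 'Pfingstsonntag'],
    'Date': ['2017-01-01', '2017-04-14', '2017-04-16', '2017-04-17',
             '2017-05-01', '2017-05-25', '2017-06-04'],
})

# Parse both date columns so they share the datetime64[ns] dtype
df1['DATE'] = pd.to_datetime(df1['DATE'])
df2['Date'] = pd.to_datetime(df2['Date'])

# isin builds a boolean mask; astype(int) maps True -> 1 and False -> 0
df1['HOLIDAY'] = df1['DATE'].isin(df2['Date']).astype(int)
print(df1)
```

Because isin is vectorized, it checks every row in one call instead of looping row by row as in the if statement from the question.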
DATE INTERVALL HOLIDAY
0 2017-01-01 00:30:00.000 1
1 2017-01-01 01:00:00.000 1
2 2017-01-01 01:30:00.000 1
3 2017-01-01 02:00:00.000 1
4 2017-01-01 02:30:00.000 1
200 2017-05-09 00:30:00.000 0
201 2017-05-09 01:00:00.000 0
202 2017-05-09 01:30:00.000 0
203 2017-05-09 02:00:00.000 0
Details:
print (df1['DATE'].isin(df2['Date']))
0 True
1 True
2 True
3 True
4 True
200 False
201 False
202 False
203 False
Name: DATE, dtype: bool
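The mask-to-integer step can also be written with numpy.where. The sketch below (using a few hypothetical sample dates) shows that both forms produce the same 1/0 values; astype(int) keeps a pandas Series with the original index, while np.where returns a plain NumPy array:

```python
import numpy as np
import pandas as pd

# Hypothetical sample dates, in the spirit of the question's data
dates = pd.Series(pd.to_datetime(['2017-01-01', '2017-01-01', '2017-05-09']))
holidays = pd.Series(pd.to_datetime(['2017-01-01', '2017-04-14', '2017-05-01']))

mask = dates.isin(holidays)        # boolean Series
via_astype = mask.astype(int)      # Series of 1/0, index preserved
via_where = np.where(mask, 1, 0)   # NumPy array with the same 1/0 values
print(via_astype.tolist(), via_where.tolist())
```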
print (df1['DATE'].dtype)
datetime64[ns]
print (df2['Date'].dtype)
datetime64[ns]
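The comment "convert to datetimes if necessary" implies conversion matters only when the two columns are not directly comparable. As a sketch, assuming both columns hold dates as identically formatted 'YYYY-MM-DD' strings, isin matches them by value without any conversion. If the formats could differ (for example '2017-1-1'), converting both sides with pd.to_datetime as in the answer is the safer choice:

```python
import pandas as pd

# Hypothetical case: both columns kept as identically formatted date strings
df1 = pd.DataFrame({'DATE': ['2017-01-01', '2017-05-09']})
df2 = pd.DataFrame({'Date': ['2017-01-01', '2017-04-14']})

# isin compares values, so matching string formats are enough here
df1['HOLIDAY'] = df1['DATE'].isin(df2['Date']).astype(int)
print(df1)
```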