Check whether a date is in another DataFrame and set a new column

Time: 2017-10-16 12:36:16

Tags: python pandas dataframe

I have two DataFrames:

DF1:

    DATE     INTERVALL    
0  2017-01-01  00:30:00.000      
1  2017-01-01  01:00:00.000  
2  2017-01-01  01:30:00.000   
3  2017-01-01  02:00:00.000      
4  2017-01-01  02:30:00.000
....
200 2017-05-09 00:30:00.000
201 2017-05-09 01:00:00.000  
202 2017-05-09 01:30:00.000  
203 2017-05-09 02:00:00.000
....        

DF2:

Name                         Date                       
Neujahr                     2017-01-01
Karfreitag                  2017-04-14
Ostersonntag                2017-04-16
Ostermontag                 2017-04-17
1. Mai                      2017-05-01
Christi Himmelfahrt         2017-05-25
Pfingstsonntag              2017-06-04

I want to add a new column 'HOLIDAY' to df1. If the value in the 'DATE' column appears in df2, 'HOLIDAY' should be 1; otherwise it should be 0.

Example:

    DATE     INTERVALL        HOLIDAY    
0  2017-01-01  00:30:00.000     1      
1  2017-01-01  01:00:00.000     1
2  2017-01-01  01:30:00.000     1
3  2017-01-01  02:00:00.000     1 
4  2017-01-01  02:30:00.000     1
....
200 2017-05-09 00:30:00.000     0 
201 2017-05-09 01:00:00.000     0
202 2017-05-09 01:30:00.000     0
203 2017-05-09 02:00:00.000     0 
... 

I tried an if statement inside a loop, but it is slow. I think there must be a better solution:

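    for index, row in df1.iterrows():  # row-by-row loop (slow on large frames)
        if row['DATE'] == "2017-01-01":
            df1.at[index, 'HOLIDAY'] = 1  # set_value was removed in pandas 1.0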

1 Answer:

Answer 0 (score: 2)

Use isin to build a boolean mask, then cast it with astype(int) to turn True into 1 and False into 0:

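import pandas as pd

# convert both columns to datetimes if they are not already
df1['DATE'] = pd.to_datetime(df1['DATE'])
df2['Date'] = pd.to_datetime(df2['Date'])

# isin gives a boolean mask; astype(int) maps True/False to 1/0
df1['HOLIDAY'] = df1['DATE'].isin(df2['Date']).astype(int)
print(df1)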
          DATE     INTERVALL  HOLIDAY
0   2017-01-01  00:30:00.000        1
1   2017-01-01  01:00:00.000        1
2   2017-01-01  01:30:00.000        1
3   2017-01-01  02:00:00.000        1
4   2017-01-01  02:30:00.000        1
200 2017-05-09  00:30:00.000        0
201 2017-05-09  01:00:00.000        0
202 2017-05-09  01:30:00.000        0
203 2017-05-09  02:00:00.000        0
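
For anyone who wants to try this without the full data, here is a minimal self-contained sketch. The two small frames are stand-ins built from a few of the sample rows in the question, not the real data, so treat the result as an illustration only.

```python
import pandas as pd

# Stand-in for df1: a few rows copied from the question's sample
df1 = pd.DataFrame({
    "DATE": ["2017-01-01", "2017-01-01", "2017-05-09", "2017-05-09"],
    "INTERVALL": ["00:30:00.000", "01:00:00.000", "00:30:00.000", "01:00:00.000"],
})

# Stand-in for df2: a subset of the holiday list
df2 = pd.DataFrame({
    "Name": ["Neujahr", "Karfreitag", "1. Mai"],
    "Date": ["2017-01-01", "2017-04-14", "2017-05-01"],
})

# Make sure both columns hold datetimes before comparing
df1["DATE"] = pd.to_datetime(df1["DATE"])
df2["Date"] = pd.to_datetime(df2["Date"])

# Boolean membership mask, cast to 1/0
df1["HOLIDAY"] = df1["DATE"].isin(df2["Date"]).astype(int)
print(df1["HOLIDAY"].tolist())  # [1, 1, 0, 0]
```

Because isin works on the whole column at once, no Python-level loop over the rows is needed.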

Details:

print(df1['DATE'].isin(df2['Date']))
0       True
1       True
2       True
3       True
4       True
200    False
201    False
202    False
203    False
Name: DATE, dtype: bool

print(df1['DATE'].dtype)
datetime64[ns]

print(df2['Date'].dtype)
datetime64[ns]
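
Matching dtypes are necessary but may not be sufficient, because isin compares full timestamps exactly. The sketch below assumes a hypothetical variant of the data in which the date and the time of day sit together in one column (in the question they are separate, so this step is not needed there). In that case the time part can be dropped with dt.normalize() before the lookup. The names stamps and holidays are illustrative only.

```python
import pandas as pd

# Hypothetical variant: date and time of day combined in one column
stamps = pd.Series(pd.to_datetime(["2017-01-01 00:30:00", "2017-05-09 01:00:00"]))
holidays = pd.Series(pd.to_datetime(["2017-01-01", "2017-04-14"]))

# Exact comparison: 00:30 on New Year's Day is not equal to midnight
exact = stamps.isin(holidays)

# Reset every timestamp to midnight first, then compare by calendar day
by_day = stamps.dt.normalize().isin(holidays)

print(exact.tolist())   # [False, False]
print(by_day.tolist())  # [True, False]
```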