Multiple if conditions between two pandas DataFrames of unequal size

Time: 2017-10-02 11:24:32

Tags: pandas

I have the following DataFrames:

site

OfflineFrom | OfflineTo  | ShiftDays | Site |
---------------------------------------------
2017-10-02  | 2017-10-10 |         6 |   ID |
2017-10-13  | 2017-11-10 |         6 |   ID |
2017-11-15  | 2017-12-09 |         6 |   ID |
2017-10-03  | 2017-10-11 |         6 |   IN |
2017-10-03  | 2017-10-10 |         6 |   IN |

holiday

Holiday    | SiteID |
---------------------
2017-10-07 |     ID |
2017-10-08 |     ID |
2017-09-12 |     ID |
2017-10-08 |     IN |

I want logic such that, if a site has a holiday that falls between OfflineFrom and OfflineTo, one day is subtracted from ShiftDays for each such holiday.

The expected result is:

OfflineFrom | OfflineTo  | ShiftDays | Site |
---------------------------------------------
2017-10-02  | 2017-10-10 |         4 |   ID |
2017-10-13  | 2017-11-10 |         6 |   ID |
2017-11-15  | 2017-12-09 |         6 |   ID |
2017-10-03  | 2017-10-11 |         6 |   IN |
2017-10-03  | 2017-10-10 |         5 |   IN |

Any help with the code would be appreciated. Thanks.

The code I am using to set this up and test it is:

# Evaluate if Holiday by Site is within OfflineFrom and OfflineTo
# Subtract the holiday from ShiftDays if it is so
import numpy as np
import pandas as pd
from datetime import datetime

# Prepare site ID series
s1 = pd.Series('ID', index = range(3))
s2 = pd.Series('IN', index = range(2))

# Series.append was removed in pandas 2.0; pd.concat does the same job
site = pd.concat([s1, s2], ignore_index=True)

# Prepare OfflineFrom and OfflineTo series with datetime
offf = pd.DataFrame({'year':[2017, 2017, 2017, 2017, 2017],
                     'month': [10, 10, 10, 10, 10],
                     'day': [2, 5, 10, 20, 25]})

offt = pd.DataFrame({'year':[2017, 2017, 2017, 2017, 2017],
                     'month': [10, 10, 10, 10, 10],
                     'day': [10, 10, 18, 23, 28]})

offf = pd.to_datetime(offf)
offt = pd.to_datetime(offt)

# Make a series with ShiftDays as 6
sd = pd.Series(6, index = range(5))

# Assemble all these to a single dataframe
site = pd.DataFrame({'Site': site, 'OfflineFrom': offf, 'OfflineTo': offt, 'ShiftDays': sd})

holiday = pd.DataFrame({'SiteID': ['ID', 'ID', 'IN'], 'Holiday': [datetime.strptime('07-09-2017','%d-%m-%Y'),
                                                                  datetime.strptime('12-09-2017','%d-%m-%Y'),
                                                                  datetime.strptime('08-09-2017','%d-%m-%Y')
                                                                 ]})

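# Boolean matrix: one row per holiday, one column per site row.
# .to_numpy() is needed because recent pandas versions reject Series[:, None].
test = pd.DataFrame((holiday.Holiday.to_numpy()[:, None] >= site.OfflineFrom.to_numpy())
                    & (holiday.Holiday.to_numpy()[:, None] <= site.OfflineTo.to_numpy()))
x = holiday.Holiday.to_numpy()[:, None]  # column vector, shape (3, 1)
y = site.OfflineFrom.to_numpy()          # flat array, shape (5,)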

1 answer:

Answer 0 (score: 1)

You can use NumPy broadcasting:

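# .to_numpy() first: recent pandas versions no longer accept Series[:, None]
site.ShiftDays -= ((holiday.Holiday.to_numpy()[:, None] >= site.OfflineFrom.to_numpy())
                   & (holiday.Holiday.to_numpy()[:, None] <= site.OfflineTo.to_numpy())
                   & (holiday.SiteID.to_numpy()[:, None] == site.Site.to_numpy())).sum(axis=0)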
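
To make the broadcast easier to follow, here is a minimal sketch that rebuilds the two tables from the question and applies the same three conditions. The way the frames are constructed and the helper names `hol`, `hol_site`, `mask` and `hits` are assumptions for illustration; only the column names and values come from the question.

```python
import pandas as pd

# Tables from the question, rebuilt by hand for illustration
site = pd.DataFrame({
    'OfflineFrom': pd.to_datetime(['2017-10-02', '2017-10-13', '2017-11-15',
                                   '2017-10-03', '2017-10-03']),
    'OfflineTo': pd.to_datetime(['2017-10-10', '2017-11-10', '2017-12-09',
                                 '2017-10-11', '2017-10-10']),
    'ShiftDays': [6, 6, 6, 6, 6],
    'Site': ['ID', 'ID', 'ID', 'IN', 'IN'],
})
holiday = pd.DataFrame({
    'Holiday': pd.to_datetime(['2017-10-07', '2017-10-08',
                               '2017-09-12', '2017-10-08']),
    'SiteID': ['ID', 'ID', 'ID', 'IN'],
})

# Column vectors of shape (n_holidays, 1); comparing them with the flat
# (n_site_rows,) arrays broadcasts to an (n_holidays, n_site_rows) mask
hol = holiday['Holiday'].to_numpy()[:, None]
hol_site = holiday['SiteID'].to_numpy()[:, None]

mask = ((hol >= site['OfflineFrom'].to_numpy())
        & (hol <= site['OfflineTo'].to_numpy())
        & (hol_site == site['Site'].to_numpy()))

# Summing down each column counts the holidays that hit that site row
hits = mask.sum(axis=0)
site['ShiftDays'] = site['ShiftDays'] - hits

print(mask.shape)
print(site)
```

Under the rule as stated, the fourth row (IN, 2017-10-03 to 2017-10-11) also contains the IN holiday on 2017-10-08, so this sketch subtracts one day there as well, whereas the expected table in the question lists 6 for that row. Note that the mask holds one cell for every holiday and site-row pair, so its size grows with the product of the two frame lengths.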

I have not tested how efficient this is, though.
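
If the full holiday-by-row comparison ever becomes too large, one possible alternative (not part of the answer above, and not benchmarked here) is to merge on the site key first so that only same-site pairs are compared. The sketch below rebuilds the question's tables by hand; the names `pairs`, `inside` and `hits` are hypothetical.

```python
import pandas as pd

# Tables from the question, rebuilt by hand for illustration
site = pd.DataFrame({
    'OfflineFrom': pd.to_datetime(['2017-10-02', '2017-10-13', '2017-11-15',
                                   '2017-10-03', '2017-10-03']),
    'OfflineTo': pd.to_datetime(['2017-10-10', '2017-11-10', '2017-12-09',
                                 '2017-10-11', '2017-10-10']),
    'ShiftDays': [6, 6, 6, 6, 6],
    'Site': ['ID', 'ID', 'ID', 'IN', 'IN'],
})
holiday = pd.DataFrame({
    'Holiday': pd.to_datetime(['2017-10-07', '2017-10-08',
                               '2017-09-12', '2017-10-08']),
    'SiteID': ['ID', 'ID', 'ID', 'IN'],
})

# Pair each site row only with holidays of the same site; reset_index()
# keeps the original row label in an 'index' column for counting later
pairs = site.reset_index().merge(holiday, left_on='Site', right_on='SiteID')

# Keep pairs whose holiday lies inside the offline window (bounds inclusive)
inside = pairs[pairs['Holiday'].between(pairs['OfflineFrom'], pairs['OfflineTo'])]

# Count hits per original row; rows without any matching holiday get 0
hits = inside.groupby('index').size().reindex(site.index, fill_value=0)
site['ShiftDays'] = site['ShiftDays'] - hits

print(site)
```

Whether this is actually faster depends on how many holidays each site has, since the merge still creates one row per same-site pair.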