Apply logic to each DataFrame row

Time: 2019-06-27 15:11:38

Tags: python pandas

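I have some logic that I need to apply to every row of a DataFrame. The logic looks at 7 fields in the DataFrame and writes the result of the function to a new column in that same DataFrame.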
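For illustration, the row-by-row pattern described here is commonly written with `DataFrame.apply(..., axis=1)`. The sketch below assumes a tiny made-up frame and a hypothetical helper named `month_for_row` that covers only a few Pending-style rules. It is not the full 7-field logic.

```python
import pandas as pd

# Hypothetical sample data: column names follow the question, values are made up.
df = pd.DataFrame({
    "Status": ["Pending", "Pending", "Pending"],
    "FOC#1": pd.to_datetime(["2019-12-19", None, None]),
    "FOC#2": pd.to_datetime([None, "2019-10-01", None]),
    "OrderSubmitted": pd.to_datetime(["2019-05-22", "2019-05-24", "2019-05-24"]),
})

def month_for_row(row):
    """Hypothetical helper: choose a month value for a single row."""
    # pd.notna() is False for NaT, None and NaN alike.
    if pd.notna(row["FOC#2"]):
        return row["FOC#2"].month
    if pd.notna(row["FOC#1"]):
        return row["FOC#1"].month
    # Assumption: fall back to the order month plus 4, mirroring the question's rule.
    return row["OrderSubmitted"].month + 4

# axis=1 passes each row to the function as a Series.
df["Month"] = df.apply(month_for_row, axis=1)
print(df)
```

Row-wise `apply` is easy to read but calls the function once per row, so it tends to be slower than vectorized masks on large frames.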

I tried applying the logic with df.loc under multiple conditions. It works in some cases but not in all of them. The dataset looks like this. Input data:

    CID           Status   JiraTicket   MrcNew    ...  VendorCompletion  FOC#1       FOC#2  OrderSubmitted
12  NC1001-05     Planned  None         NaN       ...  NaT               NaT         NaT    NaT
13  NC1001-06     Planned  None         NaN       ...  NaT               NaT         NaT    NaT
14  301/101/0008  Active   CIOPS-18584  5200.00   ...  2019-04-15        2019-04-14  NaT    2019-03-14
15  MO001-02      Pending  None         NaN       ...  NaT               NaT         NaT    NaT
16  OR020-01      Pending  CIOPS-20124  8000.00   ...  NaT               NaT         NaT    2019-05-24
17  MA004-01      Pending  CIOPS-20075  6500.00   ...  NaT               2019-12-19  NaT    2019-05-22
18  MA004-02      Pending  CIOPS-21134  6500.00   ...  NaT               2019-12-19  NaT    2019-05-22
19  OR004-01      Pending  CIOPS-20121  10500.00  ...  NaT               NaT         NaT    2019-05-24
20  15001/10G     Active   CIOPS-11996  3975.00   ...  2018-08-01        NaT         NaT    2018-06-19

Month is a calculated field that I add using the df.loc logic below. Some of the data includes None, NaT and NaN values.

mdf.loc[(mdf.Status.isin(['Active'])) & (mdf['BillingStartDate'] < YearStart), 'Month'] = '0'
mdf.loc[(mdf.Status.isin(['Active'])) & (mdf['BillingStartDate'] >= YearStart), 'Month'] = mdf['BillingStartDate'].dt.month
mdf.loc[(mdf.Status.isin(['Active'])) & (mdf['BillingStartDate'] != None) & (mdf['VendorCompletion'] < YearStart), 'Month'] = '0'
mdf.loc[(mdf.Status.isin(['Active'])) & (mdf['BillingStartDate'] != None) & (mdf['VendorCompletion'] > YearStart), 'Month'] = mdf['VendorCompletion'].dt.month
mdf.loc[(mdf.Status.isin(['Pending'])) & (mdf['VendorCompletion'] < YearStart), 'Month'] = '0'
mdf.loc[(mdf.Status.isin(['Pending'])) & (mdf['VendorCompletion'] > YearStart), 'Month'] = mdf['VendorCompletion'].dt.month
mdf.loc[(mdf.Status.isin(['Pending'])) & (mdf['FOC#1'] != None) & (mdf['FOC#2'] == None), 'Month'] = mdf['FOC#1'].dt.month
mdf.loc[(mdf.Status.isin(['Pending'])) & (mdf['FOC#2'] != None), 'Month'] = mdf['FOC#2'].dt.month
mdf.loc[(mdf.Status.isin(['Pending'])) & (mdf['FOC#1'] == None) & (mdf['FOC#2'] == None), 'Month'] = ((mdf['OrderSubmitted'].dt.month) + 4)
mdf.loc[(mdf.Status.isin(['Planned'])) & (mdf['MonthBudget']>= Today.month), 'Month'] = mdf['MonthBudget'] + 1
mdf.loc[(mdf.Status.isin(['Planned'])) & (mdf['MonthBudget']<= Today.month), 'Month'] = (Today.month + 4)
print(mdf[['CID', 'JiraTicket', 'Status', 'BillingStartDate', 'VendorCompletion', 'FOC#1', 'FOC#2', 'Month']])
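A small sketch of how missing datetimes behave in comparisons like the ones above, which may explain the rows that end up with NaN. As far as I can tell, comparing a datetime column to `None` with `!=` or `==` does not test for missing values, and `NaT` compares as False against any timestamp. The series `s` and the value of `year_start` are made up for illustration.

```python
import pandas as pd

# Hypothetical datetime column with one missing value (stored as NaT).
s = pd.Series(pd.to_datetime(["2019-04-15", None]))
year_start = pd.Timestamp("2019-01-01")  # assumed stand-in for YearStart

# Elementwise comparison with None does not single out the NaT entry.
ne_none = s != None
eq_none = s == None

# NaT is neither before nor after any timestamp, so both masks skip it.
after_start = s > year_start
before_start = s < year_start

# Explicit missing-value checks that do single out the NaT entry.
is_missing = s.isna()
has_value = s.notna()

print(pd.DataFrame({
    "value": s,
    "!= None": ne_none,
    "== None": eq_none,
    "> start": after_start,
    "< start": before_start,
    "isna": is_missing,
}))
```

Under that reading, a mask such as `mdf['FOC#1'] != None` would be replaced by `mdf['FOC#1'].notna()`, and `== None` by `.isna()`.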

Sample output:

    CID                             JiraTicket   Status   BillingStartDate  VendorCompletion  FOC#1       FOC#2  Month
12  NC1001-05                       None         Planned  NaT               NaT               NaT         NaT    11
13  NC1001-06                       None         Planned  NaT               NaT               NaT         NaT    11
14  301/101/0008                    CIOPS-18584  Active   2019-04-15        2019-04-15        2019-04-14  NaT    4
15  MO001-02                        None         Pending  NaT               NaT               NaT         NaT    NaN
16  OR020-01                        CIOPS-20124  Pending  NaT               NaT               NaT         NaT    NaN
17  MA004-01                        CIOPS-20075  Pending  NaT               NaT               2019-12-19  NaT    NaN
18  MA004-02                        CIOPS-21134  Pending  NaT               NaT               2019-12-19  NaT    NaN
19  OR004-01                        CIOPS-20121  Pending  NaT               NaT               NaT         NaT    NaN
20  15001/10G/DNVTCO56/LTTNCOMMR17  CIOPS-11996  Active   2018-08-01        2018-08-01        NaT         NaT    0

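The expected result is that the df.loc statements evaluate the conditions and write the month value to the Month field. The NaT values do not seem to be caught by my conditional statements. It also seems there should be a more efficient way to apply this logic to every row of the DataFrame than using multiple df.loc statements.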
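One possible way to collapse a chain of df.loc assignments into a single step is `numpy.select`, which takes a list of boolean masks and a matching list of values and uses the first mask that matches for each row. The sketch below is only an illustration under several assumptions: the sample rows are made up, `year_start` stands in for YearStart, the priority order (FOC#2, then FOC#1, then OrderSubmitted plus 4) is my reading of the rules, and the VendorCompletion and Planned rules are left out.

```python
import numpy as np
import pandas as pd

year_start = pd.Timestamp("2019-01-01")  # assumed value of YearStart

# Hypothetical rows: column names follow the question, values are made up.
mdf = pd.DataFrame({
    "Status": ["Active", "Active", "Pending", "Pending", "Pending", "Pending"],
    "BillingStartDate": pd.to_datetime(
        ["2018-08-01", "2019-04-15", None, None, None, None]),
    "FOC#1": pd.to_datetime(
        [None, "2019-04-14", "2019-12-19", "2019-09-01", None, None]),
    "FOC#2": pd.to_datetime(
        [None, None, None, "2019-10-01", None, None]),
    "OrderSubmitted": pd.to_datetime(
        ["2018-06-19", "2019-03-14", "2019-05-22", "2019-05-22", "2019-05-24", None]),
})

active = mdf["Status"].eq("Active")
pending = mdf["Status"].eq("Pending")

# Masks are checked in order; the first True one decides the row's value.
conditions = [
    active & (mdf["BillingStartDate"] < year_start),
    active & (mdf["BillingStartDate"] >= year_start),
    pending & mdf["FOC#2"].notna(),           # notna() instead of != None
    pending & mdf["FOC#1"].notna(),
    pending & mdf["OrderSubmitted"].notna(),
]
choices = [
    0,                                        # numeric 0 rather than the string '0'
    mdf["BillingStartDate"].dt.month,
    mdf["FOC#2"].dt.month,
    mdf["FOC#1"].dt.month,
    mdf["OrderSubmitted"].dt.month + 4,
]

# Rows that match no mask keep NaN, which makes unmatched cases easy to spot.
mdf["Month"] = np.select(conditions, choices, default=np.nan)
print(mdf[["Status", "BillingStartDate", "FOC#1", "FOC#2", "Month"]])
```

Because the first matching mask wins, the order of `conditions` plays the role that statement order plays in the df.loc version, except that there the last matching statement wins.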

0 answers:

No answers