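I have some logic that needs to be applied to every row of a DataFrame. The logic looks at 7 fields in the DataFrame and writes the result to a new column in the same DataFrame.
I tried using df.loc with multiple conditions to apply the logic. It works in some cases, but not in all of them. The dataset looks like this. Input data: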
    CID           Status   JiraTicket   MrcNew    ...  VendorCompletion  FOC#1       FOC#2  OrderSubmitted
12  NC1001-05     Planned  None         NaN       ...  NaT               NaT         NaT    NaT
13  NC1001-06     Planned  None         NaN       ...  NaT               NaT         NaT    NaT
14  301/101/0008  Active   CIOPS-18584  5200.00   ...  2019-04-15        2019-04-14  NaT    2019-03-14
15  MO001-02      Pending  None         NaN       ...  NaT               NaT         NaT    NaT
16  OR020-01      Pending  CIOPS-20124  8000.00   ...  NaT               NaT         NaT    2019-05-24
17  MA004-01      Pending  CIOPS-20075  6500.00   ...  NaT               2019-12-19  NaT    2019-05-22
18  MA004-02      Pending  CIOPS-21134  6500.00   ...  NaT               2019-12-19  NaT    2019-05-22
19  OR004-01      Pending  CIOPS-20121  10500.00  ...  NaT               NaT         NaT    2019-05-24
20  15001/10G     Active   CIOPS-11996  3975.00   ...  2018-08-01        NaT         NaT    2018-06-19
Month is a calculated field that I add based on the df.loc logic below. Some of the data contains None, NaT and NaN values.
mdf.loc[(mdf.Status.isin(['Active'])) & (mdf['BillingStartDate'] < YearStart), 'Month'] = '0'
mdf.loc[(mdf.Status.isin(['Active'])) & (mdf['BillingStartDate'] >= YearStart), 'Month'] = mdf['BillingStartDate'].dt.month
mdf.loc[(mdf.Status.isin(['Active'])) & (mdf['BillingStartDate'] != None) & (mdf['VendorCompletion'] < YearStart), 'Month'] = '0'
mdf.loc[(mdf.Status.isin(['Active'])) & (mdf['BillingStartDate'] != None) & (mdf['VendorCompletion'] > YearStart), 'Month'] = mdf['VendorCompletion'].dt.month
mdf.loc[(mdf.Status.isin(['Pending'])) & (mdf['VendorCompletion'] < YearStart), 'Month'] = '0'
mdf.loc[(mdf.Status.isin(['Pending'])) & (mdf['VendorCompletion'] > YearStart), 'Month'] = mdf['VendorCompletion'].dt.month
mdf.loc[(mdf.Status.isin(['Pending'])) & (mdf['FOC#1'] != None) & (mdf['FOC#2'] == None), 'Month'] = mdf['FOC#1'].dt.month
mdf.loc[(mdf.Status.isin(['Pending'])) & (mdf['FOC#2'] != None), 'Month'] = mdf['FOC#2'].dt.month
mdf.loc[(mdf.Status.isin(['Pending'])) & (mdf['FOC#1'] == None) & (mdf['FOC#2'] == None), 'Month'] = ((mdf['OrderSubmitted'].dt.month) + 4)
mdf.loc[(mdf.Status.isin(['Planned'])) & (mdf['MonthBudget']>= Today.month), 'Month'] = mdf['MonthBudget'] + 1
mdf.loc[(mdf.Status.isin(['Planned'])) & (mdf['MonthBudget']<= Today.month), 'Month'] = (Today.month + 4)
print(mdf[['CID', 'JiraTicket', 'Status', 'BillingStartDate', 'VendorCompletion', 'FOC#1', 'FOC#2', 'Month']])
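One thing that may be worth checking in the statements above is the `!= None` / `== None` comparisons on the date columns. The sketch below uses a made-up two-row Series (`foc` is a hypothetical name, not a column from the real data) to show how pandas evaluates those comparisons next to `notna()` / `isna()`. On a datetime column, `!= None` comes out True for every row and `== None` comes out False for every row, whether or not the value is NaT, so a mask built that way does not separate missing dates from real ones.

```python
import pandas as pd

# Hypothetical datetime column with one real date and one missing value (NaT).
foc = pd.Series(pd.to_datetime(["2019-12-19", None]))

ne_none = foc != None    # elementwise comparison against None: True for every row
eq_none = foc == None    # elementwise comparison against None: False for every row
has_date = foc.notna()   # True only where a real date is present
is_missing = foc.isna()  # True only where the value is NaT

print(pd.DataFrame({
    "foc": foc,
    "!= None": ne_none,
    "== None": eq_none,
    "notna()": has_date,
    "isna()": is_missing,
}))
```

If that matches what happens on the real data, swapping `!= None` for `.notna()` and `== None` for `.isna()` in the masks would be the first thing to try.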
Sample output:
    CID                             JiraTicket   Status   BillingStartDate  VendorCompletion  FOC#1       FOC#2  Month
12  NC1001-05                       None         Planned  NaT               NaT               NaT         NaT    11
13  NC1001-06                       None         Planned  NaT               NaT               NaT         NaT    11
14  301/101/0008                    CIOPS-18584  Active   2019-04-15        2019-04-15        2019-04-14  NaT    4
15  MO001-02                        None         Pending  NaT               NaT               NaT         NaT    NaN
16  OR020-01                        CIOPS-20124  Pending  NaT               NaT               NaT         NaT    NaN
17  MA004-01                        CIOPS-20075  Pending  NaT               NaT               2019-12-19  NaT    NaN
18  MA004-02                        CIOPS-21134  Pending  NaT               NaT               2019-12-19  NaT    NaN
19  OR004-01                        CIOPS-20121  Pending  NaT               NaT               NaT         NaT    NaN
20  15001/10G/DNVTCO56/LTTNCOMMR17  CIOPS-11996  Active   2018-08-01        2018-08-01        NaT         NaT    0
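The expected result is that the df.loc statements evaluate the conditions and write the month value to the Month field. The NaT values don't seem to be caught by my conditional statements. It also seems like there should be a more efficient way to apply this logic to every row of the DataFrame than a series of df.loc statements.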
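As a rough sketch of one alternative to chaining df.loc statements, the rules could be written as a list of boolean masks and a matching list of values and handed to `numpy.select`, which fills each row from the first mask that matches. Everything below is illustrative: the rows are made up to resemble the data above, `year_start` stands in for `YearStart` with an assumed value, only a subset of the rules is covered, and the priority order (FOC#2, then FOC#1, then OrderSubmitted + 4) is a guess at the intent rather than something stated in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical rows modeled on the sample data; not the real dataset.
mdf = pd.DataFrame({
    "Status": ["Active", "Active", "Pending", "Pending", "Pending"],
    "VendorCompletion": pd.to_datetime(["2019-04-15", "2018-08-01", None, None, None]),
    "FOC#1": pd.to_datetime([None, None, "2019-12-19", "2019-09-05", None]),
    "FOC#2": pd.to_datetime([None, None, None, "2019-10-02", None]),
    "OrderSubmitted": pd.to_datetime(
        ["2019-03-14", "2018-06-19", "2019-05-22", "2019-05-24", "2019-05-24"]
    ),
})
year_start = pd.Timestamp("2019-01-01")  # assumed value for YearStart

active = mdf["Status"].eq("Active")
pending = mdf["Status"].eq("Pending")

# Masks are checked in order; the first one that is True decides the row.
# Comparisons against NaT are False, and notna() is used for the missing-date tests.
conditions = [
    active & (mdf["VendorCompletion"] < year_start),
    active & (mdf["VendorCompletion"] >= year_start),
    pending & mdf["FOC#2"].notna(),
    pending & mdf["FOC#1"].notna(),
    pending & mdf["OrderSubmitted"].notna(),
]
choices = [
    0,
    mdf["VendorCompletion"].dt.month,
    mdf["FOC#2"].dt.month,
    mdf["FOC#1"].dt.month,
    mdf["OrderSubmitted"].dt.month + 4,
]

# Rows that match no rule are left as NaN.
mdf["Month"] = np.select(conditions, choices, default=np.nan)
print(mdf[["Status", "VendorCompletion", "FOC#1", "FOC#2", "Month"]])
```

Note that with chained df.loc assignments a later statement overwrites an earlier one, while `np.select` takes the first match, so the order of the masks would need to reflect whichever rule is supposed to win.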