I have a DataFrame with the columns PersonID, AmountPaid, PaymentReceivedDate and StartDate.
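What I'm looking for: if a payment was received within n years of the start date, the withinNYears column should show the amount paid; otherwise it should show NaN. N can be any number, but for this example let's say 2 years (that is the value I will use to look at the results).
So basically, with a 2-year window, the DataFrame would look like this: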
PersonID  AmountPaid  PaymentReceivedDate  StartDate  withinNYears
1         100         2017                 2016       100
2         20          2014                 2014       20
1         30          2017                 2016       30
1         40          2016                 2016       40
4         300         2015                 2000       NaN
5         150         2005                 2002       NaN
Does anyone know how to achieve this? Cheers.
Answer 0 (score: 3)
Subtract the columns and compare the result with a scalar to get a boolean mask, then set the values with numpy.where, Series.where or DataFrame.loc:
import numpy as np

# True where the payment came less than 2 years after the start date
m = (df['PaymentReceivedDate'] - df['StartDate']) < 2
df['withinNYears'] = np.where(m, df['AmountPaid'], np.nan)
# alternatives
#df['withinNYears'] = df['AmountPaid'].where(m)
#df.loc[m, 'withinNYears'] = df['AmountPaid']
print(df)
   PersonID  AmountPaid  PaymentReceivedDate  StartDate  \
0         1         100                 2017       2016
1         2          20                 2014       2014
2         1          30                 2017       2016
3         1          40                 2016       2016
4         4         300                 2015       2000
5         5         150                 2005       2002

   withinNYears
0         100.0
1          20.0
2          30.0
3          40.0
4           NaN
5           NaN
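For a reproducible version, the sketch below rebuilds the sample data from the question and wraps the mask in a small helper so that n is a parameter. The helper name `within_n_years` and its signature are illustrative assumptions, not part of the original answer; the strict `< n` comparison simply follows the code above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "PersonID": [1, 2, 1, 1, 4, 5],
    "AmountPaid": [100, 20, 30, 40, 300, 150],
    "PaymentReceivedDate": [2017, 2014, 2017, 2016, 2015, 2005],
    "StartDate": [2016, 2014, 2016, 2016, 2000, 2002],
})


def within_n_years(frame, n):
    """Return AmountPaid where payment came less than n years after StartDate, else NaN.

    Hypothetical helper: the name and signature are not from the original answer.
    """
    mask = (frame["PaymentReceivedDate"] - frame["StartDate"]) < n
    # Series.where keeps values where the mask is True and fills NaN elsewhere.
    return frame["AmountPaid"].where(mask)


df["withinNYears"] = within_n_years(df, 2)
print(df)
```

Whether the boundary should be `< n` or `<= n` depends on how "within n years" is meant; the sample data has no row with a difference of exactly 2, so both give the same result here.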
Edit:
If the StartDate column contains datetimes:
m = (df['PaymentReceivedDate'] - df['StartDate'].dt.year) < 2
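A minimal sketch of the datetime case, using made-up dates. The first part mirrors the line above. The second part is a swapped-in variant, not what the answer does: it assumes the payment date is also a full datetime (the `PaymentDate` column is hypothetical) and compares timestamps after shifting StartDate with `pd.DateOffset`, which takes months and days into account.

```python
import pandas as pd

# Illustrative data: StartDate as datetimes, PaymentReceivedDate as integer years.
df = pd.DataFrame({
    "AmountPaid": [100, 300],
    "PaymentReceivedDate": [2017, 2015],
    "StartDate": pd.to_datetime(["2016-03-01", "2000-07-15"]),
})

# .dt.year extracts the calendar year, so the subtraction is integer arithmetic again.
m = (df["PaymentReceivedDate"] - df["StartDate"].dt.year) < 2
df["withinNYears"] = df["AmountPaid"].where(m)

# Variant (assumption): the payment date is a full datetime in a hypothetical column.
# Shift StartDate forward by n years and compare the timestamps directly.
df["PaymentDate"] = pd.to_datetime(["2017-01-10", "2015-05-05"])
n = 2
m_exact = df["PaymentDate"] < df["StartDate"] + pd.DateOffset(years=n)
df["withinNYearsExact"] = df["AmountPaid"].where(m_exact)
print(df)
```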
Answer 1 (score: 3)
Just use loc:
df.loc[(df['PaymentReceivedDate'] - df['StartDate']) < 2, 'withinNYears'] = df['AmountPaid']
df
Out[37]:
   PersonID  AmountPaid  ...  StartDate  withinNYears
0         1         100  ...       2016         100.0
1         2          20  ...       2014          20.0
2         1          30  ...       2016          30.0
3         1          40  ...       2016          40.0
4         4         300  ...       2000           NaN
5         5         150  ...       2002           NaN

[6 rows x 5 columns]
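One point worth keeping in mind with the loc approach: the NaN values appear only because withinNYears does not exist before the assignment. The sketch below, with a two-row sample and a hypothetical `stale` column, illustrates the difference when the target column already holds values.

```python
import pandas as pd

df = pd.DataFrame({
    "AmountPaid": [100, 300],
    "PaymentReceivedDate": [2017, 2015],
    "StartDate": [2016, 2000],
})
mask = (df["PaymentReceivedDate"] - df["StartDate"]) < 2

# Column does not exist yet: pandas creates it and leaves unselected rows as NaN.
df.loc[mask, "withinNYears"] = df["AmountPaid"]

# Column already exists (pre-filled with 0 here): unselected rows keep the old value.
df["stale"] = 0
df.loc[mask, "stale"] = df["AmountPaid"]
print(df)
```

If the column may already exist, `df['AmountPaid'].where(mask)` overwrites every row and so avoids leftover values.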