I have a simple Pandas DataFrame with four columns:
NRAPPOR; DSCARAT; IQTACAP; IQTAINT
2; 2006-12-31; 0.00; 60.00
2; 2007-01-31; 270.75; 150.05
2; 2007-02-28; 272.78; 148.02
2; 2007-03-31; 274.82; 145.98
2; 2007-04-30; 276.88; 143.92
... ... ... ...
5731; 2016-11-17; 1760.00; 240.00
5731; 2018-11-17; 1800.00; 200.00
5731; 2019-11-17; 1850.00; 150.00
5731; 2020-11-17; 1900.00; 100.00
5731; 2021-11-17; 1950.00; 50.00
where:
- NRAPPOR = loan ID
- DSCARAT = due date of the installment
- IQTACAP = principal portion of the installment
- IQTAINT = interest portion of the installment
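For each NRAPPOR, I want to sum the IQTACAP and IQTAINT values into four different accumulators, depending on whether DSCARAT is on or before a threshold date ('2020-03-17') or after it:
if DSCARAT <= threshold date ('2020-03-17'), I want to sum IQTACAP into totCapOverdue
if DSCARAT > threshold date ('2020-03-17'), I want to sum IQTACAP into totCapToExpire
if DSCARAT <= threshold date ('2020-03-17'), I want to sum IQTAINT into totIntOverdue
if DSCARAT > threshold date ('2020-03-17'), I want to sum IQTAINT into totIntToExpire
I would like to obtain a new DataFrame with 5 columns: NRAPPOR and the four accumulators.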
This is my brute-force code:
# set the threshold date
dataSoglia = '2020-03-17'
totCapOverdue = 0
totIntOverdue = 0
totCapToExpire = 0
totIntToExpire = 0
rapportoPrev = 0
for index, row in df1.iterrows():
    # if NRAPPOR changes, I print the totalizers
    # I would prefer to obtain a new DataFrame with NRAPPOR and the four totalizers as new columns
    if (index[0] != rapportoPrev) and (rapportoPrev != 0):
        print(rapportoPrev, '\t', 'capOverdue: ', totCapOverdue, '\t', 'intOverdue: ', totIntOverdue, '\t', 'capToExpire: ', totCapToExpire, '\t', 'intpToExpire: ', totIntToExpire)
        # reset the totalizers to zero
        totCapOverdue = 0
        totIntOverdue = 0
        totCapToExpire = 0
        totIntToExpire = 0
    if index[1].strftime("%Y-%m-%d") <= dataSoglia:
        totCapOverdue += row['IQTACAP']
        totIntOverdue += row['IQTAINT']
    else:
        totCapToExpire += row['IQTACAP']
        totIntToExpire += row['IQTAINT']
    rapportoPrev = index[0]
    dataPrev = index[1]
This is my output:
2 capOverdue: 19999.999999999993 intOverdue: 4887.200000000001 capToExpire: 0 intpToExpire: 0
3 capOverdue: 123156.18000000002 intOverdue: 70519.02 capToExpire: 26843.820000000003 intpToExpire: 1528.9799999999996
4 capOverdue: 30000.0 intOverdue: 4965.180000000001 capToExpire: 0 intpToExpire: 0
5 capOverdue: 6000.000000000002 intOverdue: 167.1 capToExpire: 0 intpToExpire: 0
6 capOverdue: 18000.0 intOverdue: 2111.89 capToExpire: 0 intpToExpire: 0
7 capOverdue: 50000.00000000003 intOverdue: 8104.3 capToExpire: 0 intpToExpire: 0
8 capOverdue: 50000.00000000003 intOverdue: 15711.999999999996 capToExpire: 0 intpToExpire: 0
9 capOverdue: 70000.0 intOverdue: 18213.110000000004 capToExpire: 0 intpToExpire: 0
...
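Is there a better way?
Thanks
Answer 0 (score: 0)
First, I create a pandas DataFrame with a subset of the data (this is something that could be done in the question itself, to make the code reproducible):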
import pandas as pd
data = [[2, '2006-12-31', 0.0, 60.0],
        [2, '2007-01-31', 270.75, 150.05],
        [2, '2007-02-28', 272.78, 148.02],
        [2, '2007-03-31', 274.82, 145.98],
        [2, '2007-04-30', 276.88, 143.92],
        [5731, '2016-11-17', 1760.0, 240.0],
        [5731, '2018-11-17', 1800.0, 200.0],
        [5731, '2019-11-17', 1850.0, 150.0],
        [5731, '2020-11-17', 1900.0, 100.0],
        [5731, '2021-11-17', 1950.0, 50.0]]
df = pd.DataFrame(data, columns = ['NRAPPOR', 'DSCARAT', 'IQTACAP','IQTAINT'])
df = df.set_index('NRAPPOR')
# convert the date strings to datetime
df['DSCARAT'] = pd.to_datetime(df['DSCARAT'])
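Now I use pandas groupby to split the table by loan ID, and apply to run a function on each group, which selects the rows on each side of the threshold date and sums them.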
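A minimal sketch of the groupby/apply step, assuming the threshold date from the question. The helper name summarize is hypothetical, NRAPPOR is kept as an ordinary column here rather than as the index, and the sample rows are the ones from the question:

```python
import pandas as pd

# Sample rows copied from the question.
data = [[2, '2006-12-31', 0.0, 60.0],
        [2, '2007-01-31', 270.75, 150.05],
        [2, '2007-02-28', 272.78, 148.02],
        [2, '2007-03-31', 274.82, 145.98],
        [2, '2007-04-30', 276.88, 143.92],
        [5731, '2016-11-17', 1760.0, 240.0],
        [5731, '2018-11-17', 1800.0, 200.0],
        [5731, '2019-11-17', 1850.0, 150.0],
        [5731, '2020-11-17', 1900.0, 100.0],
        [5731, '2021-11-17', 1950.0, 50.0]]
df = pd.DataFrame(data, columns=['NRAPPOR', 'DSCARAT', 'IQTACAP', 'IQTAINT'])
df['DSCARAT'] = pd.to_datetime(df['DSCARAT'])

# Threshold date taken from the question.
threshold = pd.Timestamp('2020-03-17')


def summarize(group):
    """Hypothetical helper: return the four totals for one loan."""
    overdue = group['DSCARAT'] <= threshold  # boolean mask for this loan's rows
    return pd.Series({
        'totCapOverdue': group.loc[overdue, 'IQTACAP'].sum(),
        'totIntOverdue': group.loc[overdue, 'IQTAINT'].sum(),
        'totCapToExpire': group.loc[~overdue, 'IQTACAP'].sum(),
        'totIntToExpire': group.loc[~overdue, 'IQTAINT'].sum(),
    })


# One row per loan: NRAPPOR plus the four accumulators.
result = (
    df.groupby('NRAPPOR')[['DSCARAT', 'IQTACAP', 'IQTAINT']]
      .apply(summarize)
      .reset_index()
)
print(result)
```

Selecting the three data columns before apply keeps the grouping key out of the frame passed to summarize. An empty selection sums to zero, so loans with no installments after the threshold get 0 in the two ToExpire columns, which matches the printed output in the question.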