Question

I have a simple pandas DataFrame with four columns:

NRAPPOR; DSCARAT; IQTACAP; IQTAINT

2; 2006-12-31; 0.00; 60.00
2; 2007-01-31; 270.75; 150.05
2; 2007-02-28; 272.78; 148.02
2; 2007-03-31; 274.82; 145.98
2; 2007-04-30; 276.88; 143.92
... ... ... ...
5731; 2016-11-17; 1760.00; 240.00
5731; 2018-11-17; 1800.00; 200.00
5731; 2019-11-17; 1850.00; 150.00
5731; 2020-11-17; 1900.00; 100.00
5731; 2021-11-17; 1950.00; 50.00

where:
- NRAPPOR = loan ID
- DSCARAT = installment due date
- IQTACAP = principal portion of the installment
- IQTAINT = interest portion of the installment

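For each NRAPPOR, I want to sum the IQTACAP and IQTAINT values into four different accumulators, depending on whether DSCARAT is less than or equal to, or greater than, a threshold date ('2020-03-17'):
- if DSCARAT <= threshold date ('2020-03-17'), I want to sum IQTACAP in totCapOverdue
- if DSCARAT > threshold date ('2020-03-17'), I want to sum IQTACAP in totCapToExpire
- if DSCARAT <= threshold date ('2020-03-17'), I want to sum IQTAINT in totIntOverdue
- if DSCARAT > threshold date ('2020-03-17'), I want to sum IQTAINT in totIntToExpire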

I want to obtain a new DataFrame with 5 columns: NRAPPOR and the four accumulators.

This is my brute-force code:

# set threshold date
dataSoglia = '2020-03-17'

totCapOverdue = 0
totIntOverdue = 0
totCapToExpire = 0
totIntToExpire = 0
rapportoPrev = 0

# index[0] / index[1] below assume df1 has a MultiIndex (NRAPPOR, DSCARAT)
for index, row in df1.iterrows():

    # if NRAPPOR changes, I print the totalizers
    # I would prefer to obtain a new DataFrame with NRAPPOR and the four totalizers as new columns
    if (index[0] != rapportoPrev) and (rapportoPrev != 0):
        print(rapportoPrev, '\t', 'capOverdue: ', totCapOverdue, '\t', 'intOverdue: ', totIntOverdue, '\t', 'capToExpire: ', totCapToExpire, '\t', 'intpToExpire: ', totIntToExpire)

        # reset the totalizers to zero for the next loan
        totCapOverdue = 0
        totIntOverdue = 0
        totCapToExpire = 0
        totIntToExpire = 0

    # ISO-formatted date strings compare in chronological order
    if index[1].strftime("%Y-%m-%d") <= dataSoglia:
        totCapOverdue += row['IQTACAP']
        totIntOverdue += row['IQTAINT']
    else:
        totCapToExpire += row['IQTACAP']
        totIntToExpire += row['IQTAINT']
    rapportoPrev = index[0]
    dataPrev = index[1]

This is my output:
2 capOverdue: 19999.999999999993 intOverdue: 4887.200000000001 capToExpire: 0 intpToExpire: 0
3 capOverdue: 123156.18000000002 intOverdue: 70519.02 capToExpire: 26843.820000000003 intpToExpire: 1528.9799999999996
4 capOverdue: 30000.0 intOverdue: 4965.180000000001 capToExpire: 0 intpToExpire: 0
5 capOverdue: 6000.000000000002 intOverdue: 167.1 capToExpire: 0 intpToExpire: 0
6 capOverdue: 18000.0 intOverdue: 2111.89 capToExpire: 0 intpToExpire: 0
7 capOverdue: 50000.00000000003 intOverdue: 8104.3 capToExpire: 0 intpToExpire: 0
8 capOverdue: 50000.00000000003 intOverdue: 15711.999999999996 capToExpire: 0 intpToExpire: 0
9 capOverdue: 70000.0 intOverdue: 18213.110000000004 capToExpire: 0 intpToExpire: 0
...

Is there a better way?
Thanks

Answer 1

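First, I create a pandas DataFrame with a subset of the data (this is something that could have been done in the question, to make the code reproducible):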

import pandas as pd

data = [[2, '2006-12-31', 0.0, 60.0],
        [2, '2007-01-31', 270.75, 150.05],
        [2, '2007-02-28', 272.78, 148.02],
        [2, '2007-03-31', 274.82, 145.98],
        [2, '2007-04-30', 276.88, 143.92],
        [5731, '2016-11-17', 1760.0, 240.0],
        [5731, '2018-11-17', 1800.0, 200.0],
        [5731, '2019-11-17', 1850.0, 150.0],
        [5731, '2020-11-17', 1900.0, 100.0],
        [5731, '2021-11-17', 1950.0, 50.0]]
df = pd.DataFrame(data, columns=['NRAPPOR', 'DSCARAT', 'IQTACAP', 'IQTAINT'])
df = df.set_index('NRAPPOR')

# --- convert string to datetime
df['DSCARAT'] = df['DSCARAT'].apply(lambda ts: pd.Timestamp(ts))

Now I use pandas groupby to split the table by ID, and apply to apply a function to each group: query for the relevant dates and sum:
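A minimal sketch of the groupby/apply step described above, offered as one possible way to write it rather than as the answerer's exact code. The helper name summarize and the variable threshold are illustrative assumptions; the accumulator names follow the question, and the sample data is rebuilt so the block runs on its own.

```python
import pandas as pd

# Same sample data as above, rebuilt here so the sketch is self-contained.
data = [[2, '2006-12-31', 0.0, 60.0],
        [2, '2007-01-31', 270.75, 150.05],
        [2, '2007-02-28', 272.78, 148.02],
        [2, '2007-03-31', 274.82, 145.98],
        [2, '2007-04-30', 276.88, 143.92],
        [5731, '2016-11-17', 1760.0, 240.0],
        [5731, '2018-11-17', 1800.0, 200.0],
        [5731, '2019-11-17', 1850.0, 150.0],
        [5731, '2020-11-17', 1900.0, 100.0],
        [5731, '2021-11-17', 1950.0, 50.0]]
df = pd.DataFrame(data, columns=['NRAPPOR', 'DSCARAT', 'IQTACAP', 'IQTAINT'])
df = df.set_index('NRAPPOR')
df['DSCARAT'] = pd.to_datetime(df['DSCARAT'])

# Threshold date from the question ("threshold" is an illustrative name).
threshold = pd.Timestamp('2020-03-17')


def summarize(group):
    """Hypothetical helper: sum principal and interest on each side of the threshold."""
    overdue = group[group['DSCARAT'] <= threshold]
    to_expire = group[group['DSCARAT'] > threshold]
    return pd.Series({
        'totCapOverdue': overdue['IQTACAP'].sum(),
        'totIntOverdue': overdue['IQTAINT'].sum(),
        'totCapToExpire': to_expire['IQTACAP'].sum(),
        'totIntToExpire': to_expire['IQTAINT'].sum(),
    })


# One row per loan ID; reset_index turns NRAPPOR back into a regular column.
result = df.groupby(level='NRAPPOR').apply(summarize).reset_index()
print(result)
```

The result has the five columns asked for in the question: NRAPPOR plus the four accumulators. A loan with no installments on one side of the threshold gets 0 in the corresponding columns, since the sum of an empty selection is 0.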

Sum pandas DataFrame values into new columns based on a condition on a datetime column
