Sum pandas DataFrame values into new columns based on a condition on a datetime column

Time: 2020-04-02 21:02:33

Tags: python pandas

I have a simple pandas DataFrame with four columns:

NRAPPOR; DSCARAT; IQTACAP; IQTAINT

2; 2006-12-31; 0.00; 60.00
2; 2007-01-31; 270.75; 150.05
2; 2007-02-28; 272.78; 148.02
2; 2007-03-31; 274.82; 145.98
2; 2007-04-30; 276.88; 143.92
... ... ... ...
5731; 2016-11-17; 1760.00; 240.00
5731; 2018-11-17; 1800.00; 200.00
5731; 2019-11-17; 1850.00; 150.00
5731; 2020-11-17; 1900.00; 100.00
5731; 2021-11-17; 1950.00; 50.00

where:
- NRAPPOR = loan ID
- DSCARAT = installment due date
- IQTACAP = principal portion of the installment
- IQTAINT = interest portion of the installment

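For each NRAPPOR, I want to sum the IQTACAP and IQTAINT values into four different accumulators, depending on whether DSCARAT falls on or before a threshold date ('2020-03-17') or after it:
- if DSCARAT <= threshold date ('2020-03-17'), I want to sum IQTACAP into totCapOverdue
- if DSCARAT > threshold date ('2020-03-17'), I want to sum IQTACAP into totCapToExpire
- if DSCARAT <= threshold date ('2020-03-17'), I want to sum IQTAINT into totIntOverdue
- if DSCARAT > threshold date ('2020-03-17'), I want to sum IQTAINT into totIntToExpire

I want to obtain a new DataFrame with 5 columns: NRAPPOR and the four accumulators.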

Here is my brute-force code:

# set threshold date
dataSoglia = '2020-03-17'

totCapOverdue = 0
totIntOverdue = 0
totCapToExpire = 0
totIntToExpire = 0
rapportoPrev = 0

# the index is (NRAPPOR, DSCARAT): index[0] is the loan ID, index[1] the due date
for index, row in df1.iterrows():

    # if NRAPPOR changes, I print the totalizers
    # I would prefer to obtain a new DataFrame with NRAPPOR and the four totalizers as new columns
    if index[0] != rapportoPrev and rapportoPrev != 0:
        print(rapportoPrev, '\t', 'capOverdue: ', totCapOverdue, '\t', 'intOverdue: ', totIntOverdue, '\t', 'capToExpire: ', totCapToExpire, '\t', 'intpToExpire: ', totIntToExpire)

        # set totalizers to zero
        totCapOverdue = 0
        totIntOverdue = 0
        totCapToExpire = 0
        totIntToExpire = 0

    # ISO-formatted date strings compare in chronological order
    if index[1].strftime("%Y-%m-%d") <= dataSoglia:
        totCapOverdue += row['IQTACAP']
        totIntOverdue += row['IQTAINT']
    else:
        totCapToExpire += row['IQTACAP']
        totIntToExpire += row['IQTAINT']
    rapportoPrev = index[0]
    dataPrev = index[1]

This is my output:
2 capOverdue: 19999.999999999993 intOverdue: 4887.200000000001 capToExpire: 0 intpToExpire: 0
3 capOverdue: 123156.18000000002 intOverdue: 70519.02 capToExpire: 26843.820000000003 intpToExpire: 1528.9799999999996
4 capOverdue: 30000.0 intOverdue: 4965.180000000001 capToExpire: 0 intpToExpire: 0
5 capOverdue: 6000.000000000002 intOverdue: 167.1 capToExpire: 0 intpToExpire: 0
6 capOverdue: 18000.0 intOverdue: 2111.89 capToExpire: 0 intpToExpire: 0
7 capOverdue: 50000.00000000003 intOverdue: 8104.3 capToExpire: 0 intpToExpire: 0
8 capOverdue: 50000.00000000003 intOverdue: 15711.999999999996 capToExpire: 0 intpToExpire: 0
9 capOverdue: 70000.0 intOverdue: 18213.110000000004 capToExpire: 0 intpToExpire: 0
...

Is there a better way to do this?
Thanks

1 answer:

Answer 0: (score: 0)

First, I create a pandas DataFrame with a subset of the data (this is something that could be done in the question, to make the code reproducible):

import pandas as pd

data = [[2, '2006-12-31', 0.0, 60.0],
        [2, '2007-01-31', 270.75, 150.05],
        [2, '2007-02-28', 272.78, 148.02],
        [2, '2007-03-31', 274.82, 145.98],
        [2, '2007-04-30', 276.88, 143.92],
        [5731, '2016-11-17', 1760.0, 240.0],
        [5731, '2018-11-17', 1800.0, 200.0],
        [5731, '2019-11-17', 1850.0, 150.0],
        [5731, '2020-11-17', 1900.0, 100.0],
        [5731, '2021-11-17', 1950.0, 50.0]]
df = pd.DataFrame(data, columns=['NRAPPOR', 'DSCARAT', 'IQTACAP', 'IQTAINT'])
df = df.set_index('NRAPPOR')
#--- convert string to datetime
df['DSCARAT'] = df['DSCARAT'].apply(lambda ts: pd.Timestamp(ts))

Now I use pandas groupby to split the table by ID, and apply to run a function on each group: query the relevant dates and sum.
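A minimal sketch of the groupby/apply step described above, under stated assumptions: it is not necessarily the code the answer originally used. The helper name `summarize_loan` and the variable `threshold` are hypothetical names introduced for illustration; the column names, the accumulator names and the threshold date come from the question, and the sample rows are the same subset used above.

```python
import pandas as pd

# Sample rows from the question (a small subset of the real table).
data = [[2, '2006-12-31', 0.0, 60.0],
        [2, '2007-01-31', 270.75, 150.05],
        [2, '2007-02-28', 272.78, 148.02],
        [2, '2007-03-31', 274.82, 145.98],
        [2, '2007-04-30', 276.88, 143.92],
        [5731, '2016-11-17', 1760.0, 240.0],
        [5731, '2018-11-17', 1800.0, 200.0],
        [5731, '2019-11-17', 1850.0, 150.0],
        [5731, '2020-11-17', 1900.0, 100.0],
        [5731, '2021-11-17', 1950.0, 50.0]]
df = pd.DataFrame(data, columns=['NRAPPOR', 'DSCARAT', 'IQTACAP', 'IQTAINT'])
df = df.set_index('NRAPPOR')
df['DSCARAT'] = pd.to_datetime(df['DSCARAT'])

# Threshold date from the question (the variable name is an assumption).
threshold = pd.Timestamp('2020-03-17')


def summarize_loan(group):
    """Hypothetical helper: sum principal and interest on each side of the threshold."""
    overdue = group[group['DSCARAT'] <= threshold]
    to_expire = group[group['DSCARAT'] > threshold]
    return pd.Series({
        'totCapOverdue': overdue['IQTACAP'].sum(),
        'totIntOverdue': overdue['IQTAINT'].sum(),
        'totCapToExpire': to_expire['IQTACAP'].sum(),
        'totIntToExpire': to_expire['IQTAINT'].sum(),
    })


# One row per loan; reset_index turns the NRAPPOR index back into a column,
# giving the five columns asked for in the question.
result = df.groupby(level='NRAPPOR').apply(summarize_loan).reset_index()
print(result)
```

On these sample rows, loan 2 has every installment on or before the threshold, so both of its "ToExpire" totals are 0, while loan 5731 has 5410.0 of principal overdue and 3850.0 still to expire. Calling a Python function per group is easy to read but can be slow on a large table; grouping on the loan ID together with a boolean mask such as `df['DSCARAT'] <= threshold` would be a possible vectorised alternative.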