熊猫:数据框的计算

时间:2020-07-07 08:33:46

标签: python pandas

我有此数据表,这些数据表具有通过信用评级建立的常见链接。我正在尝试使用相应的计算创建一个单独的表:

 - [(amount of AAA in column A) / sum of column A] x PD% of credit rating 'AAA' 

此计算在表中重复进行。如何创建一个函数来执行此计算,以便对第一个数据框使用pandas apply方法?

表1:(按列显示的金额)

Credit Rating   A   B   C   D
AAA 84,559,273  3,304,460   11,373,110  412,488
AA+     -   -   
AA  639,253 -   74,268  
AA-         6,166,505   
A+  150,165,714 10,994,525  77,932,268  16,889,894
A   3,309,726   -   2,156,360   15,862,911
A-  22,128,939  5,886,348   45,237,747  364,115,185
BBB+    2,192,714   -   4,892,915   45,679,052
BBB 39,952,215  -   4,767,023   60,059,238
BBB-    28,157,622  -   6,224,887   25,326,451
BB+ 4,399,331   -   697,172 -
BB  6,748,039       1,646,525   
BB- 26,074,209  233,146 23,628,360  228,099
B+  645,543     1,623,945   -
B   218,630 -   3,059,798   -
B-  804,872 -   -   -
C+  -   -   -   
C               
C-      -   -   
CC+         -   -
CC  -   -   7,057   -
CC- -   -   -   -
CCC+            64,923  -
CCC         83,589  
CCC-    -   -   -   -
D   -   -   -   

表2:(信用评分和PD%)

Credit Rating   PD%
AAA 0.01%
AA+ 0.02%
AA  0.03%
AA- 0.04%
A+  0.05%
A   0.06%
A-  0.07%
BBB+    0.08%
BBB 0.09%
BBB-    0.10%
BB+ 0.11%
BB  0.12%
BB- 0.13%
B+  0.14%
B   0.15%
B-  0.16%
C+  0.17%
C   0.18%
C-  0.19%
CC+ 0.20%
CC  0.21%
CC- 0.22%
CCC+    0.23%
CCC 0.24%
CCC-    0.25%

2 个答案:

答案 0 :(得分:0)

您的数据集包含NaN,您可以通过dropna()

将其删除
table2 = table2.dropna()

下一步是捕获值。您的数字之间有一个逗号,我们将其替换为空白。然后将其转换为float。

pd = float(table1.loc[table1['Credit Rating']=='AAA']['PD%'].str[:-1])
amt = float(table2.loc[table2['Credit Rating']=='AAA']['A'].str.replace(',', ''))

def summation(x):
    if x!='-':
        print(x)
        return float(x.replace(',', ''))
    return 0

    
sum_col = sum(table2['A'].apply(summation))

result = amt/sum_col*pd

结果:

0.0023361287138422026

====================================

代码:

table1.set_index('Credit Rating', inplace=True)
table2.set_index('Credit Rating', inplace=True)
table3 = table1.loc[table1.index & table2.index]
table3 = table3['PD%'].str[:-1].astype(float)

def result(row):
    row = row.replace({'-': 0}, regex=True)
    row = row.str.replace(',', '').astype(float)
    sum_col = sum(row.fillna(0))
    return (row/sum_col)*table3

table2.apply(result)

              A         B           C           D
Credit Rating               
AAA           0.002336  0.001618    0.000632    0.000008
A+            0.020743  0.026923    0.021651    0.001598
A             0.000549  NaN         0.000719    0.001801
A-            0.004280  0.020180    0.017595    0.048220
BBB+          0.000485  NaN         0.002175    0.006914
BBB           0.009934  NaN         0.002384    0.010226
BBB-          0.007779  NaN         0.003459    0.004791
BB+           0.001337  NaN         0.000426    NaN
BB-           0.009365  0.001484    0.017067    0.000056
B             0.000091  NaN         0.002550    NaN
B-            0.000356  NaN         NaN         NaN
CC            NaN       NaN         0.000008    NaN
CC-           NaN       NaN         NaN         NaN
CCC-          NaN       NaN         NaN         NaN

答案 1 :(得分:0)

下面的替代代码:

Note: Renamed `Credit Rating` to `Credit_Rating`

# Import libraries
import pandas as pd
import numpy as np

# Read data from *.txt file 
path = "<Enter path to txt file here>" # Copy-pasted data from question into .txt files.
dfa = pd.read_csv(path+'data.txt',sep=r"\s+")
dfa = dfa.replace('-',np.nan)

dfc = pd.read_csv(path+'data2.txt',sep=r"\s+")
dfc = dfc.replace('-',np.nan)

# Format data: Remove commas
for col in dfa.columns:
    if(dfa[col].dtypes=='O'):
        dfa[col] = dfa[col].str.replace(',', '')

# Format data: Remove %
dfc['PD%'] = dfc['PD%'].str.replace('%', '')


# Create grouped object
g = dfa.groupby(['Credit_Rating'])

# Grouped sum
sum_A = g['A'].sum().reset_index().rename(columns={'A':'sum_A'})
sum_B = g['B'].sum().reset_index().rename(columns={'B':'sum_B'})
sum_C = g['C'].sum().reset_index().rename(columns={'C':'sum_C'})
sum_D = g['D'].sum().reset_index().rename(columns={'D':'sum_D'})


# Merge
dfa = dfa.merge(sum_A, on='Credit_Rating', how='left')\
    .merge(sum_B, on='Credit_Rating', how='left')\
    .merge(sum_C, on='Credit_Rating', how='left')\
    .merge(sum_D, on='Credit_Rating', how='left')\
    .merge(dfc, on='Credit_Rating', how='left')

# Convert string to float
clist = ['A', 'B', 'C', 'D', 'sum_A', 'sum_B', 'sum_C', 'sum_D','PD%']
for col in clist:
    dfa[col] = dfa[col].astype(float)
    

# Calculate
df = dfa.copy()
df['calc_A'] = (df['A']/df['sum_A'])*df['PD%']
df['calc_B'] = (df['B']/df['sum_B'])*df['PD%']
df['calc_C'] = (df['C']/df['sum_C'])*df['PD%']
df['calc_D'] = (df['D']/df['sum_D'])*df['PD%']
df = df[['Credit_Rating','calc_A','calc_B','calc_C','calc_D']]

输出

print(df)

   Credit_Rating  calc_A  calc_B  calc_C  calc_D
0            AAA    0.01    0.01    0.01    0.01
1            AA+     NaN     NaN     NaN     NaN
2             AA    0.03     NaN    0.03     NaN
3            AA-    0.04     NaN     NaN     NaN
4             A+    0.05    0.05    0.05    0.05
5              A    0.06     NaN    0.06    0.06
6             A-    0.07    0.07    0.07    0.07
7           BBB+    0.08     NaN    0.08    0.08
8            BBB    0.09     NaN    0.09    0.09
9           BBB-    0.10     NaN    0.10    0.10
10           BB+    0.11     NaN    0.11     NaN
11            BB    0.12    0.12     NaN     NaN
12           BB-    0.13    0.13    0.13    0.13
13            B+    0.14    0.14     NaN     NaN
14             B    0.15     NaN    0.15     NaN
15            B-    0.16     NaN     NaN     NaN
16            C+     NaN     NaN     NaN     NaN
17             C     NaN     NaN     NaN     NaN
18            C-     NaN     NaN     NaN     NaN
19           CC+     NaN     NaN     NaN     NaN
20            CC     NaN     NaN    0.21     NaN
21           CC-     NaN     NaN     NaN     NaN
22          CCC+    0.23     NaN     NaN     NaN
23           CCC    0.24     NaN     NaN     NaN
24          CCC-     NaN     NaN     NaN     NaN
25             D     NaN     NaN     NaN     NaN