我有此数据表,这些数据表具有通过信用评级建立的常见链接。我正在尝试使用相应的计算创建一个单独的表:
- [(amount of AAA in column A) / sum of column A] x PD% of credit rating 'AAA'
此计算在表中重复进行。如何创建一个函数来执行此计算,以便对第一个数据框使用pandas apply方法?
表1:(按列显示的金额)
Credit Rating A B C D
AAA 84,559,273 3,304,460 11,373,110 412,488
AA+ - -
AA 639,253 - 74,268
AA- 6,166,505
A+ 150,165,714 10,994,525 77,932,268 16,889,894
A 3,309,726 - 2,156,360 15,862,911
A- 22,128,939 5,886,348 45,237,747 364,115,185
BBB+ 2,192,714 - 4,892,915 45,679,052
BBB 39,952,215 - 4,767,023 60,059,238
BBB- 28,157,622 - 6,224,887 25,326,451
BB+ 4,399,331 - 697,172 -
BB 6,748,039 1,646,525
BB- 26,074,209 233,146 23,628,360 228,099
B+ 645,543 1,623,945 -
B 218,630 - 3,059,798 -
B- 804,872 - - -
C+ - - -
C
C- - -
CC+ - -
CC - - 7,057 -
CC- - - - -
CCC+ 64,923 -
CCC 83,589
CCC- - - - -
D - - -
表2:(信用评分和PD%)
Credit Rating PD%
AAA 0.01%
AA+ 0.02%
AA 0.03%
AA- 0.04%
A+ 0.05%
A 0.06%
A- 0.07%
BBB+ 0.08%
BBB 0.09%
BBB- 0.10%
BB+ 0.11%
BB 0.12%
BB- 0.13%
B+ 0.14%
B 0.15%
B- 0.16%
C+ 0.17%
C 0.18%
C- 0.19%
CC+ 0.20%
CC 0.21%
CC- 0.22%
CCC+ 0.23%
CCC 0.24%
CCC- 0.25%
答案 0 :(得分:0)
您的数据集包含NaN,您可以通过dropna()
table2 = table2.dropna()
下一步是捕获值。您的数字之间有一个逗号,我们将其替换为空白。然后将其转换为float。
pd = float(table1.loc[table1['Credit Rating']=='AAA']['PD%'].str[:-1])
amt = float(table2.loc[table2['Credit Rating']=='AAA']['A'].str.replace(',', ''))
def summation(x):
if x!='-':
print(x)
return float(x.replace(',', ''))
return 0
sum_col = sum(table2['A'].apply(summation))
result = amt/sum_col*pd
结果:
0.0023361287138422026
====================================
代码:
table1.set_index('Credit Rating', inplace=True)
table2.set_index('Credit Rating', inplace=True)
table3 = table1.loc[table1.index & table2.index]
table3 = table3['PD%'].str[:-1].astype(float)
def result(row):
row = row.replace({'-': 0}, regex=True)
row = row.str.replace(',', '').astype(float)
sum_col = sum(row.fillna(0))
return (row/sum_col)*table3
table2.apply(result)
A B C D
Credit Rating
AAA 0.002336 0.001618 0.000632 0.000008
A+ 0.020743 0.026923 0.021651 0.001598
A 0.000549 NaN 0.000719 0.001801
A- 0.004280 0.020180 0.017595 0.048220
BBB+ 0.000485 NaN 0.002175 0.006914
BBB 0.009934 NaN 0.002384 0.010226
BBB- 0.007779 NaN 0.003459 0.004791
BB+ 0.001337 NaN 0.000426 NaN
BB- 0.009365 0.001484 0.017067 0.000056
B 0.000091 NaN 0.002550 NaN
B- 0.000356 NaN NaN NaN
CC NaN NaN 0.000008 NaN
CC- NaN NaN NaN NaN
CCC- NaN NaN NaN NaN
答案 1 :(得分:0)
下面的替代代码:
Note: Renamed `Credit Rating` to `Credit_Rating`
# Import libraries
import pandas as pd
import numpy as np
# Read data from *.txt file
path = "<Enter path to txt file here>" # Copy-pasted data from question into .txt files.
dfa = pd.read_csv(path+'data.txt',sep=r"\s+")
dfa = dfa.replace('-',np.nan)
dfc = pd.read_csv(path+'data2.txt',sep=r"\s+")
dfc = dfc.replace('-',np.nan)
# Format data: Remove commas
for col in dfa.columns:
if(dfa[col].dtypes=='O'):
dfa[col] = dfa[col].str.replace(',', '')
# Format data: Remove %
dfc['PD%'] = dfc['PD%'].str.replace('%', '')
# Create grouped object
g = dfa.groupby(['Credit_Rating'])
# Grouped sum
sum_A = g['A'].sum().reset_index().rename(columns={'A':'sum_A'})
sum_B = g['B'].sum().reset_index().rename(columns={'B':'sum_B'})
sum_C = g['C'].sum().reset_index().rename(columns={'C':'sum_C'})
sum_D = g['D'].sum().reset_index().rename(columns={'D':'sum_D'})
# Merge
dfa = dfa.merge(sum_A, on='Credit_Rating', how='left')\
.merge(sum_B, on='Credit_Rating', how='left')\
.merge(sum_C, on='Credit_Rating', how='left')\
.merge(sum_D, on='Credit_Rating', how='left')\
.merge(dfc, on='Credit_Rating', how='left')
# Convert string to float
clist = ['A', 'B', 'C', 'D', 'sum_A', 'sum_B', 'sum_C', 'sum_D','PD%']
for col in clist:
dfa[col] = dfa[col].astype(float)
# Calculate
df = dfa.copy()
df['calc_A'] = (df['A']/df['sum_A'])*df['PD%']
df['calc_B'] = (df['B']/df['sum_B'])*df['PD%']
df['calc_C'] = (df['C']/df['sum_C'])*df['PD%']
df['calc_D'] = (df['D']/df['sum_D'])*df['PD%']
df = df[['Credit_Rating','calc_A','calc_B','calc_C','calc_D']]
print(df)
Credit_Rating calc_A calc_B calc_C calc_D
0 AAA 0.01 0.01 0.01 0.01
1 AA+ NaN NaN NaN NaN
2 AA 0.03 NaN 0.03 NaN
3 AA- 0.04 NaN NaN NaN
4 A+ 0.05 0.05 0.05 0.05
5 A 0.06 NaN 0.06 0.06
6 A- 0.07 0.07 0.07 0.07
7 BBB+ 0.08 NaN 0.08 0.08
8 BBB 0.09 NaN 0.09 0.09
9 BBB- 0.10 NaN 0.10 0.10
10 BB+ 0.11 NaN 0.11 NaN
11 BB 0.12 0.12 NaN NaN
12 BB- 0.13 0.13 0.13 0.13
13 B+ 0.14 0.14 NaN NaN
14 B 0.15 NaN 0.15 NaN
15 B- 0.16 NaN NaN NaN
16 C+ NaN NaN NaN NaN
17 C NaN NaN NaN NaN
18 C- NaN NaN NaN NaN
19 CC+ NaN NaN NaN NaN
20 CC NaN NaN 0.21 NaN
21 CC- NaN NaN NaN NaN
22 CCC+ 0.23 NaN NaN NaN
23 CCC 0.24 NaN NaN NaN
24 CCC- NaN NaN NaN NaN
25 D NaN NaN NaN NaN