I have a pandas DataFrame created with groupby, and the result looks like this:
       loan_type
type
risky      23150
safe       99457
I want to create a column named pct and add it to the DataFrame. This is what I did:
total = loans.sum(numeric_only=True)
loans['pct'] = loans.apply(lambda x:x/ total)
The result is this:
       loan_type  pct
type
risky      23150  NaN
safe       99457  NaN
At this point I'm not sure what I need to do to get this working. Here is the code I used to build the whole thing:
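import numpy as np
import pandas as pd

# club is the source DataFrame with a 0/1 bad_loans column
bad_loans = np.array(club['bad_loans'])
# Recode: 0 (not a bad loan) becomes 1, anything else becomes -1
for index, row in enumerate(bad_loans):
    if row == 0:
        bad_loans[index] = 1
    else:
        bad_loans[index] = -1
loans = pd.DataFrame({'loan_type' : bad_loans})
loans['type'] = np.where(loans['loan_type'] == 1, 'safe', 'risky')
# Sum per type, then take absolute values so both counts are positive
loans = np.absolute(loans.groupby(['type']).agg({'loan_type': 'sum'}))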
total = loans.sum(numeric_only=True)
loans['pct'] = loans.apply(lambda x:x/ total)
Answer 0 (score: 1)
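The problem is that you are dividing not by a scalar value but by a one-element Series. Because its index does not align with the index of the column, you get NaNs.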
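The alignment behaviour can be reproduced in isolation. The following is a minimal sketch that rebuilds the grouped frame by hand from the counts in the question; the names col, misaligned, and aligned are illustrative and do not appear in the original post.

```python
import pandas as pd

# Rebuild the grouped result from the question by hand.
loans = pd.DataFrame(
    {"loan_type": [23150, 99457]},
    index=pd.Index(["risky", "safe"], name="type"),
)

# Summing the frame returns a Series indexed by column name, not a scalar.
total = loans.sum(numeric_only=True)
print(total.index.tolist())   # ['loan_type']

col = loans["loan_type"]      # indexed by 'risky' / 'safe'

# Series / Series aligns on index labels first; no label is shared,
# so every entry of the result is NaN.
misaligned = col / total
print(misaligned)

# Dividing by a plain number skips alignment entirely.
aligned = col / total["loan_type"]
print(aligned)
```

The misaligned result carries the union of both indexes (loan_type, risky, safe), which is why no row receives a real value.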
I think the simplest fix is to convert the Series total to a numpy array, which drops the index:
total = loans.sum(numeric_only=True)
loans['pct'] = loans.loan_type / total.values
print(loans)
       loan_type       pct
type
risky      23150  0.188815
safe       99457  0.811185
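Or select the single element by position so that total is a plain number. The original used [0]; .iloc[0] does the same thing explicitly, since plain [0] on a label-indexed Series relies on positional fallback that recent pandas versions deprecate:
total = loans.sum(numeric_only=True).iloc[0]
loans['pct'] = loans.loan_type / total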
print(loans)
       loan_type       pct
type
risky      23150  0.188815
safe       99457  0.811185
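As a further variant that is not part of the original answer, the column can be divided by its own sum. Series.sum() already returns a scalar, so no conversion step is needed. A self-contained sketch, again rebuilding the grouped frame by hand from the counts in the question:

```python
import pandas as pd

# Hand-built stand-in for the grouped frame from the question.
loans = pd.DataFrame(
    {"loan_type": [23150, 99457]},
    index=pd.Index(["risky", "safe"], name="type"),
)

# Series.sum() returns a single number, so the division is applied
# element-wise to the column without any index alignment.
loans["pct"] = loans["loan_type"] / loans["loan_type"].sum()
print(loans)
```

This only covers a single numeric column; with several columns to normalise, the total.values approach above may be the more direct fit.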