I have a data frame with the following structure:
Debtor_ID | Loan_ID | Pattern_of_payments
Uncle Sam | Loan1 | 11111AAA11555
Uncle Sam | Loan2 | 11222A339999
Uncle Joe | Loan3 | 1111111111111
Uncle Joe | Loan4 | 111222222233333
Aunt Annie | Loan5 | 1
Aunt Chloe | Loan6 | 555555555
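Each character in the "Pattern_of_payments" column denotes either an on-time payment (e.g. 1) or a delay (all the other characters). What I want to do is count the occurrences of each character in every row of the "Pattern_of_payments" column and assign that count to the corresponding column of the data frame, like this: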
Debtor_ID | Loan_ID | On_time_payment | 1_29_days_delay | 30_59_days_delay | 60_89_days_delay | 90_119_days_delay | Over_120_days_delay | Bailiff_prosecution
Uncle Sam | Loan1 | 7 | 3 | 0 | 0 | 0 | 3 | 0
Uncle Sam | Loan2 | 2 | 1 | 3 | 2 | 0 | 0 | 4
Uncle Joe | Loan3 | 13 | 0 | 0 | 0 | 0 | 0 | 0
Uncle Joe | Loan4 | 3 | 0 | 7 | 5 | 0 | 0 | 0
Aunt Annie | Loan5 | 1 | 0 | 0 | 0 | 0 | 0 | 0
Aunt Chloe | Loan6 | 0 | 0 | 0 | 0 | 0 | 9 | 0
My code gets the job done this way:
# One list of per-row counts for each payment code
list_of_counts_of_1 = []
list_of_counts_of_A = []
list_of_counts_of_2 = []
list_of_counts_of_3 = []
list_of_counts_of_4 = []
list_of_counts_of_5 = []
list_of_counts_of_8 = []  # never filled or used below: '8' has no output column
list_of_counts_of_9 = []

for value in df_account.Pattern_of_payments.values:
    iter_string = str(value)
    count1 = iter_string.count("1")
    countA = iter_string.count("A")
    count2 = iter_string.count("2")
    count3 = iter_string.count("3")
    count4 = iter_string.count("4")
    count5 = iter_string.count("5")
    count8 = iter_string.count("8")  # computed but not stored
    count9 = iter_string.count("9")
    list_of_counts_of_1.append(count1)
    list_of_counts_of_A.append(countA)
    list_of_counts_of_2.append(count2)
    list_of_counts_of_3.append(count3)
    list_of_counts_of_4.append(count4)
    list_of_counts_of_5.append(count5)
    list_of_counts_of_9.append(count9)

# Attach each list of counts as a new column
df_account["On_time_payment"] = list_of_counts_of_1
df_account["1_29_days_delay"] = list_of_counts_of_A
df_account["30_59_days_delay"] = list_of_counts_of_2
df_account["60_89_days_delay"] = list_of_counts_of_3
df_account["90_119_days_delay"] = list_of_counts_of_4
df_account["Over_120_days_delay"] = list_of_counts_of_5
df_account["Bailiff_prosecution"] = list_of_counts_of_9
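I realize that my code is not "pythonic" at all. There must be a more concise way to express this (maybe even some fancy one-liner). Could you advise what the best coding practice would look like here?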
Answer 0 (score: 2)
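The first step is to create a DataFrame from Counter objects built in a list comprehension. Then use reindex to add the missing categories and set the column order, rename the columns with a dict, and finally add the result to the original DataFrame with join: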
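import pandas as pd
from collections import Counter

# One Counter per row; str() guards against a pattern such as "1" stored as a number
df1 = pd.DataFrame([Counter(str(x)) for x in df['Pattern_of_payments']], index=df.index)

# Payment codes in the desired column order
order = list('1A23459')
d = {'1': "On_time_payment",
     'A': "1_29_days_delay",
     '2': "30_59_days_delay",
     '3': "60_89_days_delay",
     '4': "90_119_days_delay",
     '5': "Over_120_days_delay",
     '9': "Bailiff_prosecution"}

# Codes absent from a row become 0, codes absent everywhere are added as 0 columns
df2 = df1.fillna(0).astype(int).reindex(columns=order, fill_value=0).rename(columns=d)
df = df.join(df2)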
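As a possible alternative (not part of the approach above), the same counts can be obtained with pandas' `Series.str.count`, called once per payment code. The sketch below is self-contained and rebuilds the sample data from the question; the names `code_to_column`, `patterns`, `counts` and `result` are made up for illustration. Note that `str.count` interprets its argument as a regular expression, which is harmless here only because every code is a plain alphanumeric character.

```python
import pandas as pd

# Sample data copied from the question
df_account = pd.DataFrame({
    "Debtor_ID": ["Uncle Sam", "Uncle Sam", "Uncle Joe",
                  "Uncle Joe", "Aunt Annie", "Aunt Chloe"],
    "Loan_ID": ["Loan1", "Loan2", "Loan3", "Loan4", "Loan5", "Loan6"],
    "Pattern_of_payments": ["11111AAA11555", "11222A339999", "1111111111111",
                            "111222222233333", "1", "555555555"],
})

# Payment code -> output column name (illustrative name, same mapping as above)
code_to_column = {
    "1": "On_time_payment",
    "A": "1_29_days_delay",
    "2": "30_59_days_delay",
    "3": "60_89_days_delay",
    "4": "90_119_days_delay",
    "5": "Over_120_days_delay",
    "9": "Bailiff_prosecution",
}

# Cast to str first so a pattern such as "1" read as the integer 1 still works
patterns = df_account["Pattern_of_payments"].astype(str)

# One vectorised count per code; dict order fixes the column order
counts = pd.DataFrame(
    {column: patterns.str.count(code) for code, column in code_to_column.items()}
)

# Both frames share the same index, so join lines the rows up
result = df_account.join(counts)
print(result)
```

This scans the column once per code rather than once per row, so it trades a little extra work for not needing `fillna`, `astype` or `reindex`: every column in the mapping exists from the start and already holds integers.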