Count occurrences of a character in a column of a pandas DataFrame

Time: 2019-02-28 14:00:38

Tags: python pandas dataframe

I have a dataframe with the following structure:

Debtor_ID    | Loan_ID    | Pattern_of_payments
Uncle Sam      Loan1        11111AAA11555
Uncle Sam      Loan2        11222A339999
Uncle Joe      Loan3        1111111111111
Uncle Joe      Loan4        111222222233333
Aunt Annie     Loan5        1
Aunt Chloe     Loan6        555555555
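For reproducibility, the sample above can be built roughly like this. It is a minimal sketch: the values are copied from the table, the frame name df_account matches the code in this question, and the patterns are assumed to be stored as strings.

```python
import pandas as pd

# Sample rows copied from the table above.
# The patterns are plain strings, so a value like "1" is not treated as a number.
df_account = pd.DataFrame(
    {
        "Debtor_ID": ["Uncle Sam", "Uncle Sam", "Uncle Joe",
                      "Uncle Joe", "Aunt Annie", "Aunt Chloe"],
        "Loan_ID": ["Loan1", "Loan2", "Loan3", "Loan4", "Loan5", "Loan6"],
        "Pattern_of_payments": [
            "11111AAA11555",
            "11222A339999",
            "1111111111111",
            "111222222233333",
            "1",
            "555555555",
        ],
    }
)

print(df_account)
```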

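Each character in the "Pattern_of_payments" column denotes either an on-time payment (e.g. 1) or a delay (all the rest). What I want to do is count the occurrences of each character in every row of the "Pattern_of_payments" column and assign that number to the corresponding column in the dataframe, as shown below: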

Debtor_ID    | Loan_ID    | On_time_payment    | 1_29_days_delay    | 30_59_days_delay    | 60_89_days_delay    | 90_119_days_delay    | Over_120_days_delay    | Bailiff_prosecution
Uncle Sam      Loan1        7                    3                    0                     0                     0                      3                        0
Uncle Sam      Loan2        2                    1                    3                     2                     0                      0                        4
Uncle Joe      Loan3        13                   0                    0                     0                     0                      0                        0
Uncle Joe      Loan4        3                    0                    7                     5                     0                      0                        0
Aunt Annie     Loan5        1                    0                    0                     0                     0                      0                        0
Aunt Chloe     Loan6        0                    0                    0                     0                     0                      9                        0
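To make the mapping concrete, one row of the expected output can be derived by hand as follows. This is only a sketch: the names labels and row are illustrative, and the character-to-column mapping is the one implied by the column assignments in the code in this question.

```python
from collections import Counter

# Character -> column name (assumed from the column assignments in the question's code).
labels = {
    "1": "On_time_payment",
    "A": "1_29_days_delay",
    "2": "30_59_days_delay",
    "3": "60_89_days_delay",
    "4": "90_119_days_delay",
    "5": "Over_120_days_delay",
    "9": "Bailiff_prosecution",
}

pattern = "11111AAA11555"  # the Loan1 pattern
counts = Counter(pattern)  # a Counter returns 0 for characters that never occur

# One output row: every column is present, even when its character is absent.
row = {name: counts[ch] for ch, name in labels.items()}
print(row)
```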

My code gets the job done this way:

list_of_counts_of_1 = []
list_of_counts_of_A = []
list_of_counts_of_2 = []
list_of_counts_of_3 = []
list_of_counts_of_4 = []
list_of_counts_of_5 = []
list_of_counts_of_8 = []
list_of_counts_of_9 = []
for value in df_account.Pattern_of_payments.values:
    iter_string = str(value)
    count1 = iter_string.count("1")
    countA = iter_string.count("A")
    count2 = iter_string.count("2")
    count3 = iter_string.count("3")
    count4 = iter_string.count("4")
    count5 = iter_string.count("5")
    count8 = iter_string.count("8")
    count9 = iter_string.count("9")
    list_of_counts_of_1.append(count1)
    list_of_counts_of_A.append(countA)
    list_of_counts_of_2.append(count2)
    list_of_counts_of_3.append(count3)
    list_of_counts_of_4.append(count4)
    list_of_counts_of_5.append(count5)
    list_of_counts_of_9.append(count9)
df_account["On_time_payment"] = list_of_counts_of_1
df_account["1_29_days_delay"] = list_of_counts_of_A
df_account["30_59_days_delay"] = list_of_counts_of_2
df_account["60_89_days_delay"] = list_of_counts_of_3
df_account["90_119_days_delay"] = list_of_counts_of_4
df_account["Over_120_days_delay"] = list_of_counts_of_5
df_account["Bailiff_prosecution"] = list_of_counts_of_9

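I realize my code is not "pythonic" at all. There must be a way to express this more concisely (maybe even some fancy one-liner). Please advise what the best coding practice would look like.

1 Answer:

Answer 0: (score: 2)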

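The first step is to create a DataFrame from Counter objects in a list comprehension, then use reindex to add the missing categories and change the order of the columns, rename the columns with a dict, and add the result to the original DataFrame with join: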

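import pandas as pd
from collections import Counter

# One Counter per row; str() guards against non-string values such as the integer 1
df1 = pd.DataFrame([Counter(str(x)) for x in df['Pattern_of_payments']], index=df.index)

# Desired column order; characters that never occur are added by reindex
order = list('1A23459')

# Map each character to its descriptive column name
d = {'1': "On_time_payment",
     'A': "1_29_days_delay",
     '2': "30_59_days_delay",
     '3': "60_89_days_delay",
     '4': "90_119_days_delay",
     '5': "Over_120_days_delay",
     '9': "Bailiff_prosecution"}

# Replace NaN with 0, cast to int, add missing categories in order, rename, then join
df2 = df1.fillna(0).astype(int).reindex(columns=order, fill_value=0).rename(columns=d)
df = df.join(df2)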
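As an alternative sketch (not part of the answer above), the same columns can be filled with the vectorized Series.str.count and a single loop over the mapping. The small three-row frame here is only illustrative. Since str.count interprets its argument as a regular expression, each character is passed through re.escape as a precaution.

```python
import re

import pandas as pd

# Illustrative subset of the sample data from the question.
df = pd.DataFrame(
    {
        "Loan_ID": ["Loan1", "Loan5", "Loan6"],
        "Pattern_of_payments": ["11111AAA11555", "1", "555555555"],
    }
)

# Same character -> column mapping as in the answer.
d = {
    "1": "On_time_payment",
    "A": "1_29_days_delay",
    "2": "30_59_days_delay",
    "3": "60_89_days_delay",
    "4": "90_119_days_delay",
    "5": "Over_120_days_delay",
    "9": "Bailiff_prosecution",
}

# Cast once so that numeric-looking values are handled as text.
patterns = df["Pattern_of_payments"].astype(str)

# One vectorized count per character; dict order determines column order.
for ch, name in d.items():
    df[name] = patterns.str.count(re.escape(ch))

print(df)
```

This avoids the intermediate Counter frame and the reindex step, at the cost of scanning the column once per character, which should be negligible for a handful of categories.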