From procedural thinking to functional thinking in my code

Time: 2018-11-29 18:34:36

Tags: python

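I'm having trouble moving from simple line-by-line coding to function definitions or for loops that would make the code below more efficient and more readable.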

A sample of my data (extracted from SQL) looks like this:

+----+-----------+------------+---------+-----------+----------+
| id | member_id |  max_date  | Recency | Frequency | Monetary |
+----+-----------+------------+---------+-----------+----------+
|  1 |        22 | 2016-09-03 |     818 |        10 |       50 |
|  2 |        34 | 2017-06-27 |     521 |        50 |      100 |
|  3 |       123 | 2018-10-26 |      35 |         5 |       80 |
+----+-----------+------------+---------+-----------+----------+

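I'm creating three new tables because I need the cumulative sum and the cumulative sum % for the Recency, Frequency and Monetary columns, and each column has to be sorted in a different order first: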

rfm_recency = rfm[['Max_Date', 'Id', 'Member_id', 'Recency']].copy()
rfm_recency = rfm_recency.sort_values(['Recency'], ascending=True)
rfm_recency['cum_sum'] = rfm_recency['Recency'].cumsum()
rfm_recency['cum_sum_perc'] = rfm_recency['cum_sum']/rfm_recency['Recency'].sum()

rfm_frequency = rfm[['Id', 'Frequency']].copy()
rfm_frequency = rfm_frequency.sort_values(['Frequency'], ascending=False)
rfm_frequency['cum_sum'] = rfm_frequency['Frequency'].cumsum()
rfm_frequency['cum_sum_perc'] = rfm_frequency['cum_sum']/rfm_frequency['Frequency'].sum()

rfm_monetary = rfm[['Id', 'Monetary']].copy()
rfm_monetary = rfm_monetary.sort_values(['Monetary'], ascending=False)
rfm_monetary['cum_sum'] = rfm_monetary['Monetary'].cumsum()
rfm_monetary['cum_sum_perc'] = rfm_monetary['cum_sum']/rfm_monetary['Monetary'].sum()
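
For illustration, here is a rough sketch of how the three near-identical blocks above might be folded into a single helper. The name `cumulative_share` and its `keep` parameter are hypothetical, and the sample frame assumes the capitalised column names used in the code rather than the lowercase ones in the SQL output.

```python
import pandas as pd

# Sample rows from the table above. The capitalised column names follow the
# code in this question, not the SQL output (an assumption).
rfm = pd.DataFrame({
    "Id": [1, 2, 3],
    "Member_id": [22, 34, 123],
    "Max_Date": pd.to_datetime(["2016-09-03", "2017-06-27", "2018-10-26"]),
    "Recency": [818, 521, 35],
    "Frequency": [10, 50, 5],
    "Monetary": [50, 100, 80],
})


def cumulative_share(df, column, ascending, keep=("Id",)):
    """Hypothetical helper: sort by `column`, then add cum_sum and cum_sum_perc."""
    out = df[list(keep) + [column]].sort_values(column, ascending=ascending)
    out["cum_sum"] = out[column].cumsum()
    out["cum_sum_perc"] = out["cum_sum"] / out[column].sum()
    return out


rfm_recency = cumulative_share(
    rfm, "Recency", ascending=True, keep=("Max_Date", "Id", "Member_id")
)
rfm_frequency = cumulative_share(rfm, "Frequency", ascending=False)
rfm_monetary = cumulative_share(rfm, "Monetary", ascending=False)
```

Only the column name, the sort direction and the extra columns to keep differ between the three tables, so those are the only arguments.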

Then, based on the cum_sum_perc column, I apply a function to each table:

def score(x):
    if x <= 0.20:
        return 5
    elif x <= 0.40:
        return 4
    elif x <= 0.60:
        return 3
    elif x <= 0.80:
        return 2
    else:
        return 1

rfm_recency['r_quintile'] = rfm_recency['cum_sum_perc'].apply(score)
rfm_frequency['f_quintile'] = rfm_frequency['cum_sum_perc'].apply(score)
rfm_monetary['m_quintile'] = rfm_monetary['cum_sum_perc'].apply(score)
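
A vectorised alternative to calling `score` row by row might look like the sketch below, which uses `pd.cut` with the same thresholds. The name `score_series` is hypothetical, and the final `astype(int)` assumes the column contains no missing values.

```python
import numpy as np
import pandas as pd


def score_series(perc):
    """Vectorised counterpart of score(): same thresholds, whole column at once."""
    bins = [-np.inf, 0.20, 0.40, 0.60, 0.80, np.inf]
    # right=True makes each bin include its upper edge, matching the `<=` tests.
    return pd.cut(perc, bins=bins, labels=[5, 4, 3, 2, 1], right=True).astype(int)


example = pd.Series([0.05, 0.20, 0.35, 0.60, 0.75, 1.00])
scores = score_series(example)
```

With this, a line such as `rfm_recency['r_quintile'] = score_series(rfm_recency['cum_sum_perc'])` would replace the `.apply(score)` call.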

Then I re-sort them on Id and merge them together:

rfm_recency = rfm_recency.sort_values('Id')
rfm_frequency = rfm_frequency.sort_values('Id')
rfm_monetary = rfm_monetary.sort_values('Id')

result = rfm_recency.copy()
result = result.join(rfm_frequency[['Frequency', 'f_quintile']])
result = result.join(rfm_monetary[['Monetary', 'm_quintile']])
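
As a sketch of where this could end up, the whole pipeline might be written as one loop over the three metrics. The `METRICS` list is a hypothetical name, the sample frame again assumes the capitalised column names, and unlike the code above this version does not keep the intermediate cum_sum and cum_sum_perc columns in `result`.

```python
import pandas as pd

# Sample rows from the table in the question (capitalised names are assumed).
rfm = pd.DataFrame({
    "Id": [1, 2, 3],
    "Member_id": [22, 34, 123],
    "Max_Date": pd.to_datetime(["2016-09-03", "2017-06-27", "2018-10-26"]),
    "Recency": [818, 521, 35],
    "Frequency": [10, 50, 5],
    "Monetary": [50, 100, 80],
})


def score(x):
    """Same thresholds as the score function in the question."""
    if x <= 0.20:
        return 5
    elif x <= 0.40:
        return 4
    elif x <= 0.60:
        return 3
    elif x <= 0.80:
        return 2
    return 1


# (column, quintile column, ascending sort) for each metric, as in the question.
METRICS = [
    ("Recency", "r_quintile", True),
    ("Frequency", "f_quintile", False),
    ("Monetary", "m_quintile", False),
]

result = rfm[["Max_Date", "Id", "Member_id"]].copy()
for column, quintile, ascending in METRICS:
    ordered = rfm[column].sort_values(ascending=ascending)
    cum_perc = ordered.cumsum() / ordered.sum()
    # Column assignment aligns on the index, so no re-sorting or join is needed.
    result[column] = rfm[column]
    result[quintile] = cum_perc.apply(score)
```

Because pandas aligns on the index when a Series is assigned to a column, the sorted scores land back on the right rows without sorting by Id first.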

Being new to Python, I'm going with what works so far, but I know this can be trimmed down dramatically.

0 answers:

No answers