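I have a pandas DataFrame with product columns [a,b,c,d,e,f,j,h,i,j,k,l] for millions of customers. For each product, the data records whether the customer used the product in a given month (1) or did not use it (0).
Original classification: 1 means used, 0 means not used.
I want to reclassify product usage into four categories:
S: used (started using)
M: maintained use (still used in the following months)
N: not used
D: maintained non-use (not used for consecutive months)
The original data looks like this: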
+-------------+-------+---+---+---+---+---+---+---+---+---+---+---+---+
| Customer_ID | Month | a | b | c | d | e | f | j | h | i | j | k | l |
+-------------+-------+---+---+---+---+---+---+---+---+---+---+---+---+
| 19509 | Jan | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 19509 | Feb | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| 19509 | Mar | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 19509 | Apr | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 19509 | May | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 19509 | Jun | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 19509 | Jul | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 19509 | Aug | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 19509 | Sep | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 |
| 19510 | Jan | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 19510 | Feb | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| 19510 | Mar | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 19510 | Apr | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 19510 | May | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 19510 | Jun | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 19510 | Jul | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 19510 | Aug | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 19510 | Sep | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 |
| 19511 | Jan | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 19511 | Feb | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| 19511 | Mar | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 19511 | Apr | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 19511 | May | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 19511 | Jun | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 19511 | Jul | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 19511 | Aug | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 19511 | Sep | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 |
+-------------+-------+---+---+---+---+---+---+---+---+---+---+---+---+
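I want to reclassify customers into the four categories so as to account for those who keep using, or keep not using, a product for several months.
The result should look like this: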
+-------------+-------+---+---+---+---+---+---+---+---+---+---+---+---+
| Customer_ID | Month | a | b | c | d | e | f | j | h | i | j | k | l |
+-------------+-------+---+---+---+---+---+---+---+---+---+---+---+---+
| 19509 | Jan | S | N | S | N | N | S | N | S | N | S | S | N |
| 19509 | Feb | M | N | N | D | D | M | D | M | D | N | M | D |
| 19509 | Mar | M | S | S | D | D | M | D | M | D | S | M | D |
| 19509 | Apr | N | M | N | S | D | M | D | M | D | N | N | D |
| 19509 | May | D | N | D | M | S | M | D | M | D | D | D | D |
| 19509 | Jun | D | D | D | M | N | M | D | M | D | D | D | D |
| 19509 | Jul | S | S | S | N | D | M | D | M | D | S | S | D |
| 19509 | Aug | N | M | N | D | D | M | D | N | D | N | N | D |
| 19509 | Sep | S | M | S | S | D | M | D | D | S | S | S | D |
| 19510 | Jan | S | N | S | N | N | S | N | S | N | S | S | N |
| 19510 | Feb | M | N | N | D | D | M | D | M | D | N | M | D |
| 19510 | Mar | M | S | S | D | D | M | D | M | D | S | M | D |
| 19510 | Apr | N | M | N | S | D | M | D | M | D | N | N | D |
| 19510 | May | D | N | D | M | S | M | D | M | D | D | D | D |
| 19510 | Jun | D | D | D | M | N | M | D | M | D | D | D | D |
| 19510 | Jul | S | S | S | N | D | M | D | M | D | S | S | D |
| 19510 | Aug | N | M | N | D | D | M | D | N | D | N | N | D |
| 19510 | Sep | S | M | S | S | D | M | D | D | S | S | S | D |
| 19511 | Jan | S | N | S | N | N | S | N | S | N | S | S | N |
| 19511 | Feb | M | N | N | D | D | M | D | M | D | N | M | D |
| 19511 | Mar | M | S | S | D | D | M | D | M | D | S | M | D |
| 19511 | Apr | N | M | N | S | D | M | D | M | D | N | N | D |
| 19511 | May | D | N | D | M | S | M | D | M | D | D | D | D |
| 19511 | Jun | D | D | D | M | N | M | D | M | D | D | D | D |
| 19511 | Jul | S | S | S | N | D | M | D | M | D | S | S | D |
| 19511 | Aug | N | M | N | D | D | M | D | N | D | N | N | D |
| 19511 | Sep | S | M | S | S | D | M | D | D | S | S | S | D |
+-------------+-------+---+---+---+---+---+---+---+---+---+---+---+---+
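The algorithm for this seems complicated, and I am still working out the right order of steps.
I want to do this for all customers and all products (columns). I think we could start like this: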
    for i in df["Customer_ID"].unique():
        for j in df.columns.drop(["Customer_ID", "Month"]):
            ...  # reclassify product j for customer i
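Note: this is not really a use/non-use situation but join (1), cancel (0), stay idle (0), join again (1), and so on. A zero means the customer cancelled the service, and zeros over the following three months mean he is not a customer. He may then join and cancel again, and we should know how many times he cancelled. If we only compute totals, they will not tell us how many times the customer joined or how many times he cancelled a particular product or service.
I would appreciate any suggestions or ideas for solving this problem.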
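Answer 0 (score: 0)
For simplicity, I'll explain how to do this for one customer and one product; you can then do the same for every customer and column:
Find the earliest entry (if you are doing this in November, you would first look at the values for December, January, February, and so on until you find a value) and assign its new value. Going by the first row of the expected output in the question, 1 becomes S and 0 becomes N.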
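For each of the following (up to 11) entries, assign a value based on the previous state and the current entry, that is, a transition function f(old, val).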
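The transition function can be written out as a small lookup table. This is a minimal sketch, not the answerer's own table: the names `INITIAL`, `TRANSITIONS` and `classify` are hypothetical, and the rules are reconstructed from the category definitions in the question. A few cells in the question's expected table do not follow them (for example, column a is 1 in May but is shown as D), so treat the rules as an assumption.

```python
# Hypothetical state for the earliest entry: 1 -> S (used), 0 -> N (not used).
INITIAL = {1: "S", 0: "N"}

# Hypothetical f(old, val): (previous state, current 0/1 value) -> new state.
TRANSITIONS = {
    ("S", 1): "M", ("M", 1): "M",  # was using, still using -> maintained use
    ("N", 1): "S", ("D", 1): "S",  # was not using, now using -> (re)started
    ("S", 0): "N", ("M", 0): "N",  # was using, now not using -> stopped
    ("N", 0): "D", ("D", 0): "D",  # was not using, still not -> maintained non-use
}


def classify(values):
    """Label one customer's monthly 0/1 values for one product.

    `values` must be in chronological order.
    """
    states = []
    for val in values:
        if not states:
            states.append(INITIAL[val])
        else:
            states.append(TRANSITIONS[(states[-1], val)])
    return states


# Column d of customer 19509 in the question, Jan to Sep.
print(classify([0, 0, 0, 1, 1, 1, 0, 0, 1]))
# ['N', 'D', 'D', 'S', 'M', 'M', 'N', 'D', 'S']
```

Keeping the rules in a dictionary means a more complex set of states only requires editing the table, not the loop.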
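In this case it can be simplified (N/D and S/M lead to the same result, so you only need to look at the previous value rather than the previous state), but with more complex state transitions that may not be possible, so I wrote it out in full to show the idea.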
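Because only the previous value matters, the simplified form can be vectorized over all customers and product columns at once instead of looping. The following is a rough sketch under several assumptions: the function name `reclassify` is hypothetical, rows are already in chronological order within each customer, and product column names are unique.

```python
import numpy as np
import pandas as pd


def reclassify(df, id_col="Customer_ID", month_col="Month"):
    """Return a copy of df with each 0/1 product column relabelled S/M/N/D.

    Assumes rows are in chronological order within each customer.
    """
    product_cols = [c for c in df.columns if c not in (id_col, month_col)]
    values = df[product_cols]
    # Previous month's value for the same customer; NaN on a customer's first row.
    prev = values.groupby(df[id_col]).shift()
    labels = np.where(
        values.eq(1),
        np.where(prev.eq(1), "M", "S"),  # using: continued vs. (re)started
        np.where(prev.eq(0), "D", "N"),  # not using: continued vs. first month without use
    )
    out = df.copy()
    out[product_cols] = pd.DataFrame(labels, index=df.index, columns=product_cols)
    return out


# Small made-up example with two customers and two products.
df = pd.DataFrame({
    "Customer_ID": [19509, 19509, 19509, 19509, 19510, 19510],
    "Month": ["Jan", "Feb", "Mar", "Apr", "Jan", "Feb"],
    "a": [1, 1, 0, 0, 0, 1],
    "d": [0, 0, 1, 1, 1, 0],
})
result = reclassify(df)
print(result)

# Number of times each customer started (or restarted) each product.
joins = result[["a", "d"]].eq("S").groupby(result["Customer_ID"]).sum()
print(joins)
```

Counting `"N"` the same way gives the number of stops, with the caveat that a customer whose first recorded month is 0 also gets an `N` there, which is not a cancellation.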
Answer 1 (score: 0)
Hint:
You can work out the rest.
Kadane's algorithm (maximum subarray): if you score use as +1 and non-use as -1, it will tell you the stretch of months in which use most outweighs non-use.
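As an illustration of the hint, here is a minimal sketch of Kadane's algorithm applied to one customer's 0/1 series for one product. The function name `best_usage_span` and its return shape are assumptions made for this example. It finds the contiguous span with the largest surplus of used months over unused months, which is not necessarily the longest run of 1s.

```python
def best_usage_span(values):
    """Kadane's algorithm on +1 (used) / -1 (not used) scores.

    Returns (best_sum, start, end) with `end` exclusive.
    Returns (0, 0, 0) if the product was never used.
    """
    best_sum, best_start, best_end = 0, 0, 0
    cur_sum, cur_start = 0, 0
    for i, v in enumerate(values):
        score = 1 if v == 1 else -1
        if cur_sum <= 0:
            # A non-positive running sum cannot help; restart the span here.
            cur_sum, cur_start = score, i
        else:
            cur_sum += score
        if cur_sum > best_sum:
            best_sum, best_start, best_end = cur_sum, cur_start, i + 1
    return best_sum, best_start, best_end


# Column a of customer 19509 in the question, Jan to Sep.
print(best_usage_span([1, 1, 1, 0, 1, 1, 1, 0, 1]))
# (5, 0, 7): Jan through Jul, six used months against one unused
```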