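I have a log of user actions, simplified in this example to "purchased" and "other" (Table 1).
I am trying to add a column "purchase_cycle" whose number identifies a group containing all of a user's actions from the previous purchase up to and including the current purchase (for a user's first purchase, from step 1 up to that purchase). If a group of actions does not end with "purchased", it does not count as a complete cycle and is assigned NaN.
TABLE1 (blank lines were added between users for readability):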
user_id actions_order action_category
0043e1a6 1 purchased
0043e1a6 2 other
0043e1a6 3 other
0070f782 1 other
0070f782 2 other
0070f782 3 other
0070f782 4 other
0070f782 5 other
0070f782 6 purchased
0070f782 7 other
0070f782 8 other
0070f782 9 other
0070f782 10 purchased
0070f782 11 other
0070f782 12 other
0070f782 13 other
008aa58a 1 other
008aa58a 2 other
008aa58a 3 other
008aa58a 4 other
008aa58a 5 purchased
008aa58a 6 other
008aa58a 7 other
008aa58a 8 other
008aa58a 9 other
008aa58a 10 other
008aa58a 11 other
008aa58a 12 purchased
008aa58a 13 other
008aa58a 14 other
008aa58a 15 other
TABLE2 (desired output, with the purchase cycle):
user_id actions_order action_category purchase_cycle
0043e1a6 1 purchased 1
0043e1a6 2 other nan
0043e1a6 3 other nan
0070f782 1 other 1
0070f782 2 other 1
0070f782 3 other 1
0070f782 4 other 1
0070f782 5 other 1
0070f782 6 purchased 1
0070f782 7 other 2
0070f782 8 other 2
0070f782 9 other 2
0070f782 10 purchased 2
0070f782 11 other nan
0070f782 12 other nan
0070f782 13 other nan
008aa58a 1 other 1
008aa58a 2 other 1
008aa58a 3 other 1
008aa58a 4 other 1
008aa58a 5 purchased 1
008aa58a 6 other 2
008aa58a 7 other 2
008aa58a 8 other 2
008aa58a 9 other 2
008aa58a 10 other 2
008aa58a 11 other 2
008aa58a 12 purchased 2
008aa58a 13 other nan
008aa58a 14 other nan
008aa58a 15 other nan
The only thing I could find was James Schinner's answer, but his solution assumes that every group has the same chunk size, which is not my case.
def chunk(seq, size):
    # Yield consecutive slices of seq, each holding at most `size` items
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

for df_chunk in chunk(df, 100):  # 100 is the (fixed) chunk size
    # your code here
    pass
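For reference, the labelling rule described above (variable-size groups that each end in a purchase, with trailing actions left as NaN) could be sketched in plain Python roughly as follows. This is only a minimal sketch, not a tested pandas solution: the function name `label_purchase_cycles` and the `purchase_label` parameter are made up for illustration, and it assumes the rows are already sorted by user_id and then by actions_order.

```python
import math


def label_purchase_cycles(user_ids, categories, purchase_label="purchased"):
    """Return one cycle number per row, NaN for rows after a user's last purchase.

    Hypothetical helper; assumes rows are sorted by user_id, then actions_order.
    """
    user_ids = list(user_ids)
    categories = list(categories)
    labels = [math.nan] * len(user_ids)

    start = 0
    while start < len(user_ids):
        # Find where the current user's block of rows ends.
        end = start
        while end < len(user_ids) and user_ids[end] == user_ids[start]:
            end += 1

        cycle = 1
        pending = []  # row positions not yet closed by a purchase
        for i in range(start, end):
            pending.append(i)
            if categories[i] == purchase_label:
                # A purchase closes the cycle: label every pending row.
                for j in pending:
                    labels[j] = cycle
                cycle += 1
                pending = []
        # Rows still pending never reached a purchase and stay NaN.
        start = end

    return labels


sample_users = ["a", "a", "a", "b", "b", "b", "b"]
sample_actions = ["purchased", "other", "other",
                  "other", "purchased", "other", "purchased"]
sample_labels = label_purchase_cycles(sample_users, sample_actions)
```

With a DataFrame sorted that way, the result could presumably be attached as `df["purchase_cycle"] = label_purchase_cycles(df["user_id"], df["action_category"])`, since the function returns one label per row in the original order.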