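Transition counts for user product purchases in online retail

Time: 2017-08-06 23:29:23

Tags: python list dictionary tuples markov-chains

I am trying to build a Markov chain user transition matrix from scratch, but I am stuck on assigning values in a dictionary. Here is the sample code: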

## user purchase sequence, separated by '|' at different time intervals
## let's say in the first purchase the user bought products 3 4 12 23 45 41 25, then 4 5 12 17 19 25 46 3, and so on
user_purchase = '3 4 12 23 45 41 25|4 5 12 17 19 25 46 3|39 12 3 23 50 24 35 13|42 34 17 19 46'
## I need to find the transition count from first purchase to second and so on
## e.g 3-1 is 0 , 3-2 is 0 , 3-3 is 0 , 3-4 is 1
## hence output should be {...,2:[(0,0),(0,0),.....], 3:[(0,1),(0,1),(0,1),(1,1), ...], 4:[...]} its a dictionary of list with tuples

### lets say its the total no of products ranging from 1 to 50 that user can buy
prod = range(1,51)

### initializing a dictionary of list with tuples
t = (0,0)
list1= []
for _ in range(len(prod)):
    list1.append(t)
user_tran = {}
for p in prod:
    user_tran[p]= list1


# def trans_matrix(prod_seq):
basket_seq = user_purchase.split('|')
iteration = len(basket_seq)
for j in range(iteration-1):
    trans_from = basket_seq[j]
    trans_to = basket_seq[j+1]
    tfrom = map(int,trans_from.split(' '))
    print tfrom
    tto = map(int,trans_to.split(' '))
    for item in tfrom:
### problem here is in each iteration the default value for all keys is updated from [(0,0),(0,0),....] to item_list
        item_list = user_tran[item]   ### seems problem here
        for i in range(len(prod)):
            if i+1 in tto:
               temp =  item_list[i]
               x = list(temp)
               x[0] = x[0] +1
               x[1] = x[1] +1
               item_list[i] = tuple(x)
            else:
                temp = item_list[i]
                x = list(temp)
                x[0] = x[0]
                x[1] = x[1] + 1
                item_list[i] = tuple(x)
        user_tran[item] = item_list  ### list updation should only be for item specified as key in user_tran but all keys are updated with same value
  

user_tran [3] [1:5]

     

Out[38]: [(0, 23), (15, 23), (7, 23), (7, 23)]

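Expected output

There are 0 transitions from 3 to 1 or from 3 to 2 across the 3 purchase sequences made at different times, and product 3 is present in the first three purchase sequences, so the first two entries should be (0,3). From 3 to 3, however, there are two transitions.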
  

[(0,3),(0,3),(2,3), ...... up to product 50]
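The symptom described in the code comments (every key ending up with the same list) is consistent with all keys sharing one list object: `user_tran[p] = list1` stores a reference to the same `list1` under every key rather than a copy. That would also explain why the second tuple element in the reported output is 23, which equals 7 + 8 + 8, the total number of items in the first three baskets. Below is a minimal Python 3 sketch of one way to avoid the sharing. The function name `transition_counts` and the parameter `n_products` are made up for illustration, and baskets are turned into sets, which assumes a product is not repeated within a single basket.

```python
def transition_counts(user_purchase, n_products=50):
    """Return {product: [(transitions, opportunities), ...]} for products 1..n_products.

    Hypothetical helper, not part of the original question.
    """
    # Assumption: each product appears at most once per basket, so a set is safe.
    baskets = [set(map(int, b.split())) for b in user_purchase.split('|')]

    # Build a fresh list for every key. Reusing one list object for all keys
    # (as `user_tran[p] = list1` does) makes every update visible under every key.
    user_tran = {p: [(0, 0)] * n_products for p in range(1, n_products + 1)}

    # Pair each basket with the one that follows it.
    for current, following in zip(baskets, baskets[1:]):
        for item in current:
            row = user_tran[item]
            for i in range(n_products):
                hits, seen = row[i]
                # Product i + 1 counts as a transition if it is in the next basket.
                row[i] = (hits + (1 if i + 1 in following else 0), seen + 1)
    return user_tran


user_purchase = ('3 4 12 23 45 41 25|4 5 12 17 19 25 46 3|'
                 '39 12 3 23 50 24 35 13|42 34 17 19 46')
result = transition_counts(user_purchase)
print(result[3][:3])  # [(0, 3), (0, 3), (2, 3)]
```

With a separate list per key, the entries for product 3 start with (0,3), (0,3), (2,3), as in the expected output above.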

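1 Answer:

Answer 0: (score: 0)

I did not find the cause, but I tried implementing it with a numpy array, without any tuples or dictionaries.

My output differs from your expected output, but I followed exactly the same logic you used with the dictionary. It is just a translation of the dictionary-of-lists version into a numpy array version. Maybe it will help you.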

import numpy as np

user_purchase = '3 4 12 23 45 41 25|4 5 12 17 19 25 46 3|39 12 3 23 50 24 35 13|42 34 17 19 46'
prod = range(0, 50)
user_tran = np.zeros((50,50,2))
basket_seq = user_purchase.split('|')
iteration = len(basket_seq)
for j in range(iteration-1):
    trans_from = basket_seq[j]
    trans_to = basket_seq[j+1]
    tfrom = map(int,trans_from.split(' '))
    tfrom = [x-1 for x in tfrom]
    tto = map(int,trans_to.split(' '))
    tto = [x - 1 for x in tto]
    for item in tfrom:
        item_list = user_tran[item, :, :]
        for i in range(len(prod)):
            if i + 1 in tto:
                temp = item_list[i, :]
                item_list[i, :] = np.array([temp[0] + 1, temp[1] + 1])
            else:
                temp = item_list[i, :]
                item_list[i, :] = np.array([temp[0], temp[0] + 1])
        user_tran[item, :, :] = item_list
print user_tran[2, 1:5, :]

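user_tran has the shape NxMx2, where N is the number of keys in the dictionary version, M is the number of items in the shop, and the last axis of size 2 takes the place of the 2-value tuple. For example, to get the 3rd key of the dictionary and the list items at indices 1 through 4, you have to write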

user_tran[2, 1:5, :] #instead of user_tran[3][1:5] 

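because array indexing starts at 0, not 1.

You will get a 4x2 matrix, where 4 is the number of elements taken from the list and 2 corresponds to the two values of the tuple.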
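For comparison, here is a minimal Python 3 sketch of the same NxMx2 layout. It differs from the loop in the answer in two places, both of which are assumptions about the intended behaviour: the membership test is `i in tto` (since `tto` has already been shifted to 0-based indices), and the second slot is incremented from its own previous value rather than from the first slot. The name `n_products` is introduced here for illustration only.

```python
import numpy as np

user_purchase = ('3 4 12 23 45 41 25|4 5 12 17 19 25 46 3|'
                 '39 12 3 23 50 24 35 13|42 34 17 19 46')
n_products = 50

# user_tran[a, b, 0]: times product b+1 was in the basket right after one containing a+1
# user_tran[a, b, 1]: number of baskets containing a+1 that had a following basket
user_tran = np.zeros((n_products, n_products, 2), dtype=int)

# Parse baskets and shift product ids to 0-based indices.
baskets = [[int(x) - 1 for x in b.split()] for b in user_purchase.split('|')]

for tfrom, tto in zip(baskets, baskets[1:]):
    for item in tfrom:
        for i in range(n_products):
            if i in tto:                  # tto is already 0-based
                user_tran[item, i, 0] += 1
            user_tran[item, i, 1] += 1    # counted whether or not a transition occurred

block = user_tran[2, 1:5, :]  # product 3, list indices 1 through 4
print(block.shape)            # (4, 2)
print(block)
```

Under these assumptions the row for product 3 at index 2 comes out as [2, 3], which agrees with the (2,3) entry in the question's expected output.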