The dataset is an example from an online marketplace (eBay, Amazon), based on online purchase information.
user_id, product_code, bought_date, time_spent, store_id, product_type, refurbished, unqiue_visit_id
001, e.12, 20120102, 104, 101, computer, yes, 1010
002, e.24, 20120201, 100, 101, infant-dress, no, 2001
003, s.32, 20130302, 230, 101, shoes, no, 2121
004, y.23, 20130404, 212, 103, computer, yes, 2422
005, s.43, 20130803, 104, 101, laptop, yes, 2342
001, a.12, 20120202, 104, 101, computer, yes, 1011
002, b.24, 20120201, 100, 101, infant-dress, no, 2001
003, c.32, 20130302, 230, 101, shoes, no, 2122
004, e.23, 20130404, 212, 103, computer, yes, 2424
005, f.43, 20130803, 104, 101, laptop, yes, 2340
001, g.12, 20120102, 104, 101, computer, yes, 1013
002, h.24, 20120201, 100, 101, infant-dress, no, 2031
003, l.32, 20130302, 230, 101, shoes, no, 2000
004, m.23, 20130404, 212, 103, computer, yes, 1422
005, d.43, 20130803, 104, 101, laptop, yes, 1142
001, d.12, 20120102, 104, 101, desk, yes, 1110
002, f.24, 20120201, 100, 101, glass, no, 1111
003, n.32, 20130302, 230, 101, liquid, no, 2021
004, t.23, 20130404, 212, 103, liquid, yes, 22
005, u.43, 20130803, 104, 101, dress, yes, 2942
001, d.12, 20120102, 104, 101, desk, yes, 1910
002, f.24, 20120201, 100, 101, glass, no, 2901
003, n.32, 20130302, 230, 101, liquid, no, 2921
004, t.23, 20130404, 212, 103, liquid, yes, 2922
005, u.43, 20130803, 104, 101, dress, yes, 2942
001, kk.12, 20120103, 105, 101, desk, yes, 410
003, n.32, 20130303, 230, 101, liquid, no, 2621
unqiue_visit_id is created from user_id, product_code, store_id, product_type, and bought_date.
The goal is to first get the number of unique visits by grouping on user_id and product_type:
library(dplyr)

# Count distinct visit ids per user and product type
test.visits <- test %>%
  group_by(user_id, product_type) %>%
  summarize(visit_count = n_distinct(unqiue_visit_id)) %>%
  arrange(desc(visit_count), user_id)
   user_id product_type    visit_count
     <int> <fct>                 <int>
 1       1 " computer"               3
 2       1 " desk"                   3
 3       2 " infant-dress"           3
 4       3 " liquid"                 3
 5       3 " shoes"                  3
 6       4 " computer"               3
 7       5 " laptop"                 3
 8       2 " glass"                  2
 9       4 " liquid"                 2
10       5 " dress"                  2
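Now I want to assign a product type to each user based on the highest visit count. If there is a tie, fall back on recency (bought_date), then refurbished, then the last value of store_id.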
Example:
1 1 " computer" 3
2 1 " desk" 3
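This is the base condition for a tie: user 1 has two product types with the same visit count. The product_type with the highest visit count is assigned to the user within the group.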
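One possible way to apply the tie-break rule described above is to carry the tie-break columns through the summary and then sort within each user. This is only a sketch under assumptions that the question does not spell out: "recency" is taken as the latest bought_date in the group, "refurbished" prefers "yes" over "no", and the last store_id (in bought_date order) prefers the larger value. The names last_bought, last_refurb, last_store, and assigned are made up for illustration. The sample data is retyped without the spaces after the commas, so product_type has no leading blank here.

```r
library(dplyr)

# Sample data from the question, retyped without spaces after the commas
test <- read.csv(text = "user_id,product_code,bought_date,time_spent,store_id,product_type,refurbished,unqiue_visit_id
001,e.12,20120102,104,101,computer,yes,1010
002,e.24,20120201,100,101,infant-dress,no,2001
003,s.32,20130302,230,101,shoes,no,2121
004,y.23,20130404,212,103,computer,yes,2422
005,s.43,20130803,104,101,laptop,yes,2342
001,a.12,20120202,104,101,computer,yes,1011
002,b.24,20120201,100,101,infant-dress,no,2001
003,c.32,20130302,230,101,shoes,no,2122
004,e.23,20130404,212,103,computer,yes,2424
005,f.43,20130803,104,101,laptop,yes,2340
001,g.12,20120102,104,101,computer,yes,1013
002,h.24,20120201,100,101,infant-dress,no,2031
003,l.32,20130302,230,101,shoes,no,2000
004,m.23,20130404,212,103,computer,yes,1422
005,d.43,20130803,104,101,laptop,yes,1142
001,d.12,20120102,104,101,desk,yes,1110
002,f.24,20120201,100,101,glass,no,1111
003,n.32,20130302,230,101,liquid,no,2021
004,t.23,20130404,212,103,liquid,yes,22
005,u.43,20130803,104,101,dress,yes,2942
001,d.12,20120102,104,101,desk,yes,1910
002,f.24,20120201,100,101,glass,no,2901
003,n.32,20130302,230,101,liquid,no,2921
004,t.23,20130404,212,103,liquid,yes,2922
005,u.43,20130803,104,101,dress,yes,2942
001,kk.12,20120103,105,101,desk,yes,410
003,n.32,20130303,230,101,liquid,no,2621",
  stringsAsFactors = FALSE)

assigned <- test %>%
  arrange(bought_date) %>%                 # so last() means "most recent"
  group_by(user_id, product_type) %>%
  summarize(
    visit_count = n_distinct(unqiue_visit_id),
    last_bought = max(bought_date),        # assumption: recency = latest purchase
    last_refurb = last(refurbished),
    last_store  = last(store_id),
    .groups = "drop"
  ) %>%
  group_by(user_id) %>%
  arrange(
    desc(visit_count),                     # primary rule: highest visit count
    desc(last_bought),                     # tie-break 1: recency
    desc(last_refurb == "yes"),            # tie-break 2 (assumed direction)
    desc(last_store),                      # tie-break 3 (assumed direction)
    .by_group = TRUE
  ) %>%
  slice(1) %>%                             # keep the top-ranked type per user
  ungroup()

print(assigned)
```

With this data, user 1 is tied at 3 visits between computer and desk, and computer wins because its latest bought_date (20120202) is more recent than the latest desk purchase (20120103). If two product types remain tied on every criterion, slice(1) simply keeps whichever row sorts first, so a further explicit rule would be needed for a deterministic choice.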