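Suppose I have the following data frame with two columns, user_id and location. A single user_id can have several locations. I would like to find how often each location sequence occurs, where a sequence is the ordered list of locations belonging to one user_id.
So if my data looks like this: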
places = data.frame(user_id=c(1,1,2,3,3,3,4,4,5,5,5,5),
location=c("home","school","work","home","school","work",
"lunch","airport","gym","breakfast","work","home"))
places
I would like to get the following:
freq = data.frame(location_path=c("home - school", "work", "home - school - work",
"lunch - airport", "gym - breakfast - work - home"),
count=c(1,1,1,1,1))
freq
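This second data frame tells me that the sequence "home" then "school", in that order, occurred once. Likewise, the sequence home, school, work occurred only once.
Of course, a sequence may occur more than once. In the following case, the count for the home, school, work sequence is 2, because users 3 and 6 both follow it.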
places = data.frame(user_id=c(1,1,2,3,3,3,4,4,5,5,5,5,6,6,6),
location=c("home","school","work","home","school","work",
"lunch","airport","gym","breakfast","work","home",
"home","school","work"))
places
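One possible way to get these counts in base R is sketched below. It assumes the rows of each user are already in visit order, and the names `paths` and `counts` are only illustrative. The idea is to collapse each user's locations into a single path string and then tabulate the strings.

```r
# Second example from the question: user 6 repeats user 3's path
places <- data.frame(
  user_id  = c(1,1,2,3,3,3,4,4,5,5,5,5,6,6,6),
  location = c("home","school","work","home","school","work",
               "lunch","airport","gym","breakfast","work","home",
               "home","school","work"),
  stringsAsFactors = FALSE
)

# Collapse each user's locations (in row order, assumed to be visit order)
# into one " - "-separated path string
paths <- tapply(places$location, places$user_id, paste, collapse = " - ")

# Count how many users share each distinct path
counts <- table(as.vector(paths))

freq <- data.frame(location_path = names(counts),
                   count = as.vector(counts),
                   stringsAsFactors = FALSE)
freq
```

With this data, "home - school - work" should come out with a count of 2 and every other path with a count of 1. The rows of `freq` are sorted alphabetically by path rather than by first appearance.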