My data is set up as follows:
df = data.frame(ID = c('A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C', 'C', 'D', 'E', 'E'),
                drink_freq = c('Coffee Light', 'Water Heavy', 'Tea Medium',
                               'Coffee Medium', 'Water Light',
                               'Espresso Light', 'Coffee Medium', 'Water Light', 'Soda Light', 'Tea Medium',
                               'Coffee Heavy',
                               'Coffee Medium', 'Soda Light'))
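What I want to do is create some kind of contingency table that shows how often each combination of segments occurs among users. For example, 'Soda Light' with 'Coffee Medium' and 'Coffee Medium' with 'Water Light' would each be 2, while 'Coffee Light' with 'Water Heavy' would be 1.
I feel like this shouldn't be difficult, but I'm having trouble writing code to do it because users can belong to a varying number of groups.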
Answer 0 (score: 0)
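Here is a tidyverse solution that builds every unique pair of drinks (each unordered pair appears once, so the reverse ordering is not listed separately) and counts how many users the two drinks have in common: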
df = data.frame(ID = c('A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C', 'C', 'D', 'E', 'E'),
                drink_freq = c('Coffee Light', 'Water Heavy', 'Tea Medium',
                               'Coffee Medium', 'Water Light',
                               'Espresso Light', 'Coffee Medium', 'Water Light', 'Soda Light', 'Tea Medium',
                               'Coffee Heavy',
                               'Coffee Medium', 'Soda Light'), stringsAsFactors = FALSE)
library(tidyverse)
# combn() lists each pair of distinct drinks once; t() turns the pairs into rows
data.frame(t(combn(unique(df$drink_freq), 2)), stringsAsFactors = FALSE) %>%
  # for each pair, count the IDs that appear under both drinks
  mutate(counts = map2_dbl(X1, X2, ~length(intersect(df$ID[df$drink_freq == .x],
                                                     df$ID[df$drink_freq == .y]))))
#                X1             X2 counts
#  1   Coffee Light    Water Heavy      1
#  2   Coffee Light     Tea Medium      1
#  3   Coffee Light  Coffee Medium      0
#  4   Coffee Light    Water Light      0
#  5   Coffee Light Espresso Light      0
#  6   Coffee Light     Soda Light      0
#  7   Coffee Light   Coffee Heavy      0
#  8    Water Heavy     Tea Medium      1
#  9    Water Heavy  Coffee Medium      0
# 10    Water Heavy    Water Light      0
# 11    Water Heavy Espresso Light      0
# 12    Water Heavy     Soda Light      0
# 13    Water Heavy   Coffee Heavy      0
# 14     Tea Medium  Coffee Medium      1
# 15     Tea Medium    Water Light      1
# 16     Tea Medium Espresso Light      1
# 17     Tea Medium     Soda Light      1
# 18     Tea Medium   Coffee Heavy      0
# 19  Coffee Medium    Water Light      2
# 20  Coffee Medium Espresso Light      1
# 21  Coffee Medium     Soda Light      2
# 22  Coffee Medium   Coffee Heavy      0
# 23    Water Light Espresso Light      1
# 24    Water Light     Soda Light      1
# 25    Water Light   Coffee Heavy      0
# 26 Espresso Light     Soda Light      1
# 27 Espresso Light   Coffee Heavy      0
# 28     Soda Light   Coffee Heavy      0
You can then reshape the output above into a contingency table.
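As a minimal sketch of that reshaping step, base R's xtabs() can spread the long pair table into a drink-by-drink table. The names `pairs` and `tab` are illustrative and not part of the original answer. Because each pair is listed only once, only one ordering of each pair is filled in, so a zero cell may just mean the pair is stored the other way round.

```r
library(tidyverse)

df <- data.frame(ID = c('A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C', 'C', 'D', 'E', 'E'),
                 drink_freq = c('Coffee Light', 'Water Heavy', 'Tea Medium',
                                'Coffee Medium', 'Water Light',
                                'Espresso Light', 'Coffee Medium', 'Water Light',
                                'Soda Light', 'Tea Medium',
                                'Coffee Heavy',
                                'Coffee Medium', 'Soda Light'),
                 stringsAsFactors = FALSE)

# `pairs` (illustrative name): one row per unordered pair of drinks,
# with the number of IDs that have both drinks
pairs <- data.frame(t(combn(unique(df$drink_freq), 2)), stringsAsFactors = FALSE) %>%
  mutate(counts = map2_dbl(X1, X2, ~length(intersect(df$ID[df$drink_freq == .x],
                                                     df$ID[df$drink_freq == .y]))))

# Spread the long table into a two-way table: rows are X1, columns are X2
tab <- xtabs(counts ~ X1 + X2, data = pairs)

tab["Coffee Medium", "Soda Light"]  # 2 (users C and E)
```

Cells are looked up by name, with the row taken from X1 and the column from X2.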
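Note that if you want to reshape the result and get a symmetric table, you need to modify the code above to generate every ordered pair of drinks (both 'A, B' and 'B, A'), as follows: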
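# expand.grid() builds every ordered pair, including each drink paired with itself
expand.grid(X1 = unique(df$drink_freq),
            X2 = unique(df$drink_freq), stringsAsFactors = FALSE) %>%
  mutate(counts = map2_dbl(X1, X2, ~length(intersect(df$ID[df$drink_freq == .x],
                                                     df$ID[df$drink_freq == .y])))) %>%
  # drop the self-pairs (a drink paired with itself)
  filter(X1 != X2)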
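As an alternative sketch that is not from the original answer, the same symmetric co-occurrence counts can be computed in base R by taking the cross-product of a user-by-drink incidence matrix. The names `inc` and `co` are illustrative. The diagonal of the result holds the number of users per drink, which corresponds to the self-pairs that filter(X1 != X2) removes.

```r
df <- data.frame(ID = c('A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C', 'C', 'D', 'E', 'E'),
                 drink_freq = c('Coffee Light', 'Water Heavy', 'Tea Medium',
                                'Coffee Medium', 'Water Light',
                                'Espresso Light', 'Coffee Medium', 'Water Light',
                                'Soda Light', 'Tea Medium',
                                'Coffee Heavy',
                                'Coffee Medium', 'Soda Light'),
                 stringsAsFactors = FALSE)

# User-by-drink incidence matrix: 1 if the user has that drink_freq, else 0.
# The (inc > 0) step guards against repeated (ID, drink_freq) rows.
inc <- unclass(table(df$ID, df$drink_freq))
inc <- (inc > 0) * 1

# t(inc) %*% inc: cell [i, j] is the number of users who have both drink i and drink j
co <- crossprod(inc)

co["Coffee Medium", "Soda Light"]  # 2
co["Soda Light", "Coffee Medium"]  # 2
```

This avoids looping over pairs, at the cost of building the full incidence matrix in memory.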