我有一个客户数据框puzzle
和他们拥有的项目类型。如果客户有多个项目,则可以在列表中多次出现。
name type
m1 A
m10 A
m2 A
m9 A
m9 B
m4 B
m5 B
m1 C
m2 C
m3 C
m4 C
m5 C
m6 C
m7 C
m8 C
m1 D
m5 D
我想计算拥有“A”,拥有“B”等的人的百分比,等等。
根据以上输入,如何使用R:
获得这样的输出 A B C D TOTAL
A 1 0.25 0.5 0.25 4
B 0.33 1 0.67 0.33 3
C 0.25 0.25 1 0.25 8
D 0.5 0.5 1 1 2
非常感谢你的帮助!
以下是漫长而手动的方式,没有任何循环或高级功能(但当然这在R中浪费了潜力):
项目A的示例: -
puzzleA <- subset(puzzle, type == 'A')
计算拥有A的客户,他们也拥有B: -
length(unique((merge(puzzleA, puzzleB, by = 'name'))$name))/length(unique(puzzleA$name)
数据
puzzle <- structure(list(name = c("m1", "m10", "m2", "m9", "m9", "m4",
"m5", "m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8", "m1", "m5"
), type = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C",
"C", "C", "C", "C", "C", "D", "D")), .Names = c("name", "type"
), class = "data.frame", row.names = c(NA, -17L))
答案 0 :(得分:3)
您还可以构建一组关联规则,例如:
library(arules)
trans <- as(lapply(split(puzzle[2], puzzle[1]), unlist, F, F), "transactions")
rules <- apriori(trans, parameter = list(support=0, minlen=2, maxlen=2, conf=0))
res <- data.frame(
lhs = labels(lhs(rules)),
rhs = labels(rhs(rules)),
value = round(rules@quality$confidence, 2)
)
res <- reshape2::dcast(res, lhs~rhs, fill = 1)
res$total <- rowSums(trans@data)
res
# lhs {A} {B} {C} {D} total
# 1 {A} 1.00 0.25 0.50 0.25 4
# 2 {B} 0.33 1.00 0.67 0.33 3
# 3 {C} 0.25 0.25 1.00 0.25 8
# 4 {D} 0.50 0.50 1.00 1.00 2
答案 1 :(得分:3)
我们可以使用merge/table
执行此操作。我们merge
数据集自身by
&#39;名称&#39;,删除第一列,使用table
获取频次数(&#39; tbl&#39;) ,用对角元素除以&#39; tbl&#39;和cbind
的对角线元素。
tbl <- table(merge(puzzle, puzzle, by = "name")[-1])
cbind(round(tbl/diag(tbl),2), TOTAL= diag(tbl))
# A B C D TOTAL
#A 1.00 0.25 0.50 0.25 4
#B 0.33 1.00 0.67 0.33 3
#C 0.25 0.25 1.00 0.25 8
#D 0.50 0.50 1.00 1.00 2
答案 2 :(得分:2)
很好地应用我的问题和答案:How to perform pairwise operation like %in%
and set operations for a list of vectors。
## separate out people by type
lst <- with(puzzle, split(name, type))
#List of 4
# $ A: chr [1:4] "m1" "m10" "m2" "m9"
# $ B: chr [1:3] "m9" "m4" "m5"
# $ C: chr [1:8] "m1" "m2" "m3" "m4" ...
# $ D: chr [1:2] "m1" "m5"
## pairwise intersect (a matrix of list)
pair_intersect <- outer(lst, lst, Vectorize(intersect))
# A B C D
#A Character,4 "m9" Character,2 "m1"
#B "m9" Character,3 Character,2 "m5"
#C Character,2 Character,2 Character,8 Character,2
#D "m1" "m5" Character,2 Character,2
## count number of people in each pair
count <- matrix(lengths(pair_intersect), nrow = length(lst),
dimnames = dimnames(pair_intersect))
# A B C D
#A 4 1 2 1
#B 1 3 2 1
#C 2 2 8 2
#D 1 1 2 2
## conditional percentage
conditional_percent <- count / diag(count)
# A B C D
#A 1.0000000 0.25 0.5000000 0.2500000
#B 0.3333333 1.00 0.6666667 0.3333333
#C 0.2500000 0.25 1.0000000 0.2500000
#D 0.5000000 0.50 1.0000000 1.0000000
如果您想将对角线附加到最后一列,请使用
final <- cbind(conditional_percent, Total = diag(count))
# A B C D Total
#A 1.0000000 0.25 0.5000000 0.2500000 4
#B 0.3333333 1.00 0.6666667 0.3333333 3
#C 0.2500000 0.25 1.0000000 0.2500000 8
#D 0.5000000 0.50 1.0000000 1.0000000 2