My file looks like this:
Pcol Mcol
P1 M1,M2,M5,M6,M1,M2,M1,M5
P2 M1,M2,M3,M5,M1,M2,M1,M3
P3 M4,M5,M7,M6,M5,M7,M4,M7
I want to find all pairwise combinations of the Mcol elements and count how many rows each combination is present in.
Expected output:
Mcol freq
M1,M2 2
M1,M5 2
M1,M6 1
M2,M5 2
M2,M6 1
M5,M6 2
M1,M3 1
M2,M3 1
M4,M5 1
M4,M7 1
M4,M6 1
M7,M6 1
I have tried this:
x <- read.csv("file.csv", header = TRUE, stringsAsFactors = FALSE)
xx <- do.call(rbind.data.frame,
              lapply(x$Mcol, function(i) {
                n <- sort(unlist(strsplit(i, ",")))
                t(combn(n, 2))
              }))
data.frame(table(paste(xx[, 1], xx[, 2], sep = ",")))
It does not give the expected output.
I have also tried this:
library(tidyverse)
df1 %>%
  separate_rows(Mcol) %>%
  group_by(Pcol) %>%
  summarise(Mcol = list(combn(Mcol, 2, FUN = toString, simplify = FALSE))) %>%
  unnest %>%
  unnest %>%
  count(Mcol)
But it does not give the number of rows in which each combination occurs. I want the frequency of rows in which these combinations are present. That is, if M1,M2 is present in both P1 and P2, the frequency should be 2.
Answer 0 (score: 2)
One option in tidyverse is to split 'Mcol' with separate_rows, group by 'Pcol', get the combn of 'Mcol', and, after unnest, count the values of the 'Mcol' column.
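
The answer's own code did not survive in this copy, so the following is only a minimal base R sketch of the same idea, not the answerer's original tidyverse pipeline. It assumes the data frame is called df1 (as in the question), that every separator in Mcol is a comma, and that a pair should be counted once per row. The key step is taking unique() of each row's elements before calling combn, so a repeated element such as M1 cannot inflate the count. The names pairs_per_row and res are introduced here for illustration.

```r
# Sample data from the question (assumption: all separators are commas)
df1 <- data.frame(
  Pcol = c("P1", "P2", "P3"),
  Mcol = c("M1,M2,M5,M6,M1,M2,M1,M5",
           "M1,M2,M3,M5,M1,M2,M1,M3",
           "M4,M5,M7,M6,M5,M7,M4,M7"),
  stringsAsFactors = FALSE
)

# For each row: split, drop duplicates, sort, then build every pair once
pairs_per_row <- lapply(strsplit(df1$Mcol, ","), function(m) {
  m <- sort(unique(trimws(m)))
  if (length(m) < 2) return(character(0))  # combn() errors on fewer than 2 items
  combn(m, 2, FUN = paste, collapse = ",")
})

# Each pair appears at most once per row, so the table counts rows
res <- as.data.frame(table(Mcol = unlist(pairs_per_row)),
                     responseName = "freq",
                     stringsAsFactors = FALSE)
res
```

Because the elements are sorted within each row, a pair is always written in one order (for example M6,M7 rather than M7,M6), which keeps the same pair from being counted under two different labels.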