Find all combinations of the elements in a column and count their frequency

Time: 2019-06-27 15:14:34

Tags: r read.csv

My file looks like this:

Pcol       Mcol
P1      M1,M2,M5,M6,M1,M2,M1,M5
P2      M1,M2,M3,M5,M1,M2,M1,M3
P3      M4,M5,M7,M6,M5,M7,M4,M7
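If the file really is laid out as shown (whitespace between the two columns, no spaces inside Mcol), it is not comma-separated in the read.csv sense, because Mcol itself contains commas. A minimal sketch, assuming that whitespace layout, reads the sample with read.table instead; the text is typed inline (every item comma-separated) so the snippet runs without file.csv, and df1 is just the name used by the second attempt below.

```r
# Sample data typed inline; assumes whitespace separates Pcol from Mcol
txt <- "Pcol       Mcol
P1      M1,M2,M5,M6,M1,M2,M1,M5
P2      M1,M2,M3,M5,M1,M2,M1,M3
P3      M4,M5,M7,M6,M5,M7,M4,M7"

# sep = "" (the read.table default) splits on whitespace, so the commas stay inside Mcol
df1 <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
str(df1)
```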

I want to find all pairwise combinations of the Mcol elements and count how many rows each combination is present in.

Expected output:

Mcol        freq
M1,M2        2
M1,M5        2
M1,M6        1
M2,M5        2
M2,M6        1
M5,M6        2
M1,M3        1
M2,M3        1
M4,M5        1
M4,M7        1
M4,M6        1
M7,M6        1

I have tried this:

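x <- read.csv("file.csv", header = TRUE, stringsAsFactors = FALSE)
# split each Mcol string on commas, sort the items, and build every 2-item combination
xx <- do.call(rbind.data.frame, 
              lapply(x$Mcol, function(i){
                n <- sort(unlist(strsplit(i, ",")))
                t(combn(n, 2))
              }))

# paste each pair back into "A,B" and tabulate
data.frame(table(paste(xx[, 1], xx[, 2], sep = ",")))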

It does not give the expected output.
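A likely reason: items repeat within a row (M1 appears three times in P1), so sort() keeps the duplicates and combn() returns pairs such as M1,M1 as well as the same pair several times per row. table() then counts pair occurrences rather than rows. A minimal base R sketch, assuming the only change needed is to de-duplicate each row with unique() before calling combn(); the sample data is typed inline instead of read from file.csv.

```r
# Sample data typed inline (stand-in for file.csv)
x <- data.frame(
  Pcol = c("P1", "P2", "P3"),
  Mcol = c("M1,M2,M5,M6,M1,M2,M1,M5",
           "M1,M2,M3,M5,M1,M2,M1,M3",
           "M4,M5,M7,M6,M5,M7,M4,M7"),
  stringsAsFactors = FALSE
)

pairs <- unlist(lapply(x$Mcol, function(s) {
  # unique() drops repeats inside the row, so each pair is produced once per row;
  # sort() gives every pair the same label regardless of item order in the row
  items <- sort(unique(strsplit(s, ",", fixed = TRUE)[[1]]))
  combn(items, 2, FUN = paste, collapse = ",")
}))

# one row per distinct pair, with the number of rows it appears in
freq <- as.data.frame(table(Mcol = pairs), responseName = "freq",
                      stringsAsFactors = FALSE)
freq
```

Because each row's items are sorted, P3's M7,M6 is reported as M6,M7. The result also lists M3,M5 and M5,M7, which do occur in P2 and P3 but are not in the expected output above.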

I also tried this:

library(tidyverse)
df1 %>%
   separate_rows(Mcol) %>%
   group_by(Pcol) %>%
   summarise(Mcol = list(combn(Mcol, 2, FUN= toString, simplify = FALSE))) %>% 
   unnest %>% 
   unnest %>%
   count(Mcol)

But it does not give the frequency in terms of rows. I want the number of rows in which each combination is present. That is, if M1 and M2 are both present in P1 and P2, the frequency of M1,M2 should be 2.
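To make the row-based definition concrete, a small check counts the rows whose Mcol contains both items of a given pair. The helper rows_with_pair() is hypothetical, written only for illustration; repeats inside a row do not matter because it only tests membership.

```r
mcol <- c("M1,M2,M5,M6,M1,M2,M1,M5",
          "M1,M2,M3,M5,M1,M2,M1,M3",
          "M4,M5,M7,M6,M5,M7,M4,M7")

# Hypothetical helper: number of rows containing both a and b (at least once each)
rows_with_pair <- function(mcol, a, b) {
  items <- strsplit(mcol, ",", fixed = TRUE)  # one character vector per row
  sum(vapply(items, function(s) all(c(a, b) %in% s), logical(1)))
}

rows_with_pair(mcol, "M1", "M2")  # → 2 (P1 and P2)
rows_with_pair(mcol, "M4", "M7")  # → 1 (P3 only)
```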

1 answer:

Answer 0 (score: 2)

One option in tidyverse is to split 'Mcol' into separate rows with separate_rows, group by 'Pcol', take the combn of the 'Mcol' column, and count the pairs after unnest.
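A sketch of those steps, assuming dplyr and tidyr (both part of the tidyverse) are installed; it is a reconstruction and may differ from the answerer's original code. distinct() is added so that repeated items within a Pcol are dropped before combn(), which makes each pair count once per row, and sort() keeps pair labels consistent across rows.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(
  Pcol = c("P1", "P2", "P3"),
  Mcol = c("M1,M2,M5,M6,M1,M2,M1,M5",
           "M1,M2,M3,M5,M1,M2,M1,M3",
           "M4,M5,M7,M6,M5,M7,M4,M7"),
  stringsAsFactors = FALSE
)

out <- df1 %>%
  separate_rows(Mcol, sep = ",") %>%   # one row per Pcol/item
  distinct() %>%                       # drop repeated items within the same Pcol
  group_by(Pcol) %>%
  # all 2-item combinations per Pcol, stored as a list-column of "A,B" strings
  summarise(Mcol = list(combn(sort(Mcol), 2, FUN = paste, collapse = ","))) %>%
  unnest(cols = Mcol) %>%              # one row per Pcol/pair
  count(Mcol, name = "freq")           # number of Pcol rows containing each pair
out
```

combn() needs at least two distinct items per group, which holds for this sample; rows with a single item would have to be filtered out first.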