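Pairwise comparison of elements across data frame rows

Date: 2019-08-26 13:29:51

Tags: r

I want to count the elements that each pair of rows in a data frame has in common.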

name           members
x1            A,B,N,K,Y,G
x2            J,L,M,N,T
x3            G,H,S,J,D,F
x4            J,K,H,F,H,D,L

name    common    name
x1      6         x1
x1      2         x2
x1      -         x3
x1      -         x4
x2      -         x1
x2      5         x2
x2      -         x3
x2      -         x4
x3      -         x1
x3      -         x2
x3      6         x3
x3      -         x4
x4      -         x1
x4      -         x2
x4      -         x3
x4      7         x4

2 answers:

Answer 0 (score: 1)

1) dplyr / tidyr For each row, use separate_rows to create a separate row for each member, then join the result to itself by members. Finally, compute the counts and complete them.

library(dplyr)
library(tidyr)

DF %>%
  separate_rows(members) %>%
  distinct %>%
  inner_join(., ., by = "members") %>%
  count(name.x, name.y) %>%
  complete(name.x, name.y)

giving:

# A tibble: 16 x 3
   name.x name.y     n
   <chr>  <chr>  <int>
 1 x1     x1         6
 2 x1     x2         1
 3 x1     x3         1
 4 x1     x4         1
 5 x2     x1         1
 6 x2     x2         5
 7 x2     x3         1
 8 x2     x4         2
 9 x3     x1         1
10 x3     x2         1
11 x3     x3         6
12 x3     x4         4
13 x4     x1         1
14 x4     x2         2
15 x4     x3         4
16 x4     x4         6

2) Base R Create a function that counts the elements in the intersection of two members components. Then use outer to apply it to every pair, and convert the result to a data.frame.

Scan <- function(x) scan(text = x, what = "", sep = ",", quiet = TRUE)
countSame <- function(x, y) length(intersect(Scan(x), Scan(y)))
x <- setNames(DF$members, DF$name)
as.data.frame.table(outer(x, x, Vectorize(countSame)))

giving:

   Var1 Var2 Freq
1    x1   x1    6
2    x2   x1    1
3    x3   x1    1
4    x4   x1    1
5    x1   x2    1
6    x2   x2    5
7    x3   x2    1
8    x4   x2    2
9    x1   x3    1
10   x2   x3    1
11   x3   x3    6
12   x4   x3    4
13   x1   x4    1
14   x2   x4    2
15   x3   x4    4
16   x4   x4    6

Although the question asks for the data.frame form, you may prefer a 2d table, which is produced by simply omitting as.data.frame.table from the last line of code:

   x1 x2 x3 x4
x1  6  1  1  1
x2  1  5  1  2
x3  1  1  6  4
x4  1  2  4  6

2a) A two-line variation of (2) applies strsplit to members and then uses outer to compute the length of the intersection of each pair. Finally, we convert to a data frame. (Omitting as.data.frame.table again gives the 2d table form.)

x <- with(DF, setNames(strsplit(members, ","), name))
as.data.frame.table(outer(x, x, Vectorize(function(x, y) length(intersect(x, y)))))

giving:

   Var1 Var2 Freq
1    x1   x1    6
2    x2   x1    1
3    x3   x1    1
4    x4   x1    1
5    x1   x2    1
6    x2   x2    5
7    x3   x2    1
8    x4   x2    2
9    x1   x3    1
10   x2   x3    1
11   x3   x3    6
12   x4   x3    4
13   x1   x4    1
14   x2   x4    2
15   x3   x4    4
16   x4   x4    6

Note

The code above assumes DF is the data frame shown in the question, with name and members as character columns.
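
The question's expected table shows 7 for x4 against itself, whereas the approaches above report 6: x4 lists H twice, and both distinct and intersect count a repeated member once. A minimal sketch of that difference, assuming the question's data is rebuilt by hand as DF (this construction is an assumption, not part of the original answer):

```r
# Assumed reconstruction of the question's data; the name DF follows the answer
DF <- data.frame(
  name = c("x1", "x2", "x3", "x4"),
  members = c("A,B,N,K,Y,G", "J,L,M,N,T", "G,H,S,J,D,F", "J,K,H,F,H,D,L"),
  stringsAsFactors = FALSE
)

# Named list of member vectors, one per row
x <- with(DF, setNames(strsplit(members, ","), name))

# intersect() returns unique values, so the repeated H in x4 counts once
common <- outer(x, x, Vectorize(function(a, b) length(intersect(a, b))))

lengths(x)    # raw member counts per row, duplicates included
diag(common)  # distinct members per row
```

If the raw count is what you want on the diagonal, lengths(x) supplies it; the off-diagonal entries are the same either way for this data.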

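Answer 1 (score: 0)

I believe the following code solves the problem. Note, however, that I find it rather complicated, with its two merge calls; perhaps someone else will find a simpler solution.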

fun <- function(DF){
  # all ordered pairs of names; [2:1] puts 'name' first, then 'Var2'
  ex <- expand.grid(Var2 = DF[['name']], name = DF[['name']])[2:1]
  # attach each row's members to every pair that starts with its name
  merge(DF, ex)
}

tmp <- merge(df1, fun(df1))
o <- order(tmp[[3]])
tmp$members2 <- tmp$members[o]

tmp$common <- apply(tmp[c(2, 4)], 1, function(x){
  y1 <- unlist(strsplit(as.character(x[1]), ","))
  y2 <- unlist(strsplit(as.character(x[2]), ","))
  length(intersect(y1, y2))
})

res <- tmp[c(1, 5, 3)]
names(res)[3] <- "name2"

head(res)
#  name common name2
#1   x1      6    x1
#2   x1      1    x2
#3   x1      1    x3
#4   x1      1    x4
#5   x2      1    x1
#6   x2      5    x2

Final cleanup.

rm(tmp)
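
One possible shorter route, offered as a sketch rather than as the answerer's method: merge with by = NULL returns the Cartesian product of its two inputs, so a single call yields every ordered pair of rows. The names crossed, split_members and res2 are hypothetical, and the data frame is rebuilt inline so the block runs on its own.

```r
# Assumed reconstruction of the question's data
df1 <- data.frame(
  name = c("x1", "x2", "x3", "x4"),
  members = c("A,B,N,K,Y,G", "J,L,M,N,T", "G,H,S,J,D,F", "J,K,H,F,H,D,L"),
  stringsAsFactors = FALSE
)

# Cartesian product of df1 with itself; clashing columns get .x / .y suffixes
crossed <- merge(df1, df1, by = NULL)

# Hypothetical helper: split comma-separated strings into a list of vectors
split_members <- function(s) strsplit(as.character(s), ",")

# Count shared members for each pair of rows
crossed$common <- mapply(
  function(a, b) length(intersect(a, b)),
  split_members(crossed$members.x),
  split_members(crossed$members.y)
)

# Order by pair and keep the columns in the layout the question asks for
res2 <- crossed[order(crossed$name.x, crossed$name.y),
                c("name.x", "common", "name.y")]
head(res2)
```

This trades the two keyed merges for one cross join, at the cost of materialising all n^2 pairs up front, which is the same number of rows the original approach produces.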

Data.

df1 <- read.table(text = "
name           members
x1            A,B,N,K,Y,G
x2            J,L,M,N,T
x3            G,H,S,J,D,F
x4            J,K,H,F,H,D,L
", header = TRUE)