I want to find the number of common elements between all pairs of rows in a data frame
name members
x1 A,B,N,K,Y,G
x2 J,L,M,N,T
x3 G,H,S,J,D,F
x4 J,K,H,F,H,D,L
name common name
x1 6 x1
x1 2 x2
x1 - x3
x1 - x4
x2 - x1
x2 5 - x2
x2 - x3
x2 - x4
x3 - x1
x3 - x2
x3 6 - x3
x3 - x4
x4 - x1
x4 - x2
x4 - x3
x4 7 -x4
Answer 0: (score: 1)
1) dplyr/tidyr For each row, use separate_rows to create a separate row for each member, then join the result to itself by members. Finally, compute the counts and complete them.
library(dplyr)
library(tidyr)
DF %>%
separate_rows(members) %>%
distinct %>%
inner_join(., ., by = "members") %>%
count(name.x, name.y) %>%
complete(name.x, name.y)
giving:
# A tibble: 16 x 3
name.x name.y n
<chr> <chr> <int>
1 x1 x1 6
2 x1 x2 1
3 x1 x3 1
4 x1 x4 1
5 x2 x1 1
6 x2 x2 5
7 x2 x3 1
8 x2 x4 2
9 x3 x1 1
10 x3 x2 1
11 x3 x3 6
12 x3 x4 4
13 x4 x1 1
14 x4 x2 2
15 x4 x3 4
16 x4 x4 6
2) Base R Create a function that counts the number of elements in the intersection of two members strings. Then use outer to apply it to every pair and convert the result to a data.frame.
Scan <- function(x) scan(text = x, what = "", sep = ",", quiet = TRUE)
countSame <- function(x, y) length(intersect(Scan(x), Scan(y)))
x <- setNames(DF$members, DF$name)
as.data.frame.table(outer(x, x, Vectorize(countSame)))
giving:
Var1 Var2 Freq
1 x1 x1 6
2 x2 x1 1
3 x3 x1 1
4 x4 x1 1
5 x1 x2 1
6 x2 x2 5
7 x3 x2 1
8 x4 x2 2
9 x1 x3 1
10 x2 x3 1
11 x3 x3 6
12 x4 x3 4
13 x1 x4 1
14 x2 x4 2
15 x3 x4 4
16 x4 x4 6
Although the question asks for data.frame form, you may prefer a 2d table, which can be produced simply by omitting as.data.frame.table from the last line of code:
x1 x2 x3 x4
x1 6 1 1 1
x2 1 5 1 2
x3 1 1 6 4
x4 1 2 4 6
2a) A two-line variation of (2) can be formed by applying strsplit to members and then using outer to compute the length of the intersection for each pair. Finally, we convert to a data frame. (Omitting as.data.frame.table would again produce the 2d table.)
x <- with(DF, setNames(strsplit(members, ","), name))
as.data.frame.table(outer(x, x, Vectorize(function(x, y) length(intersect(x, y)))))
giving:
Var1 Var2 Freq
1 x1 x1 6
2 x2 x1 1
3 x3 x1 1
4 x4 x1 1
5 x1 x2 1
6 x2 x2 5
7 x3 x2 1
8 x4 x2 2
9 x1 x3 1
10 x2 x3 1
11 x3 x3 6
12 x4 x3 4
13 x1 x4 1
14 x2 x4 2
15 x3 x4 4
16 x4 x4 6
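The snippets in this answer use a data frame named DF that is never shown. Here is a minimal sketch, assuming DF holds the four rows from the question, that rebuilds it and runs the 2d-table form of (2a). It also shows why the x4/x4 count comes out as 6 rather than the 7 listed in the question: x4 contains H twice, and intersect() keeps only distinct elements. The variable name common is only illustrative.

```r
# Question data; the name DF matches what the snippets above assume.
DF <- data.frame(
  name = c("x1", "x2", "x3", "x4"),
  members = c("A,B,N,K,Y,G", "J,L,M,N,T", "G,H,S,J,D,F", "J,K,H,F,H,D,L"),
  stringsAsFactors = FALSE
)

# Split each comma-separated string into a named list of character vectors.
x <- with(DF, setNames(strsplit(members, ","), name))

# x4 lists H twice: 7 entries, but only 6 distinct ones.
length(x$x4)          # 7
length(unique(x$x4))  # 6

# intersect() discards duplicates, so the x4/x4 cell is 6.
common <- outer(x, x, Vectorize(function(a, b) length(intersect(a, b))))
common
```

If duplicates should count toward the diagonal, the diagonal cells would need separate handling; that depends on what the asker intended and is not settled by the question.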
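Answer 1: (score: 0)
I believe the following code does it. Note, though, that I find it complicated, with its two merge instructions; perhaps someone else will find a simpler solution.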
fun <- function(DF){
ex <- expand.grid(Var2 = DF[['name']], name = DF[['name']])[2:1]
members <- as.character(DF[['members']])
merge(DF, ex)
}
tmp <- merge(df1, fun(df1))
o <- order(tmp[[3]])
tmp$members2 <- tmp$members[o]
tmp$common <- apply(tmp[c(2, 4)], 1, function(x){
y1 <- unlist(strsplit(as.character(x[1]), ","))
y2 <- unlist(strsplit(as.character(x[2]), ","))
length(intersect(y1, y2))
})
res <- tmp[c(1, 5, 3)]
names(res)[3] <- "name2"
head(res)
# name common name2
#1 x1 6 x1
#2 x1 1 x2
#3 x1 1 x3
#4 x1 1 x4
#5 x2 1 x1
#6 x2 5 x2
Final cleanup.
rm(tmp)
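A possibly simpler route to the same name/common/name2 layout, offered only as a sketch and not as this answer's method: build all ordered pairs with expand.grid and count the shared members with mapply. The helper count_common and the names parts and res2 are hypothetical. df1 is redefined here from the question's data so that the block runs on its own.

```r
df1 <- read.table(text = "
name members
x1 A,B,N,K,Y,G
x2 J,L,M,N,T
x3 G,H,S,J,D,F
x4 J,K,H,F,H,D,L
", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: number of distinct members two sets share.
count_common <- function(a, b) length(intersect(a, b))

# Named list of member vectors, one entry per row of df1.
parts <- setNames(strsplit(df1$members, ","), df1$name)

# All ordered pairs; name2 varies fastest, so rows are grouped by name.
res2 <- expand.grid(name2 = df1$name, name = df1$name,
                    stringsAsFactors = FALSE)[2:1]

# Look up both member vectors for each pair and count the overlap.
res2$common <- unname(mapply(count_common, parts[res2$name], parts[res2$name2]))
res2 <- res2[c("name", "common", "name2")]
head(res2)
```

This avoids both merge calls and does not rely on row ordering, at the cost of splitting each members string once up front.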
Data.
df1 <- read.table(text = "
name members
x1 A,B,N,K,Y,G
x2 J,L,M,N,T
x3 G,H,S,J,D,F
x4 J,K,H,F,H,D,L
", header = TRUE)