Here is some sample data:
sample = data.frame("col1" = c("val1", "val1", "val1", "val1", "val2", "val2", "val2", "val3", "val3", "val3", "val3"),
"col2" = c("this", "that", "some", "thing", "thing", "that", "some", "diff", "some", "this", "that"))
I want to find every entry of col2 that appears under every unique value of col1. Is this possible? For the sample data, the result would be:
result = c("that", "some")
Thanks in advance.
Answer 0 (score: 1)
A (quick and dirty) solution in base R:
sample_list <- split(sample, sample$col1)
for (i in 1:length(sample_list)) sample_list[[i]] <- sample_list[[i]]$col2
Reduce(intersect, sample_list)
[1] "that" "some"
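The explicit loop above is not strictly needed, because split() also accepts a plain vector. A minimal sketch of the same idea, assuming col2 may be stored as a factor (hence the as.character() call); the name common is introduced here only for illustration:

```r
sample <- data.frame(
  col1 = c("val1", "val1", "val1", "val1", "val2", "val2", "val2",
           "val3", "val3", "val3", "val3"),
  col2 = c("this", "that", "some", "thing", "thing", "that", "some",
           "diff", "some", "this", "that")
)

# Split col2 into one character vector per unique col1 value,
# then intersect those vectors pairwise from left to right.
common <- Reduce(intersect, split(as.character(sample$col2), sample$col1))
common
```

Reduce() folds intersect() over the list, so only values that survive every group remain.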
Edit:
A data.table solution inspired by Matt's dplyr answer:
library(data.table)
setDT(sample)
n <- uniqueN(sample$col1)
sample[, .N, by = .(col1, col2)][, .N, by = col2][N == n, col2]
[1] that some
This solution should be fast on large datasets.
Edit 2:
Using dcast from data.table:
present_in <- colSums(!is.na(dcast(sample, col1 ~ col2, value.var = "col2")))
names(present_in)[present_in == 3][-1]
[1] "some" "that"
Answer 1 (score: 1)
Here is one way to do it using dplyr.
require(dplyr)
sets <- length(unique(sample$col1))
s <- sample %>%
group_by(col2) %>%
summarise(n = n_distinct(col1)) %>%  # number of distinct col1 groups containing each col2 value
filter(n == sets)
result <- s$col2
[1] some that
Answer 2 (score: 1)
Here is one way to do this using dplyr:
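library(dplyr)
# Split by col1, then inner-join the pieces on col2 so only shared values remain
split(sample, sample$col1) %>%
Reduce(function(dtf1, dtf2) inner_join(dtf1, dtf2, by = "col2"), .) %>%
select(col2) %>%
print()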
col2
1 that
2 some
Answer 3 (score: 1)
What you need is intersect. Here is a quick and dirty way:
CODE
library(data.table)
dt <- as.data.table(sample)
# Split data.table into different chunks based on unique values in col1
# output is a list where each entry is a data.table
l <- split(dt, by = "col1")
# Find the intersection of all values in col2
Reduce(intersect, lapply(1:length(l), function(z) as.character(l[[z]]$col2)))
OUTPUT
> Reduce(intersect, lapply(1:length(l), function(z) as.character(l[[z]]$col2)))
[1] "that" "some"
Answer 4 (score: 0)
Another dirty base R solution:
names(which(table(unlist(aggregate(sample$col2, list(sample$col1), unique)[, 2])) == length(unique(sample$col1))))
[1] "some" "that"
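The counting idea behind that one-liner can be unpacked into separate steps. This is a rearranged sketch rather than the answerer's exact code: it uses unique() on the (col1, col2) pairs instead of aggregate(), and the names pairs, counts, n_groups and result are introduced only for illustration.

```r
sample <- data.frame(
  col1 = c("val1", "val1", "val1", "val1", "val2", "val2", "val2",
           "val3", "val3", "val3", "val3"),
  col2 = c("this", "that", "some", "thing", "thing", "that", "some",
           "diff", "some", "this", "that")
)

# Keep each (col1, col2) pair once, so repeated rows are not double counted.
pairs <- unique(sample[, c("col1", "col2")])

# Count how many distinct col1 groups each col2 value appears in.
counts <- table(as.character(pairs$col2))

# A value is common to all groups when its count equals the number of groups.
n_groups <- length(unique(pairs$col1))
result <- names(counts)[as.vector(counts) == n_groups]
result
```

Because table() sorts its names, the result comes back in alphabetical order rather than in order of appearance.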