I am working with a data frame like this:
groups values
a      1
a      1
a      2
b      2
b      3
b      3
c      4
c      5
c      6
d      6
d      7
d      2
The goal is to turn it into something like this:
groups values
a      1
a      1
b      3
b      3
c      4
c      5
d      7
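I want to keep only the rows whose value appears in exactly one group. For example, the value 2 is removed because it appears in three different groups, but the value 1 is kept even though it appears twice, because both occurrences are in a single group.
Is there a function in the dplyr package that solves this, or do I have to write my own?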
Answer 0 (score: 1)
The dplyr solution you asked for:
library(dplyr)
df %>% group_by(values) %>% filter(n_distinct(groups) == 1)
# # A tibble: 7 x 2
# # Groups: values [5]
# groups values
# <chr> <int>
#1 a 1
#2 a 1
#3 b 3
#4 b 3
#5 c 4
#6 c 5
#7 d 7
using the data
df <- structure(list(groups = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "d", "d", "d"),
values = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 5L, 6L, 6L, 7L, 2L)),
row.names = c(NA, -12L), class = "data.frame")
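The printed result above is still grouped by values (note the "Groups: values [5]" line). If an ordinary, ungrouped table is wanted for later steps, a minimal sketch is to append ungroup(). The name res is only illustrative, and the data frame is rebuilt here so the snippet runs on its own.

```r
library(dplyr)

# Same sample data as above, rebuilt so the snippet is self-contained.
df <- data.frame(
  groups = rep(c("a", "b", "c", "d"), each = 3),
  values = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 5L, 6L, 6L, 7L, 2L)
)

res <- df %>%
  group_by(values) %>%                    # one group per distinct value
  filter(n_distinct(groups) == 1) %>%     # keep values seen in exactly one group
  ungroup()                               # drop the grouping from the result

print(res)
```

Without ungroup(), later verbs such as mutate() or summarise() would keep operating per value, which is usually not intended once the filtering is done.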
Answer 1 (score: 0)
Group by values and check whether the groups column has only one distinct element within each group. This can be done with ave.
# Convert groups via factor() first: as.numeric() on a character column would yield NA
i <- as.logical(with(df1, ave(as.numeric(factor(groups)), values, FUN = function(x) length(unique(x)) == 1)))
df1[i, ]
# groups values
#1 a 1
#2 a 1
#5 b 3
#6 b 3
#7 c 4
#8 c 5
#11 d 7
Data in dput format.
Answer 2 (score: 0)
x[x$values %in% names(which(colSums(table(x)>0)==1)),]
where
x = structure(list(groups = c("a", "a", "a", "b", "b", "b", "c",
"c", "c", "d", "d", "d"), values = c(1L, 1L, 2L, 2L, 3L, 3L,
4L, 5L, 6L, 6L, 7L, 2L)), row.names = c(NA, -12L), class = "data.frame")
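The base R one-liner above packs several steps into one expression. As a rough sketch of what it appears to do, it can be unrolled as follows. The intermediate names tab, present, n_groups, keep and result are illustrative only and are not part of the original answer.

```r
# Same sample data as in the answer above.
x <- data.frame(
  groups = rep(c("a", "b", "c", "d"), each = 3),
  values = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 5L, 6L, 6L, 7L, 2L)
)

# Step 1: contingency table, rows = groups, columns = values.
tab <- table(x)

# Step 2: TRUE where a value occurs at least once in a group.
present <- tab > 0

# Step 3: number of distinct groups each value occurs in.
n_groups <- colSums(present)

# Step 4: the values (as character names) found in exactly one group.
keep <- names(which(n_groups == 1))

# Step 5: keep rows whose value is in that set; %in% compares the
# integer column with the character names after coercion.
result <- x[x$values %in% keep, ]
print(result)
```

Because the table has one cell per group/value pair, this approach may use a lot of memory when both columns have many distinct entries; the grouped approaches in the other answers avoid building that table.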
Or, a data.table solution:
library(data.table)
setDT(x)[, .SD[uniqueN(groups)==1], values]
Answer 3 (score: 0)
Using the sqldf package on the original data frame df:
library(sqldf)
result <- sqldf("SELECT * FROM df
WHERE `values` IN (
SELECT `values` from (
SELECT `values`, groups, count(*) as num from df
GROUP BY `values`, groups) t
GROUP BY `values`
HAVING COUNT(1) = 1
)")