I have df1:
ID   Name  Score  Category
100  AA    1: M   1
100  BB    2: M   1
200  CC    3: M   1
200  DD    2: M   2
300  EE    4: L   1
300  FF    NA     1
400  GG    1: M   3
400  HH    1: M   3
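I need two output data frames: df2, containing only the rows where 'Score' differs WITHIN a 'Category' for each ID (like IDs 100 and 300), and df3, containing only the rows where 'Score' differs BETWEEN 'Category' values for each ID (like ID 200).
I included one NA because NA should also count as a score here, which means ID 300 contains a WITHIN difference.
Any help would be much appreciated.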
Answer 0 (score: 3)
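We can group by 'ID' and 'Category' and filter for the groups in which the number of distinct 'Score' values is greater than 1:
library(dplyr)
df1 %>%
  group_by(ID, Category) %>%
  # keep groups with more than one distinct Score (NA counts as a value)
  filter(n_distinct(Score) > 1)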
# A tibble: 4 x 4
# Groups: ID, Category [2]
# ID Name Score Category
# <int> <chr> <chr> <int>
#1 100 AA 1: M 1
#2 100 BB 2: M 1
#3 300 EE 4: L 1
#4 300 FF <NA> 1
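ID 300 is kept because n_distinct() counts NA as a value of its own by default. A minimal check of that behaviour, using only the two scores of ID 300 (the variable names are illustrative, not from the answer):

```r
library(dplyr)

# the two Score values of ID 300 in df1
scores_300 <- c("4: L", NA)

# NA is counted as a distinct value by default
with_na <- n_distinct(scores_300)

# dropping NA leaves a single distinct score
without_na <- n_distinct(scores_300, na.rm = TRUE)
```

With na.rm = TRUE inside the filter, ID 300 would no longer count as having a WITHIN difference.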
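Or, for the second case, group by 'ID' only and keep the IDs that have more than one 'Category' and more than one 'Score':
df1 %>%
  group_by(ID) %>%
  filter(n_distinct(Category) > 1 & n_distinct(Score) > 1)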
# A tibble: 2 x 4
# Groups: ID [1]
# ID Name Score Category
# <int> <chr> <chr> <int>
#1 200 CC 3: M 1
#2 200 DD 2: M 2
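Both results are still grouped, as the "# Groups:" line in the output shows. A minimal sketch of storing them as the df2 and df3 the question asks for, assuming the same df1 and that plain ungrouped tibbles are wanted (the ungroup() step is an addition, not part of the answer):

```r
library(dplyr)

# same data as df1 in the question
df1 <- data.frame(
  ID = c(100L, 100L, 200L, 200L, 300L, 300L, 400L, 400L),
  Name = c("AA", "BB", "CC", "DD", "EE", "FF", "GG", "HH"),
  Score = c("1: M", "2: M", "3: M", "2: M", "4: L", NA, "1: M", "1: M"),
  Category = c(1L, 1L, 1L, 2L, 1L, 1L, 3L, 3L),
  stringsAsFactors = FALSE
)

# Score differs WITHIN a Category of an ID
df2 <- df1 %>%
  group_by(ID, Category) %>%
  filter(n_distinct(Score) > 1) %>%
  ungroup()

# Score differs BETWEEN the Categories of an ID
df3 <- df1 %>%
  group_by(ID) %>%
  filter(n_distinct(Category) > 1 & n_distinct(Score) > 1) %>%
  ungroup()
```

Dropping the grouping avoids surprises if df2 or df3 are summarised or mutated later.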
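Both cases can be wrapped in a single call with map2:
library(purrr)
# .x = grouping columns, .y = columns that must each have more than one distinct value
map2(list(c("ID", "Category"), "ID"),
     list("Score", c("Category", "Score")),
     ~ df1 %>%
         group_by_at(.x) %>%
         filter_at(vars(.y), all_vars(n_distinct(.) > 1)))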
#[[1]]
# A tibble: 4 x 4
# Groups: ID, Category [2]
# ID Name Score Category
# <int> <chr> <chr> <int>
#1 100 AA 1: M 1
#2 100 BB 2: M 1
#3 300 EE 4: L 1
#4 300 FF <NA> 1
#[[2]]
# A tibble: 2 x 4
# Groups: ID [1]
# ID Name Score Category
# <int> <chr> <chr> <int>
#1 200 CC 3: M 1
#2 200 DD 2: M 2
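group_by_at() and filter_at() are superseded in current dplyr. A sketch of the same idea with across() and if_all(), assuming dplyr 1.0.4 or later; filter_varying() is a hypothetical helper name, not a dplyr function:

```r
library(dplyr)
library(purrr)

# same data as df1 in the question
df1 <- data.frame(
  ID = c(100L, 100L, 200L, 200L, 300L, 300L, 400L, 400L),
  Name = c("AA", "BB", "CC", "DD", "EE", "FF", "GG", "HH"),
  Score = c("1: M", "2: M", "3: M", "2: M", "4: L", NA, "1: M", "1: M"),
  Category = c(1L, 1L, 1L, 2L, 1L, 1L, 3L, 3L),
  stringsAsFactors = FALSE
)

# hypothetical helper: keep the groups (defined by `by`) in which every
# column named in `cols` has more than one distinct value
filter_varying <- function(data, by, cols) {
  data %>%
    group_by(across(all_of(by))) %>%
    filter(if_all(all_of(cols), ~ n_distinct(.x) > 1)) %>%
    ungroup()
}

# first element: WITHIN case, second element: BETWEEN case
res <- map2(
  list(c("ID", "Category"), "ID"),
  list("Score", c("Category", "Score")),
  function(by, cols) filter_varying(df1, by, cols)
)
```

An ordinary function(by, cols) is used for map2 so that the inner formula's .x refers unambiguously to the column being tested.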
Data
df1 <- structure(list(
    ID = c(100L, 100L, 200L, 200L, 300L, 300L, 400L, 400L),
    Name = c("AA", "BB", "CC", "DD", "EE", "FF", "GG", "HH"),
    Score = c("1: M", "2: M", "3: M", "2: M", "4: L", NA, "1: M", "1: M"),
    Category = c(1L, 1L, 1L, 2L, 1L, 1L, 3L, 3L)),
  class = "data.frame", row.names = c(NA, -8L))
Answer 1 (score: 2)
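Here is a base R solution (the data frame is called df here, and its scores print as 3_M rather than 3: M):
# i1: TRUE for IDs with more than one Category
# (ave() on the integer column returns 0/1, so !! converts it to logical)
i1 <- !!with(df, ave(Category, ID, FUN = function(i) length(unique(i)) != 1))
# i2: TRUE for IDs with more than one Score
# (ave() on a character Score column returns the strings 'TRUE'/'FALSE')
i2 <- with(df, ave(Score, ID, FUN = function(i) length(unique(i)) != 1)) == 'TRUE'
# Data frame 1: IDs with more than one Category (df3 in the question)
df[i1,]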
# ID Name Score Category
#3 200 CC 3_M 1
#4 200 DD 2_M 2
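# Data frame 2: exactly one of i1 and i2 is TRUE; for this data, IDs with a
# single Category but more than one Score (df2 in the question)
df[i1 + i2 == 1,]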
# ID Name Score Category
#1 100 AA 1_M 1
#2 100 BB 2_M 1
#5 300 EE 4_L 1
#6 300 FF <NA> 1
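The index vectors above are computed per ID only, so they rely on each ID in this data having either several categories or several scores within one category. A base R sketch that groups by ID and Category directly, assuming the question's df1; n_unique_by() is a hypothetical helper, not a base R function:

```r
# same data as df1 in the question
df1 <- data.frame(
  ID = c(100L, 100L, 200L, 200L, 300L, 300L, 400L, 400L),
  Name = c("AA", "BB", "CC", "DD", "EE", "FF", "GG", "HH"),
  Score = c("1: M", "2: M", "3: M", "2: M", "4: L", NA, "1: M", "1: M"),
  Category = c(1L, 1L, 1L, 2L, 1L, 1L, 3L, 3L),
  stringsAsFactors = FALSE
)

# hypothetical helper: for every row, the number of unique values of x
# within that row's group; unique() keeps NA, so NA counts as a value
n_unique_by <- function(x, ...) {
  ave(seq_along(x), ..., FUN = function(i) length(unique(x[i])))
}

# Score differs WITHIN a Category of an ID
within_diff <- n_unique_by(df1$Score, df1$ID, df1$Category) > 1

# Score differs BETWEEN the Categories of an ID
between_diff <- n_unique_by(df1$Category, df1$ID) > 1 &
  n_unique_by(df1$Score, df1$ID) > 1

df2 <- df1[within_diff, ]
df3 <- df1[between_diff, ]
```

Passing row positions to ave() instead of the column itself keeps the result an integer count regardless of whether the column is character or integer.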