Create data frames based on differences in a column, conditional on another variable

Asked: 2019-12-11 14:01:22

Tags: r

I have df1:

ID  Name  Score  Category
100 AA    1: M   1
100 BB    2: M   1
200 CC    3: M   1
200 DD    2: M   2
300 EE    4: L   1
300 FF    NA     1
400 GG    1: M   3
400 HH    1: M   3

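I need two output data frames: df2, containing only the rows where "Score" differs WITHIN a "Category" for a given ID (as for IDs 100 and 300), and df3, containing only the rows where "Score" differs BETWEEN "Category" values for a given ID (as for ID 200).

I included an NA because NA should also count as a Score here, which means ID 300 contains a WITHIN difference.

Any help would be much appreciated.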

2 answers:

Answer 0: (score: 3)

We can group by 'ID' and 'Category' and filter for the groups in which the number of distinct 'Score' values is greater than 1:

library(dplyr)
df1 %>%
    group_by(ID, Category) %>% 
    filter(n_distinct(Score) > 1)
# A tibble: 4 x 4
# Groups:   ID, Category [2]
#     ID Name  Score Category
#  <int> <chr> <chr>    <int>
#1   100 AA    1: M         1
#2   100 BB    2: M         1
#3   300 EE    4: L         1
#4   300 FF    <NA>         1
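
On why ID 300 is kept: n_distinct() counts NA as a distinct value by default (its na.rm argument defaults to FALSE), which matches the question's requirement. The following minimal sketch illustrates the same counting rule with base R's unique(); the variable names are made up for illustration.

```r
# Score values for ID 300, Category 1, as in the question's data
score_300 <- c("4: L", NA)

# unique() keeps NA as its own value, so two distinct values are found
n_with_na <- length(unique(score_300))

# Dropping NA first leaves a single distinct value
n_without_na <- length(unique(score_300[!is.na(score_300)]))

n_with_na > 1     # TRUE: the group passes the filter
n_without_na > 1  # FALSE: the group would be dropped if NA were ignored
```

If NA should not count as a score, passing na.rm = TRUE to n_distinct() would drop ID 300 from the result.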

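Or, for the second case, group by 'ID' alone and keep the IDs that have more than one distinct 'Category' and more than one distinct 'Score':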

df1 %>%
    group_by(ID) %>%
    filter(n_distinct(Category) > 1 & n_distinct(Score) > 1)
# A tibble: 2 x 4
# Groups:   ID [1]
#     ID Name  Score Category
#  <int> <chr> <chr>    <int>
#1   200 CC    3: M         1
#2   200 DD    2: M         2

Both cases can be handled in a single call with map2:

library(purrr)
map2(list(c("ID", "Category"), "ID"),
     list("Score", c("Category", "Score")),
      ~ df1 %>%
           group_by_at(.x) %>%
            filter_at(vars(.y), all_vars(n_distinct(.) > 1)))
#[[1]]
# A tibble: 4 x 4
# Groups:   ID, Category [2]
#     ID Name  Score Category
#  <int> <chr> <chr>    <int>
#1   100 AA    1: M         1
#2   100 BB    2: M         1
#3   300 EE    4: L         1
#4   300 FF    <NA>         1

#[[2]]
# A tibble: 2 x 4
# Groups:   ID [1]
#     ID Name  Score Category
#  <int> <chr> <chr>    <int>
#1   200 CC    3: M         1
#2   200 DD    2: M         2

Data

df1 <- structure(list(ID = c(100L, 100L, 200L, 200L, 300L, 300L, 400L, 
400L), Name = c("AA", "BB", "CC", "DD", "EE", "FF", "GG", "HH"
), Score = c("1: M", "2: M", "3: M", "2: M", "4: L", NA, "1: M", 
"1: M"), Category = c(1L, 1L, 1L, 2L, 1L, 1L, 3L, 3L)), 
 class = "data.frame", row.names = c(NA, 
-8L))

Answer 1: (score: 2)

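Here is a base R solution (it assumes Score is a character column, as in the structure() data above):

# TRUE for every row of an ID that has more than one distinct Category
i1 <- !!with(df1, ave(Category, ID, FUN = function(i) length(unique(i)) != 1))
# TRUE for every row of an ID that has more than one distinct Score (NA counts as a value)
i2 <- with(df1, ave(Score, ID, FUN = function(i) length(unique(i)) != 1)) == 'TRUE'

# IDs with more than one Category (df3 in the question; in this data their Scores differ too)
df1[i1,]
#   ID Name Score Category
#3 200   CC  3: M        1
#4 200   DD  2: M        2

# Exactly one of i1/i2 is TRUE: here, IDs whose Score varies while Category does not (df2 in the question)
df1[i1+i2 == 1,]
#   ID Name Score Category
#1 100   AA  1: M        1
#2 100   BB  2: M        1
#5 300   EE  4: L        1
#6 300   FF  <NA>        1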
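
The base R code above groups by ID only. As a minimal sketch of a variant that, like the dplyr answer, counts distinct Score values per ID and Category: `n_distinct_by` is a hypothetical helper name (not part of base R), and df1 is rebuilt here from the question's data so that the block runs on its own.

```r
# Question's data, rebuilt so the sketch is self-contained
df1 <- data.frame(
  ID       = c(100L, 100L, 200L, 200L, 300L, 300L, 400L, 400L),
  Name     = c("AA", "BB", "CC", "DD", "EE", "FF", "GG", "HH"),
  Score    = c("1: M", "2: M", "3: M", "2: M", "4: L", NA, "1: M", "1: M"),
  Category = c(1L, 1L, 1L, 2L, 1L, 1L, 3L, 3L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each row, the number of distinct values of x
# (NA included) within the group defined by the grouping vectors in ...
n_distinct_by <- function(x, ...) {
  ave(seq_along(x), ..., FUN = function(i) length(unique(x[i])))
}

# Score differs within an ID/Category group
df2 <- df1[n_distinct_by(df1$Score, df1$ID, df1$Category) > 1, ]

# Within an ID, both Category and Score take more than one value
df3 <- df1[n_distinct_by(df1$Category, df1$ID) > 1 &
           n_distinct_by(df1$Score, df1$ID) > 1, ]
```

With the question's data, df2 holds the rows for IDs 100 and 300 and df3 holds the rows for ID 200. Unlike the `i1 + i2 == 1` test, this version would not pick up an ID that has several Categories but only one Score.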