Value in a third column based on grouping by another column

Time: 2019-08-07 13:08:06

Tags: r

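I need to assign a label to each ID in column a, based on the values of b that exist for that ID. For example, if ID 1 has only "F", the result should be "Female"; if it has only "M", the result should be "Male"; and if it has both, the result should be "Mixed".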

Here is a sample of the data:

    df=data.frame(
      a=c(1,1,1,2,2,3,3,3,3,3),
      b=c("F","M","F","M","M","F","F","F","F","F"))

This is the expected result:

    df$Result=c("Mixed", "Mixed", "Mixed", "Male", "Male", "Female", "Female", "Female", "Female", "Female")

       a b Result
    1  1 F  Mixed
    2  1 M  Mixed
    3  1 F  Mixed
    4  2 M   Male
    5  2 M   Male
    6  3 F Female
    7  3 F Female
    8  3 F Female
    9  3 F Female
    10 3 F Female

Can someone help me compute this df$Result column? Thanks in advance!

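2 Answers:

Answer 0: (score: 2)

After grouping by 'a', check the number of distinct elements in 'b'. If it is greater than 1, return "Mixed"; otherwise return the relabeled value of 'b'.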

    library(dplyr)

    df %>%
      # Translate the codes: (b == "F") + 1 is 2 for "F" and 1 otherwise
      mutate(b1 = c("Male", "Female")[(b == "F") + 1]) %>%
      group_by(a) %>%
      # More than one distinct value of b within the group means "Mixed"
      mutate(Result = case_when(n_distinct(b) > 1 ~ "Mixed", TRUE ~ b1)) %>%
      select(-b1)
    # A tibble: 10 x 3
    # Groups:   a [3]
    #       a b     Result
    #   <dbl> <chr> <chr> 
    # 1     1 F     Mixed 
    # 2     1 M     Mixed 
    # 3     1 F     Mixed 
    # 4     2 M     Male  
    # 5     2 M     Male  
    # 6     3 F     Female
    # 7     3 F     Female
    # 8     3 F     Female
    # 9     3 F     Female
    #10     3 F     Female

Data

    df <- data.frame(
      a = c(1,1,1,2,2,3,3,3,3,3),
      b = c("F","M","F","M","M","F","F","F","F","F"),
      stringsAsFactors = FALSE)
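
For readers without dplyr, the same per-group check can probably be written in base R. The following is only a sketch under the assumption that b contains nothing but "F" and "M" (no missing values); the helper name `n_vals` is made up for illustration and is not part of the answer above.

```r
# Sample data from the question
df <- data.frame(
  a = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3),
  b = c("F", "M", "F", "M", "M", "F", "F", "F", "F", "F"),
  stringsAsFactors = FALSE
)

# Count the distinct values of b within each group of a.
# ave() hands FUN the row indices of one group and repeats the
# single count for every row of that group.
n_vals <- ave(seq_along(df$b), df$a,
              FUN = function(i) length(unique(df$b[i])))

# More than one distinct value: "Mixed"; otherwise translate the code.
# Assumption: every value of b is either "F" or "M".
df$Result <- ifelse(n_vals > 1, "Mixed",
                    ifelse(df$b == "F", "Female", "Male"))
```

Passing row indices to `ave()` rather than b itself keeps the counts numeric, so the comparison `n_vals > 1` does not rely on character coercion.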

Answer 1: (score: 2)

A solution with data.table:

    library(data.table)

    a = c(1,1,1,2,2,3,3,3,3,3)
    b = c("F","M","F","M","M","F","F","F","F","F")
    df = data.table(a, b)

    # Number of distinct values of b within each group of a, stored as text
    df[, result := as.character(uniqueN(b)), by = a]
    # One distinct value: translate it; otherwise the group is mixed
    df[, result := ifelse(result == "1", ifelse(b == "M", "Male", "Female"), "Mixed")]
    df
    #     a b result
    #  1: 1 F  Mixed
    #  2: 1 M  Mixed
    #  3: 1 F  Mixed
    #  4: 2 M   Male
    #  5: 2 M   Male
    #  6: 3 F Female
    #  7: 3 F Female
    #  8: 3 F Female
    #  9: 3 F Female
    # 10: 3 F Female
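
The two assignments above could likely be folded into a single grouped step. This is a hedged sketch rather than the answerer's method: the names `dt` and `sex_labels` are hypothetical, and it assumes b holds only "F" and "M".

```r
library(data.table)

# Sample data from the question
dt <- data.table(
  a = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3),
  b = c("F", "M", "F", "M", "M", "F", "F", "F", "F", "F")
)

# Hypothetical lookup table from code to label
sex_labels <- c(F = "Female", M = "Male")

# Within each group of a: "Mixed" if b has more than one distinct value,
# otherwise the label of the single value present. The length-1 result
# is recycled across all rows of the group.
dt[, result := if (uniqueN(b) > 1L) "Mixed" else sex_labels[[b[1L]]], by = a]
```

Because the decision is made once per group, this avoids the intermediate character count, at the cost of failing with an error if a group contains a code that is not in the lookup table.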