Conditional formatting on grouped data in R

Time: 2018-06-25 14:21:24

Tags: r if-statement group-by dplyr mutate

UPRN    Start.Date  End.Date  Disability
1       2006-12-20 17-NOV-17         Y
1       2006-12-20 17-NOV-17         N
2       1991-12-06                   N
2       1991-12-06                   N
3       1991-04-29 2015-04-21        N
3       2015-04-22                   Y
4       2005-02-15                   Y
4       2005-02-15                   N

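I have a dataset like the one above (but much larger). I want to create a new column called Any_Disability.

My approach is to group by UPRN, Start.Date and End.Date; if any row in a group has a disability, Any_Disability should be "Y" for both rows of that group.

What I have tried so far: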

library(dplyr)

test3<-all_data%>%
  group_by(UPRN, Start.Date, End.Date)%>%
  mutate(Any_Disability = ifelse(Disability=="Y", "Y","N"))

But this does not work, because it gives the following result:

UPRN    Start.Date  End.Date  Disability  Any_Disability
1       2006-12-20 17-NOV-17         Y          Y
1       2006-12-20 17-NOV-17         N          N
2       1991-12-06                   N          N
2       1991-12-06                   N          N
3       1991-04-29 2015-04-21        N          N
3       2015-04-22                   Y          Y
4       2005-02-15                   Y          Y
4       2005-02-15                   N          N

Reproducible code:

UPRN<-c(1,1,2,2,3,3,4,4)
Start.Date<-c("2006-12-20","2006-12-20", "1991-12-06","1991-12-06","1991-04-29", "2015-04-22","2005-02-15", "2005-02-15")
End.Date<-c("17-NOV-17", "17-NOV-17", "","", "2015-04-21", "", "", "")
Disability<-c("Y","N","N","N","N","Y","Y","N")

dataset <- data.frame(UPRN, Start.Date, End.Date, Disability)

2 Answers:

Answer 0: (score: 3)

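We can use any(). The original ifelse(Disability == "Y", ...) is evaluated element by element, so each row only reflects its own value. Wrapping the test in any() yields a single TRUE/FALSE per group, which mutate() then recycles to every row of that group. (Here df holds the question's data, and the grouping uses only UPRN and Start.Date; add End.Date to group_by() if it should also define the group.)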

test3<-df%>%
     group_by(UPRN, Start.Date)%>%
     dplyr::mutate(Any_Disability = ifelse(any(Disability=="Y"), "Y","N"))
test3
# A tibble: 8 x 5
# Groups:   UPRN, Start.Date [5]
   UPRN Start.Date   End.Date Disability Any_Disability
  <int>      <chr>      <chr>      <chr>          <chr>
1     1 2006-12-20  17-NOV-17          Y              Y
2     1 2006-12-20  17-NOV-17          N              Y
3     2 1991-12-06       <NA>          N              N
4     2 1991-12-06       <NA>          N              N
5     3 1991-04-29 2015-04-21          N              N
6     3 2015-04-22       <NA>          Y              Y
7     4 2005-02-15       <NA>          Y              Y
8     4 2005-02-15       <NA>          N              Y
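
As a minimal sketch of the same idea, the version below groups by all three columns named in the question and uses a plain if/else, since any() already returns a single value per group. It assumes dplyr is installed; the stringsAsFactors = FALSE argument and the name result are choices made for this illustration, not part of the original answer.

```r
library(dplyr)

# Data from the question; stringsAsFactors = FALSE keeps the columns as
# character vectors regardless of the R version in use.
dataset <- data.frame(
  UPRN = c(1, 1, 2, 2, 3, 3, 4, 4),
  Start.Date = c("2006-12-20", "2006-12-20", "1991-12-06", "1991-12-06",
                 "1991-04-29", "2015-04-22", "2005-02-15", "2005-02-15"),
  End.Date = c("17-NOV-17", "17-NOV-17", "", "", "2015-04-21", "", "", ""),
  Disability = c("Y", "N", "N", "N", "N", "Y", "Y", "N"),
  stringsAsFactors = FALSE
)

result <- dataset %>%
  group_by(UPRN, Start.Date, End.Date) %>%
  # any() reduces the group to one TRUE/FALSE; mutate() recycles the
  # resulting length-1 string to every row of the group.
  mutate(Any_Disability = if (any(Disability == "Y")) "Y" else "N") %>%
  ungroup()

print(result)
```

For this particular data the extra End.Date grouping does not change the outcome, because rows that share UPRN and Start.Date also share End.Date.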

Answer 1: (score: 0)

Base R, using ave and the fact that 'Y' > 'N' is TRUE, so max() returns 'Y' whenever a group contains one:
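dataset_new <- within(dataset, 
    # as.character() guards against Disability being a factor (the data.frame default before R 4.0)
    Any_Disability <- ave(as.character(Disability), paste0(UPRN, "#", Start.Date, "#", End.Date), 
        FUN = max)
    )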

# UPRN Start.Date   End.Date Disability Any_Disability
#    1 2006-12-20  17-NOV-17          Y              Y
#    1 2006-12-20  17-NOV-17          N              Y
#    2 1991-12-06                     N              N
#    2 1991-12-06                     N              N
#    3 1991-04-29 2015-04-21          N              N
#    3 2015-04-22                     Y              Y
#    4 2005-02-15                     Y              Y
#    4 2005-02-15                     N              Y

For the same result without relying on the 'Y' > 'N' trick, you can write FUN = function(x) if ('Y' %in% x) 'Y' else 'N' instead.
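
A self-contained sketch of that explicit variant might look as follows. The helper name group_key and the stringsAsFactors = FALSE argument are assumptions made for this illustration; the grouping string is built the same way as in the answer above.

```r
# Data from the question, kept as character columns.
dataset <- data.frame(
  UPRN = c(1, 1, 2, 2, 3, 3, 4, 4),
  Start.Date = c("2006-12-20", "2006-12-20", "1991-12-06", "1991-12-06",
                 "1991-04-29", "2015-04-22", "2005-02-15", "2005-02-15"),
  End.Date = c("17-NOV-17", "17-NOV-17", "", "", "2015-04-21", "", "", ""),
  Disability = c("Y", "N", "N", "N", "N", "Y", "Y", "N"),
  stringsAsFactors = FALSE
)

# One string per row identifying its (UPRN, Start.Date, End.Date) group.
group_key <- paste(dataset$UPRN, dataset$Start.Date, dataset$End.Date, sep = "#")

# ave() applies FUN within each group and recycles the single returned
# value across all rows of that group.
dataset$Any_Disability <- ave(
  dataset$Disability, group_key,
  FUN = function(x) if ("Y" %in% x) "Y" else "N"
)

print(dataset)
```

Spelling out the test this way avoids depending on how strings are ordered, which the max() version does implicitly.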