Change values within groups based on a condition

Time: 2018-01-13 20:28:58

Tags: r dataframe dplyr

I'm starting with the following data:

df <- data.frame(Person=c("Ada","Ada","Bob","Bob","Carl","Carl"), Day=c(1,2,2,1,1,2), Fruit=c("Apple","X","Apple","X","X","Orange"))

  Person Day  Fruit
1    Ada   1  Apple
2    Ada   2      X
3    Bob   2  Apple
4    Bob   1      X
5   Carl   1      X
6   Carl   2 Orange

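I want to go through each person and replace the unknown fruit X with either Apple or Orange, making sure that if one day is Apple, the other day is Orange, and vice versa.

For example, for Ada: Day 1 = Apple, which means Day 2 = X <- Orange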

I don't know where to start:

library(dplyr)
df %>%
  group_by(Person)

Any suggestions on which direction to take?

3 Answers:

Answer 0 (score: 1)

Another solution using case_when from dplyr:

library(dplyr)

# Changing datatypes to character instead of factor
df[] <- lapply(df, as.character)

# Optional, but this line will convert all columns to appropriate datatype, eg. Day will be integer
df <- readr::type_convert(df)

df %>%
  group_by(Person) %>%
  mutate(
    Contains_Apple = any(Fruit == "Apple"),
    Contains_Orange = any(Fruit == "Orange"),
    Fruit = case_when(
      Fruit == "X" & !Contains_Apple ~ "Apple",
      Fruit == "X" & !Contains_Orange ~ "Orange",
      TRUE ~ Fruit
    )
  )

# A tibble: 6 x 5
# Groups: Person [3]
  Person   Day Fruit  Contains_Apple Contains_Orange
  <chr>  <int> <chr>  <lgl>          <lgl>          
1 Ada        1 Apple  T              F              
2 Ada        2 Orange T              F              
3 Bob        2 Apple  T              F              
4 Bob        1 Orange T              F              
5 Carl       1 Apple  F              T              
6 Carl       2 Orange F              T    

To drop the helper columns Contains_Apple and Contains_Orange:

df %>% 
  group_by(Person) %>% 
  mutate(Contains_Apple = any(Fruit == "Apple"),
         Contains_Orange = any(Fruit == "Orange"),
         Fruit = case_when(Fruit == "X" & !Contains_Apple ~ "Apple",
                           Fruit == "X" & !Contains_Orange ~ "Orange",
                           TRUE ~ Fruit)) %>% 
  select(Person, Day, Fruit) %>% 
  ungroup()

# A tibble: 6 x 3
  Person   Day Fruit 
  <chr>  <int> <chr> 
1 Ada        1 Apple 
2 Ada        2 Orange
3 Bob        2 Apple 
4 Bob        1 Orange
5 Carl       1 Apple 
6 Carl       2 Orange

Answer 1 (score: 0)

Here is an idea: use case_when to check whether each group already has "Apple" or "Orange", and if Fruit is "X", assign the other value.

Note that I added stringsAsFactors = FALSE when creating the example data frame, to avoid creating factor columns.

library(dplyr)

df %>%
  group_by(Person) %>%
  mutate(Fruit = case_when(
    Fruit %in% "X" & any(Fruit %in% "Apple")  ~ "Orange",
    Fruit %in% "X" & any(Fruit %in% "Orange") ~ "Apple",
    TRUE                                      ~ Fruit
  )) %>%
  ungroup()   

# # A tibble: 6 x 3
#   Person   Day Fruit 
#   <chr>  <dbl> <chr> 
# 1 Ada     1.00 Apple 
# 2 Ada     2.00 Orange
# 3 Bob     2.00 Apple 
# 4 Bob     1.00 Orange
# 5 Carl    1.00 Apple 
# 6 Carl    2.00 Orange

Data

df <- data.frame(Person=c("Ada","Ada","Bob","Bob","Carl","Carl"), 
                 Day=c(1,2,2,1,1,2), 
                 Fruit=c("Apple","X","Apple","X","X","Orange"),
                 stringsAsFactors = FALSE)
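To illustrate why the stringsAsFactors = FALSE note matters, here is a small base R sketch (the names df_f, f and ch are illustrative only). Before R 4.0.0, data.frame() defaulted to stringsAsFactors = TRUE, so Fruit would become a factor, and assigning a string that is not one of its existing levels yields NA instead of the new value.

```r
# Simulate the pre-R 4.0.0 default: strings are stored as a factor
df_f <- data.frame(Fruit = c("Apple", "X"), stringsAsFactors = TRUE)
print(class(df_f$Fruit))   # "factor"

# "Orange" is not an existing level, so R stores NA and warns
# ("invalid factor level, NA generated")
f <- df_f$Fruit
f[2] <- "Orange"
print(f)

# Working on a character vector avoids the problem
ch <- as.character(df_f$Fruit)
ch[ch == "X"] <- "Orange"
print(ch)                  # "Apple" "Orange"
```

This is also why Answer 0 converts all columns with as.character() before recoding.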

Answer 2 (score: 0)

A simple loop:

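fruity_loop <- function(frame) {
  ops <- c('Apple', 'Orange')
  frame$Fruit <- as.character(frame$Fruit)  # work on strings, not factor levels
  for (x in seq_len(nrow(frame))) {
    if (frame$Fruit[x] == 'X') {
      # fruits already recorded for the same person, ignoring unknowns
      known <- frame$Fruit[frame$Person == frame$Person[x] & frame$Fruit != 'X']
      # give the unknown day whichever fruit this person does not have yet
      frame$Fruit[x] <- if (ops[1] %in% known) ops[2] else ops[1]
    }
  }
  return(frame)
}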

Example:

fruity_loop(df)
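As a loop-free alternative in base R (a swapped-in technique, not part of the answer above), ave() can apply a per-person function to the Fruit column. The helper fill_missing_fruit is hypothetical; this sketch assumes Fruit is a character column and that each person has at most one known fruit, in which case X is replaced by the option that person does not have yet.

```r
# Hypothetical helper: within one person's rows, replace "X" with the
# option from `ops` that the person has not used yet
fill_missing_fruit <- function(fruit, ops = c("Apple", "Orange")) {
  missing_op <- setdiff(ops, fruit)    # options not yet present for this person
  if (length(missing_op) == 1) {       # only fill when the answer is unambiguous
    fruit[fruit == "X"] <- missing_op
  }
  fruit
}

df <- data.frame(Person = c("Ada", "Ada", "Bob", "Bob", "Carl", "Carl"),
                 Day = c(1, 2, 2, 1, 1, 2),
                 Fruit = c("Apple", "X", "Apple", "X", "X", "Orange"),
                 stringsAsFactors = FALSE)

# ave() splits Fruit by Person, applies the helper, and reassembles in row order
df$Fruit <- ave(df$Fruit, df$Person, FUN = fill_missing_fruit)
print(df)
```

If a person has X on both days, setdiff() returns both options and the rows are left as X rather than guessed.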
