我从以下数据开始:
df <- data.frame(Person=c("Ada","Ada","Bob","Bob","Carl","Carl"), Day=c(1,2,2,1,1,2), Fruit=c("Apple","X","Apple","X","X","Orange"))
Person Day Fruit
1 Ada 1 Apple
2 Ada 2 X
3 Bob 2 Apple
4 Bob 1 X
5 Carl 1 X
6 Carl 2 Orange
我希望循环遍历每个人并用X
或Apple
替换未知水果Orange
,同时确保如果有Orange
天,它应该第二天是Apple
,反之亦然。
Ada:Day 1 = Apple
,意为Day 2 = X <- Orange
我不知道从哪里开始:
library(dplyr)
df %>%
group_by(Person)
答案 0 :(得分:1)
使用case_when
中的dplyr
的另一种解决方案:
library(dplyr)
# Changing datatypes to character instead of factor
df[] <- lapply(df, as.character)
# Optional, but this line will convert all columns to appropriate datatype, eg. Day will be integer
df <- readr::type_convert(df)
df %>%
group_by(Person) %>%
mutate(
Contains_Apple = any(Fruit == "Apple"),
Contains_Orange = any(Fruit == "Orange"),
Fruit = case_when(
Fruit == "X" & Contains_Apple == F ~ "Apple",
Fruit == "X" & Contains_Orange == F ~ "Orange",
TRUE ~ Fruit
)
)
# A tibble: 6 x 5
# Groups: Person [3]
Person Day Fruit Contains_Apple Contains_Orange
<chr> <int> <chr> <lgl> <lgl>
1 Ada 1 Apple T F
2 Ada 2 Orange T F
3 Bob 2 Apple T F
4 Bob 1 Orange T F
5 Carl 1 Apple F T
6 Carl 2 Orange F T
删除Contains_Apple
和Contains_Orange
:
df %>%
group_by(Person) %>%
mutate(Contains_Apple = any(Fruit == "Apple"),
Contains_Orange = any(Fruit == "Orange"),
Fruit = case_when(Fruit == "X" & Contains_Apple == F ~ "Apple",
Fruit == "X" & Contains_Orange == F ~ "Orange",
TRUE ~ Fruit)) %>%
select(Person, Day, Fruit) %>%
ungroup()
# A tibble: 6 x 3
Person Day Fruit
<chr> <int> <chr>
1 Ada 1 Apple
2 Ada 2 Orange
3 Bob 2 Apple
4 Bob 1 Orange
5 Carl 1 Apple
6 Carl 2 Orange
答案 1 :(得分:0)
这是一个想法,使用case_when
来检查每个组是否已经有“Apple”或“Orange”,然后如果Fruit为“X”则分配相反的值。
请注意,我在创建示例数据框时添加了stringsAsFactors = FALSE
,旨在避免创建因子列。
library(dplyr)
library(tidyr)
df %>%
group_by(Person) %>%
mutate(Fruit = case_when(
Fruit %in% "X" & any(Fruit %in% "Apple") ~ "Orange",
Fruit %in% "X" & any(Fruit %in% "Orange") ~ "Apple",
TRUE ~ Fruit
)) %>%
ungroup()
# # A tibble: 6 x 3
# Person Day Fruit
# <chr> <dbl> <chr>
# 1 Ada 1.00 Apple
# 2 Ada 2.00 Orange
# 3 Bob 2.00 Apple
# 4 Bob 1.00 Orange
# 5 Carl 1.00 Apple
# 6 Carl 2.00 Orange
数据强>
df <- data.frame(Person=c("Ada","Ada","Bob","Bob","Carl","Carl"),
Day=c(1,2,2,1,1,2),
Fruit=c("Apple","X","Apple","X","X","Orange"),
stringsAsFactors = FALSE)
答案 2 :(得分:0)