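I have 2 data frames with the same row and column structure, and both contain a lot of NA values. I want to create another data frame that simply tells me which cells in the 2 original data frames actually have values. For example:
So far I have been able to do this by hand with a chain of nested ifelse statements that I adapt for each column, like this: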
combined <- trial_1[, 1:2] %>%
  mutate("Part1" = ifelse(!is.na(trial_1$Part1) & !is.na(trial_2$Part1), "1 & 2",
                   ifelse(!is.na(trial_1$Part1) &  is.na(trial_2$Part1), "1 only",
                   ifelse( is.na(trial_1$Part1) & !is.na(trial_2$Part1), "2 only",
                   ifelse( is.na(trial_1$Part1) &  is.na(trial_2$Part1), "NA", "Failed"))))) %>%
  mutate("Part2" = ifelse(!is.na(trial_1$Part2) & !is.na(trial_2$Part2), "1 & 2",
                   ifelse(!is.na(trial_1$Part2) &  is.na(trial_2$Part2), "1 only",
                   ifelse( is.na(trial_1$Part2) & !is.na(trial_2$Part2), "2 only",
                   ifelse( is.na(trial_1$Part2) &  is.na(trial_2$Part2), "NA", "Failed"))))) %>%
  mutate("Part3" = ifelse(!is.na(trial_1$Part3) & !is.na(trial_2$Part3), "1 & 2",
                   ifelse(!is.na(trial_1$Part3) &  is.na(trial_2$Part3), "1 only",
                   ifelse( is.na(trial_1$Part3) & !is.na(trial_2$Part3), "2 only",
                   ifelse( is.na(trial_1$Part3) &  is.na(trial_2$Part3), "NA", "Failed"))))) %>%
  mutate("Part4" = ifelse(!is.na(trial_1$Part4) & !is.na(trial_2$Part4), "1 & 2",
                   ifelse(!is.na(trial_1$Part4) &  is.na(trial_2$Part4), "1 only",
                   ifelse( is.na(trial_1$Part4) & !is.na(trial_2$Part4), "2 only",
                   ifelse( is.na(trial_1$Part4) &  is.na(trial_2$Part4), "NA", "Failed")))))
But this is obviously repetitive, so I tried a for loop instead, which doesn't work:
participants <- list('Part1', 'Part2', 'Part3', 'Part4')
combined <- trial_1[, 1:2]
for (i in participants) {
  combined <- combined %>%
    mutate(i = ifelse(!is.na(trial_1$i) & !is.na(trial_2$i), "1 & 2",
               ifelse(!is.na(trial_1$i) &  is.na(trial_2$i), "1 only",
               ifelse( is.na(trial_1$i) & !is.na(trial_2$i), "2 only",
               ifelse( is.na(trial_1$i) &  is.na(trial_2$i), "NA", "Failed")))))
}
Any help on how to restructure this for loop (which I assume is the way to go) would be much appreciated. Thanks!
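A minimal base-R sketch of a working loop, assuming both data frames list their rows in the same order. The loop in the question fails because $ takes the name literally: trial_1$i looks for a column called i rather than the column whose name is stored in i, and mutate(i = ...) would likewise create a column named i. Indexing with [[i]] uses the value of i instead. The post does not show trial_1 and trial_2, so the data below are made up for illustration; NA_character_ replaces the "NA" string, and the "Failed" branch is dropped because is.na() never returns NA, so one of the four conditions always holds.

```r
# Hypothetical example data: the real trial_1 and trial_2 are not shown in the post
trial_1 <- data.frame(
  number = 1:5,
  status = c("very low", "low", "medium", "high", "very high"),
  Part1 = c(1, NA, NA, NA, NA),
  Part2 = c(NA, 1, NA, NA, NA),
  Part3 = c(NA, 1, 1, NA, 1),
  Part4 = c(NA, NA, NA, NA, 1)
)
trial_2 <- data.frame(
  number = 1:5,
  status = c("very low", "low", "medium", "high", "very high"),
  Part1 = c(NA, NA, 1, NA, NA),
  Part2 = c(NA, NA, NA, NA, NA),
  Part3 = c(1, 1, NA, NA, NA),
  Part4 = c(NA, NA, NA, NA, 1)
)

participants <- c("Part1", "Part2", "Part3", "Part4")
combined <- trial_1[, 1:2]

for (i in participants) {
  # [[i]] looks up the column whose name is the value of i ("Part1", ...);
  # trial_1$i would look for a column literally named "i"
  has1 <- !is.na(trial_1[[i]])
  has2 <- !is.na(trial_2[[i]])
  # Assigning via [[i]] creates/overwrites the column named by i
  combined[[i]] <- ifelse(has1 & has2, "1 & 2",
                   ifelse(has1, "1 only",
                   ifelse(has2, "2 only", NA_character_)))
}

combined
```

With this made-up data, combined carries the same labels as the output table in the answer.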
Answer:
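You could try the tidyverse. First, combine the two data frames with a left join on number and status; the suffix argument is where you can tag each column with its trial number, if you like.
Then put the data into long format so that each element of Part can be handled on its own, and use mutate to create a new string based on which trials have non-missing values.
Finally, use pivot_wider to put the data back into wide format.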
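library(tidyverse)

trial_1 %>%
  # Join on the ID columns; the suffixes record which trial each Part column came from
  left_join(trial_2, by = c("number", "status"), suffix = c(".t1", ".t2")) %>%
  # Long format: one row per (number, status, Part), trial values in columns t1 and t2
  pivot_longer(cols = starts_with("Part"),
               names_to = c("Part", ".value"),
               names_pattern = "Part(\\d+)\\.(t[1-9])") %>%
  # Label each cell by which trials have a non-missing value
  mutate(part_string = case_when(
    !is.na(t1) & !is.na(t2) ~ "1 & 2",
    !is.na(t1)              ~ "1 only",
    !is.na(t2)              ~ "2 only",
    TRUE                    ~ NA_character_
  )) %>%
  # Back to wide format: one column per Part (Part1, Part2, ...)
  pivot_wider(id_cols = c(number, status),
              names_from = "Part",
              values_from = "part_string",
              names_prefix = "Part")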
Output
  number status    Part1  Part2  Part3  Part4
   <int> <chr>     <chr>  <chr>  <chr>  <chr>
1      1 very low  1 only NA     2 only NA
2      2 low       NA     1 only 1 & 2  NA
3      3 medium    2 only NA     1 only NA
4      4 high      NA     NA     NA     NA
5      5 very high NA     NA     1 only 1 & 2
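If you would rather avoid both the loop and the reshaping, the same labels can be computed for every Part column at once with logical matrices. This is a base-R alternative, not what the answer above does, and it is only a sketch under a few assumptions: the rows of trial_1 and trial_2 are in the same order (the join above does not need this), the Part columns share names across both data frames, and the example data are made up so that the result matches the output shown above.

```r
# Hypothetical example data: the real trial_1 and trial_2 are not shown in the post
trial_1 <- data.frame(
  number = 1:5,
  status = c("very low", "low", "medium", "high", "very high"),
  Part1 = c(1, NA, NA, NA, NA),
  Part2 = c(NA, 1, NA, NA, NA),
  Part3 = c(NA, 1, 1, NA, 1),
  Part4 = c(NA, NA, NA, NA, 1)
)
trial_2 <- data.frame(
  number = 1:5,
  status = c("very low", "low", "medium", "high", "very high"),
  Part1 = c(NA, NA, 1, NA, NA),
  Part2 = c(NA, NA, NA, NA, NA),
  Part3 = c(1, 1, NA, NA, NA),
  Part4 = c(NA, NA, NA, NA, 1)
)

parts <- grep("^Part", names(trial_1), value = TRUE)

# Logical matrices: TRUE where the trial has a value in that cell
has1 <- !is.na(as.matrix(trial_1[parts]))
has2 <- !is.na(as.matrix(trial_2[parts]))

# Start from all-NA and fill in the three labelled cases (they never overlap)
labels <- matrix(NA_character_, nrow = nrow(has1), ncol = ncol(has1),
                 dimnames = list(NULL, parts))
labels[has1 & !has2] <- "1 only"
labels[!has1 & has2] <- "2 only"
labels[has1 & has2]  <- "1 & 2"

# Put the ID columns back in front; stringsAsFactors = FALSE keeps labels as character
combined <- data.frame(trial_1[c("number", "status")], labels,
                       stringsAsFactors = FALSE)
combined
```

The trade-off is that positional matching is faster and needs no packages, but it silently gives wrong labels if the two data frames are sorted differently, which the join-based approach guards against.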