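I am trying to create a new variable, lab_conf, that is set when either of two conditions on two other variables, diagnosis and PC_R, is met. This is the code I am using: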
mutate(lab_conf = ifelse( (diagnosis == "confirmed")|(PC_R == "pos"), "pos", "neg"))
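The output shows NA where it should show "neg", so I only get two values: "pos" or NA. I want the new variable to be "pos", "neg", or NA according to the specified conditions, with NA only when both conditions are NA.
This is what I get from dput(head(x)):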
structure(list(diagnosis = structure(c(16L, 16L, 16L, 3L, 16L,
3L), .Label = c("*un-confirmed", "Cloted sample", "confirmed",
"Hemolysed sampl", "inadequate sample", "rej (sample leaking)",
"rej(Hemolyzed sample)", "rej(Hemolyzed)", "rej: sample Hemolyzed",
"rej: sample leaking", "rej: sample leaking + Hemolyzed", "rej: sample leaking+not convnient tube",
"repeat sample", "tf", "TF", "un-confirmed"), class = "factor"),
PC_R = structure(c(NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_), .Label = c("clotted",
"hemolyzed", "neg", "not pos", "Not REQUIred", "OTHER", "pos",
"QNS", "rej", "repeat sample", "Sample broken", "tf", "TF"
), class = "factor"), lab_conf = c(NA, NA, NA, "pos", NA,
"pos")), .Names = c("diagnosis", "PC_R", "lab_conf"), row.names = c(NA,
6L), class = "data.frame")
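Answer 0 (score: 0)
In general, when you provide sample data, you want it to cover all possible outcomes. The sample data you provided is all the same.
I created some sample data that I think resembles what you are working with, and then ran the operation on it.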
library(dplyr)
temp2 <- structure(list(diagnosis = c("unconfirmed", "unconfirmed", "unconfirmed", "confirmed", "confirmed", "confirmed"), PC_R = c("pos", "neg",NA, "pos", "neg", NA)), row.names = c(NA, -6L), class = "data.frame")
temp2 %>% mutate(lab_conf = ifelse(diagnosis == "confirmed" | PC_R == "pos", "pos", "neg"))
    diagnosis PC_R lab_conf
1 unconfirmed  pos      pos
2 unconfirmed  neg      neg
3 unconfirmed <NA>     <NA>
4   confirmed  pos      pos
5   confirmed  neg      pos
6   confirmed <NA>      pos
Answer 1 (score: 0)
Use %in% instead of ==, like this:
df <- df %>%
  mutate(lab_conf = ifelse(diagnosis %in% "confirmed" | PC_R %in% "pos", "pos", "neg"))
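The problem you are running into is that the == operator returns NA if either operand is NA. Likewise, NA | FALSE returns NA. Together, these two facts are why your OR expression evaluates to NA, which in turn makes ifelse return NA.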
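The two facts above can be checked at the console. This is a minimal sketch in base R; the variable names are only for illustration, and the last line mimics the first row of the question's data (diagnosis "un-confirmed", PC_R missing).

```r
# == propagates a missing value instead of returning FALSE
eq_na <- NA == "pos"          # NA

# | gives NA when the known operand is FALSE, because the answer
# depends on the unknown value; a known TRUE settles it
or_false <- NA | FALSE        # NA
or_true  <- NA | TRUE         # TRUE

# First row of the question: FALSE | NA, so the whole test is NA
row1 <- ("un-confirmed" == "confirmed") | (NA == "pos")

print(c(eq_na, or_false, or_true, row1))
```

This also explains why the "confirmed" rows still came out as "pos": TRUE | NA is TRUE, so only the rows where the diagnosis test is FALSE and PC_R is missing end up as NA.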
ifelse returns "pos" where the test is TRUE and "neg" where the test is FALSE, but where the test is NA it returns NA rather than either value. That is why you are getting NA.
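A quick way to see how ifelse treats a missing test value is to pass it one of each case. This is a small base R sketch; test and res are illustrative names.

```r
# One TRUE, one FALSE, and one missing test value
test <- c(TRUE, FALSE, NA)

# ifelse picks "pos" for TRUE, "neg" for FALSE, and leaves NA as NA
res <- ifelse(test, "pos", "neg")
print(res)
```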
Using %in% solves this, because %in% never returns NA: a missing value simply fails to match and gives FALSE.
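One thing to watch for: with %in%, a row where both diagnosis and PC_R are missing comes out as "neg", whereas the question asks for NA in that case. The following is only a sketch of one way to keep that NA, written in base R so it runs without extra packages. The data frame x and the helper names hit and both_missing are made up for illustration; the same expressions should also work inside mutate().

```r
# Hypothetical sample data covering the relevant combinations
x <- data.frame(
  diagnosis = c("un-confirmed", "confirmed", NA, "un-confirmed", NA),
  PC_R      = c(NA, NA, NA, "neg", "pos"),
  stringsAsFactors = FALSE
)

# %in% never yields NA, so this test is always TRUE or FALSE
hit <- x$diagnosis %in% "confirmed" | x$PC_R %in% "pos"

# Rows where neither variable carries any information
both_missing <- is.na(x$diagnosis) & is.na(x$PC_R)

# Keep NA only when both inputs are missing; otherwise "pos" or "neg"
x$lab_conf <- ifelse(both_missing, NA_character_, ifelse(hit, "pos", "neg"))
print(x)
```

Whether a row with only one missing input should count as "neg" or NA is a judgment call about the data; this sketch treats it as "neg" unless the other variable qualifies the row as "pos".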