I have a dataset of cancer patients with different outcomes:
TypeofOutcome DateStageIV
NA 01.04.2014
Died from melanoma 01.06.2011
Died from melanoma 01.11.2013
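I want a new column called "Outcome" in which all patients who are still alive are coded as 1 and all who died are coded as 0. Based on a previous exercise, I wrote this code: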
mergedData$Outcome <- 1* (mergedData$TypeofOutcome = c ("Alive with stable disease", "Alive with progressive disease", "Alive with complete response"))
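I already suspected this would not work, and I got this error message:
Error in 1 * (mergedData$TypeofOutcome = c("Alive with stable disease",  :
  non-numeric argument to binary operator
I am sure there is a simple solution to my problem.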
Answer 0 (score: 0)
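If I understand you correctly, you want to create a dichotomous variable that depends on the value of a string variable: if TypeOfOutcome matches any of "Alive with stable disease", "Alive with progressive disease" or "Alive with complete response", Outcome will be 1, otherwise 0. I assume your dataset looks similar to this: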
mergedData <- data.frame(
  TypeOfOutcome = c("Alive with stable disease", "Alive with progressive disease",
                    "Alive with complete response", NA, "Died from melanoma"),
  # sample() draws random dates, so your values will differ from the output below
  DateStageIV = sample(seq(as.Date('2011/01/01'), as.Date('2015/01/01'), by = "day"), 5))
# TypeOfOutcome DateStageIV
# 1 Alive with stable disease 2013-05-09
# 2 Alive with progressive disease 2014-08-08
# 3 Alive with complete response 2013-02-10
# 4 <NA> 2014-05-23
# 5 Died from melanoma 2012-08-08
The function ifelse is well suited to recoding. Its basic syntax is:
ifelse(test, yes, no)
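Where the statement in test is TRUE, the value of yes is returned, otherwise the value of no. In this case, test should cover all cases in which the patient is still alive, which TypeOfOutcome indicates with the strings "Alive with stable disease", "Alive with progressive disease" or "Alive with complete response". The test for this would be: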
test <- mergedData$TypeOfOutcome %in% c("Alive with stable disease", "Alive with progressive disease", "Alive with complete response")
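To see what %in% returns, here is a minimal sketch on a toy vector. The names status and alive are hypothetical and chosen only for illustration; they are not part of the original data.

```r
# Hypothetical toy vector, not the asker's data
status <- c("Alive with stable disease", "Died from melanoma", NA)
alive  <- c("Alive with stable disease", "Alive with progressive disease")

# %in% checks each element of status for membership in alive
# and returns one logical value per element
in_set <- status %in% alive
print(in_set)

# A missing value counts as "not in the set", so %in% never returns NA
print(anyNA(in_set))
```

Note that the NA entry yields FALSE here, which matters for how missing outcomes end up being coded.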
test is TRUE wherever the value in TypeOfOutcome matches any of the cases listed after the %in% operator. yes will be 1 and no will be 0. Create the new variable:
mergedData$Outcome <- ifelse(test, 1, 0)
mergedData
# TypeOfOutcome DateStageIV Outcome
# 1 Alive with stable disease 2013-05-09 1
# 2 Alive with progressive disease 2014-08-08 1
# 3 Alive with complete response 2013-02-10 1
# 4 <NA> 2014-05-23 0
# 5 Died from melanoma 2012-08-08 0
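Two things may be worth noting about the result above. First, the one-liner from the question can probably be kept almost as it was, since = assigns rather than compares and %in% is the membership test that was intended. Second, row 4 has a missing outcome but is coded 0, the same as a death. The sketch below shows both points; it assumes the column is spelled TypeOfOutcome as in this answer, and the column name OutcomeNA is hypothetical.

```r
# Assumed data, mirroring the example above (dates omitted for brevity)
mergedData <- data.frame(
  TypeOfOutcome = c("Alive with stable disease", "Alive with progressive disease",
                    "Alive with complete response", NA, "Died from melanoma")
)
alive <- c("Alive with stable disease", "Alive with progressive disease",
           "Alive with complete response")

# The one-liner from the question, with %in% in place of =
# (multiplying a logical vector by 1 turns TRUE/FALSE into 1/0)
mergedData$Outcome <- 1 * (mergedData$TypeOfOutcome %in% alive)

# Variant that keeps unknown outcomes as NA instead of coding them 0
# (OutcomeNA is a hypothetical column name)
mergedData$OutcomeNA <- ifelse(is.na(mergedData$TypeOfOutcome), NA,
                               as.integer(mergedData$TypeOfOutcome %in% alive))
mergedData
```

Whether a missing outcome should be 0 or NA depends on the analysis; for survival data, treating unknown status as death is usually not what you want.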