I created this representative data frame and use a for loop to assign the condition categories.
library(dplyr)

df <- data.frame(Date = c("08/29/2011", "08/29/2011", "08/30/2011", "08/30/2011", "08/30/2011", "08/29/2012", "08/29/2012", "01/15/2012", "08/29/2012"),
                 Time = c("09:45", "10:00", "13:00", "13:30", "10:14", "9:09", "11:23", "17:06", "12:20"),
                 Diff = c(0.2, 4.3, 6.5, 15.0, 16.5, 31, 30.2, 21.9, 1.9))

# Diff <= 3 is "Excellent"; every other row gets a placeholder the loop overwrites
df1 <- df %>%
  mutate(Accuracy = ifelse(Diff <= 3, "Excellent", "TBD"))

# Assign the remaining categories one row at a time
for (i in seq_len(nrow(df1))) {
  if (df1$Diff[i] > 3 && df1$Diff[i] <= 10) {
    df1$Accuracy[i] <- "Good"
  }
  if (df1$Diff[i] > 10 && df1$Diff[i] <= 15) {
    df1$Accuracy[i] <- "Fair"
  }
  if (df1$Diff[i] > 15 && df1$Diff[i] <= 30) {
    df1$Accuracy[i] <- "Poor"
  }
  if (df1$Diff[i] > 30) {
    df1$Accuracy[i] <- "Unacceptable"
  }
}
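My actual data set is very large, and I've read that for loops are usually not the most efficient way to write R code. I believe I can do the same thing by creating a logical vector for each condition, where TRUE marks the rows that meet that condition. Then I could assign the values by subsetting, for example df1$Accuracy[Good] <- "Good". However, I can't figure out how to create the logical vectors with an apply-family function or a dplyr function. (Any solution that avoids the for loop is welcome, though.) If a for loop is the better approach, it would help to know that too.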
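A minimal base-R sketch of the logical-vector idea described above, assuming the sample df from the question. Comparison operators such as > and & are already vectorized, so each condition produces a whole logical vector in one step and no apply call is needed. The variable names Good, Fair, Poor and Unacceptable are only illustrative.

```r
# Sample data from the question
df <- data.frame(
  Date = c("08/29/2011", "08/29/2011", "08/30/2011", "08/30/2011", "08/30/2011",
           "08/29/2012", "08/29/2012", "01/15/2012", "08/29/2012"),
  Time = c("09:45", "10:00", "13:00", "13:30", "10:14", "9:09", "11:23", "17:06", "12:20"),
  Diff = c(0.2, 4.3, 6.5, 15.0, 16.5, 31, 30.2, 21.9, 1.9)
)

df1 <- df
# First category plus a placeholder, as in the question's ifelse() call
df1$Accuracy <- ifelse(df1$Diff <= 3, "Excellent", "TBD")

# One logical vector per condition; & (not &&) compares element by element
Good         <- df1$Diff > 3  & df1$Diff <= 10
Fair         <- df1$Diff > 10 & df1$Diff <= 15
Poor         <- df1$Diff > 15 & df1$Diff <= 30
Unacceptable <- df1$Diff > 30

# Logical subsetting: only the rows where the vector is TRUE are overwritten
df1$Accuracy[Good]         <- "Good"
df1$Accuracy[Fair]         <- "Fair"
df1$Accuracy[Poor]         <- "Poor"
df1$Accuracy[Unacceptable] <- "Unacceptable"

print(df1$Accuracy)
```

On the sample data this gives the same Accuracy column as the for loop, but each step works on the whole column at once, which is usually much faster on large data.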
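Here are my failed attempts. They return incorrect NAs or incorrect logical vectors. One of the many things I don't understand is how lapply knows whether to go through columns or rows.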
Good<-apply(df1, 1, function(x) ifelse(df1$Diff[x]>3&& df1$Diff[x]<=10, TRUE, FALSE)) #logical, TRUE where condition is true
Good<-unlist(lapply(df1$Diff, function(x) {(ifelse(df1$Diff[x]>3&& df1$Diff[x]<=10, TRUE, FALSE))}))
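A sketch of what goes wrong in the two attempts above, assuming the sample data from the question. apply() first converts the data frame to a matrix (a character matrix here, because some columns are text) and, with MARGIN = 1, hands each row to the function as a vector. lapply() has no margin argument: it walks over the elements of whatever it is given, so lapply(df1$Diff, ...) passes each Diff value, while lapply(df1, ...) would pass each column, since a data frame is a list of columns. In both attempts x is therefore a value or a row, not a row index, so df1$Diff[x] looks up the wrong element. Using x directly fixes this; vapply() is swapped in here only because it returns a logical vector without needing unlist().

```r
df1 <- data.frame(
  Date = c("08/29/2011", "08/29/2011", "08/30/2011", "08/30/2011", "08/30/2011",
           "08/29/2012", "08/29/2012", "01/15/2012", "08/29/2012"),
  Time = c("09:45", "10:00", "13:00", "13:30", "10:14", "9:09", "11:23", "17:06", "12:20"),
  Diff = c(0.2, 4.3, 6.5, 15.0, 16.5, 31, 30.2, 21.9, 1.9)
)

# vapply() iterates over the elements of the vector: x is one Diff value
Good_vapply <- vapply(df1$Diff, function(x) x > 3 && x <= 10, logical(1))

# apply(MARGIN = 1) passes each row of as.matrix(df1); with mixed column types
# that matrix is character, so Diff must be converted back to numeric
Good_apply <- apply(df1, 1, function(row) {
  d <- as.numeric(row[["Diff"]])
  d > 3 && d <= 10
})

# The vectorized comparison builds the same logical vector in a single step
Good_vec <- df1$Diff > 3 & df1$Diff <= 10
```

All three give the same logical vector, but the vectorized form avoids calling an R function once per row and avoids the character conversion that apply() forces on a data frame with text columns.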
Update: nested ifelse statements work, but any suggestions on how to do this with apply are still welcome.
df1 <- df %>%
  mutate(Accuracy = ifelse(Diff <= 3, "Excellent",
                    ifelse(Diff > 3 & Diff <= 10, "Good",
                    ifelse(Diff > 10 & Diff <= 15, "Fair",
                    ifelse(Diff > 15 & Diff <= 30, "Poor",
                    ifelse(Diff > 30, "Unpublishable", "TBD"))))))
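Another base-R option, sketched under the assumption that the categories are contiguous ranges closed on the right, as in the for loop: cut() bins a numeric vector by break points in one vectorized call, so no nesting is needed. The labels follow the for loop; replace "Unacceptable" with "Unpublishable" if that is the label you want.

```r
# Only the Diff column matters for the binning
df1 <- data.frame(Diff = c(0.2, 4.3, 6.5, 15.0, 16.5, 31, 30.2, 21.9, 1.9))

# Intervals (-Inf,3], (3,10], (10,15], (15,30], (30,Inf]; right = TRUE (the
# default) closes each interval on the right, matching the <= tests in the loop
df1$Accuracy <- cut(df1$Diff,
                    breaks = c(-Inf, 3, 10, 15, 30, Inf),
                    labels = c("Excellent", "Good", "Fair", "Poor", "Unacceptable"),
                    right  = TRUE)

# cut() returns a factor; convert it if a character column is preferred
df1$Accuracy <- as.character(df1$Accuracy)
print(df1)
```

As in the for loop, a value of exactly 30 falls in "Poor". A missing Diff gives NA here rather than "TBD".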
Answer (score: 2)
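You can use case_when from dplyr. Its conditions are checked in order and the first one that is TRUE wins, so each condition only needs an upper bound: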
library(dplyr)

df1 <- df %>%
  mutate(Accuracy = case_when(
    Diff <= 3  ~ "Excellent",
    Diff <= 10 ~ "Good",
    Diff <= 15 ~ "Fair",
    Diff <= 30 ~ "Poor",
    Diff > 30  ~ "Unpublishable",
    TRUE       ~ "TBD"  # catch-all, reached only when Diff is NA
  ))
df1
Date Time Diff Accuracy
1 08/29/2011 09:45 0.2 Excellent
2 08/29/2011 10:00 4.3 Good
3 08/30/2011 13:00 6.5 Good
4 08/30/2011 13:30 15.0 Fair
5 08/30/2011 10:14 16.5 Poor
6 08/29/2012 9:09 31.0 Unpublishable
7 08/29/2012 11:23 30.2 Unpublishable
8 01/15/2012 17:06 21.9 Poor
9 08/29/2012 12:20 1.9 Excellent