Question

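I have the following dataset. I want to create a column so that if there is a number in the unid column, df$identification says "unidentified"; otherwise it should contain whatever is in the species column. So the final output of df$identification should be x, y, unidentified, unidentified. With the code below, it shows 1, 2, unidentified, unidentified instead.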

Note that, for other reasons, I only want to test the unid column in the !is.na() part of the ifelse statement, not the species column.

unid <- c(NA,NA,1,4)
species <- c("x","y",NA,NA)
df <- data.frame(unid, species)
df$identification <- ifelse(!is.na(unid), "unidentified", df$species)

#Current Output of df$identification: 
1,2,unidentified,unidentified

#Needed Output
x,y,unidentified,unidentified

Answer 1

You can coerce the column of class 'factor' to class 'character'.

df$identification <- ifelse(!is.na(unid), "unidentified", as.character(df$species))

df
#  unid species identification
#1   NA       x              x
#2   NA       y              y
#3    1    <NA>   unidentified
#4    4    <NA>   unidentified
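
Why the original call printed 1 and 2: on R versions before 4.0.0, data.frame() converted character columns to factors by default, and a factor is stored as integer codes plus a set of levels. ifelse() does not carry the levels into its result, so only the codes survive. The sketch below reproduces this by setting stringsAsFactors = TRUE explicitly, which is an assumption made here so that it behaves the same on current R versions, where the default is FALSE.

```r
unid <- c(NA, NA, 1, 4)
species <- c("x", "y", NA, NA)

# Assumption: force the pre-4.0.0 behaviour so that species becomes a factor.
df <- data.frame(unid, species, stringsAsFactors = TRUE)

codes  <- as.integer(df$species)    # underlying integer codes: 1 2 NA NA
labels <- as.character(df$species)  # the labels: "x" "y" NA NA

# Passing the factor directly leaks the codes into the result.
wrong <- ifelse(!is.na(df$unid), "unidentified", df$species)

# Converting to character first keeps the labels.
right <- ifelse(!is.na(df$unid), "unidentified", as.character(df$species))

print(wrong)  # "1" "2" "unidentified" "unidentified"
print(right)  # "x" "y" "unidentified" "unidentified"
```

On R 4.0.0 or later, a data frame built without stringsAsFactors = TRUE keeps species as character, and the question's original line already returns x and y.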

Edit.

After the OP accepted the answer, I remembered that ifelse is slow and indexing is fast, so I tested both on a larger dataset.

First, check that the two solutions produce the same result:

df$id1 <- ifelse(!is.na(unid), "unidentified", as.character(df$species))

df$id2 <- "unidentified"
df$id2[is.na(unid)] <- species[is.na(unid)]

identical(df$id1, df$id2)
#[1] TRUE

The results are the same.

Now time both of them with the microbenchmark package.

n <- 1e4
df1 <- data.frame(unid = rep(unid, n), species = rep(species, n))

microbenchmark::microbenchmark(
  ifelse = {df1$id1 <- ifelse(!is.na(df1$unid), "unidentified", as.character(df1$species))},
  index = {df1$id2 <- "unidentified"
           # take the values from df1$species, not from the length-4 global `species`
           df1$id2[is.na(df1$unid)] <- as.character(df1$species[is.na(df1$unid)])
          },
  relative = TRUE
)
#Unit: nanoseconds
#    expr      min       lq        mean   median         uq      max  neval cld
#  ifelse 12502465 12749881 16080160.39 14365841 14507468.5 85836870    100   c
#   index  3243697  3299628  4575818.33  3326692  4983170.0 74526390    100   b 
#relative       67       68      208.89      228      316.5      540    100   a

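On average, indexing is about 3.5 times faster (a mean of roughly 4.6 ms versus 16.1 ms; the medians differ by a factor of about 4.3). The row labelled relative is not a ratio: relative = TRUE is not an argument of microbenchmark(), so it was timed as a third expression, and that row only measures how long it takes to evaluate TRUE. Writing two lines of code instead of a single ifelse line is worth the trouble.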
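
If relative timings are what you are after, one way to get them is to pass unit = "relative" to summary() on the benchmark result. The following is a minimal sketch rather than a re-run of the benchmark above: the helper names with_ifelse and with_index are hypothetical, times = 20L is an arbitrary choice, and the timing step is skipped when the microbenchmark package is not installed.

```r
unid <- c(NA, NA, 1, 4)
species <- c("x", "y", NA, NA)
n <- 1e4
df1 <- data.frame(unid = rep(unid, n), species = rep(species, n))

# Hypothetical helper: the one-line ifelse approach.
with_ifelse <- function(d) {
  ifelse(!is.na(d$unid), "unidentified", as.character(d$species))
}

# Hypothetical helper: fill with the default, then overwrite by index.
with_index <- function(d) {
  out <- rep("unidentified", nrow(d))
  keep <- is.na(d$unid)
  out[keep] <- as.character(d$species[keep])
  out
}

# Both approaches must agree before their timings are worth comparing.
same <- identical(with_ifelse(df1), with_index(df1))
print(same)

if (requireNamespace("microbenchmark", quietly = TRUE)) {
  mb <- microbenchmark::microbenchmark(
    ifelse = with_ifelse(df1),
    index  = with_index(df1),
    times  = 20L
  )
  # unit = "relative" expresses each timing relative to the fastest expression
  print(summary(mb, unit = "relative"))
}
```

The exact ratio depends on the machine, the R version, and the size of the data, so treat any single figure as indicative only.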

How do I get factor names to appear in an ifelse statement in R?
