Question

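I'm trying to use mutate to create a new factor column whose values depend on an existing column, grouping by another column with dplyr. All of this seems fairly simple, but for some reason R isn't happy with it: it keeps throwing a warning and creates a character column instead of a factor...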

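I could obviously leave it like that and add a line with df$col <- factor(df$col), but I'd like to understand what's wrong with my code and fix it so that it works directly in mutate.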
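For comparison, the workaround mentioned above can also stay inside the pipeline: compute Condition as a plain character value per group, then ungroup and convert the whole column to a factor once, so every row shares one set of levels. This is a minimal sketch that assumes dplyr is installed; the four-row data frame is a hypothetical cut-down version of the MWE.

```r
library(dplyr)

# Hypothetical cut-down version of the MWE data (two subjects).
df <- data.frame(
  Subject   = c(1, 1, 3, 3),
  StimLabel = factor(c("NoLabelFeedback", "NoLabelFeedback", "Saldie", "Gatoo"))
)

df <- df %>%
  group_by(Subject) %>%
  # Per group, produce a plain character value (no factor yet).
  mutate(Condition = ifelse("NoLabelFeedback" %in% StimLabel, "NoLabel", "Label")) %>%
  ungroup() %>%
  # Convert once, over the whole column, so all rows share the same levels.
  mutate(Condition = factor(Condition, levels = c("NoLabel", "Label")))

print(df)
```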

Here is an MWE that reproduces the error on the computers I have access to:

library(dplyr)

df <- data.frame(
  Subject = c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8),
  StimLabel = factor(c("NoLabelFeedback","NoLabelFeedback",
                       "NoLabelFeedback","NoLabelFeedback",
                       "Saldie","Gatoo",
                       "Gatoo","Saldie",
                       "NoLabelFeedback","NoLabelFeedback",
                       "NoLabelFeedback","NoLabelFeedback",
                       "Saldie","Gatoo",
                       "Gatoo","Saldie"))
)

df <- df %>%
  group_by(Subject) %>%
  mutate(Condition = factor(ifelse("NoLabelFeedback" %in% StimLabel, "NoLabel", "Label")))

Edit: my problem is stated in the title: I get a coercion warning ("binding character and factor vector, coercing into character vector"). The output R produces is fine, except that Condition comes out as a character column instead of a factor.

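What I'm trying to do is set Condition to "NoLabel" for a subject if any of that subject's StimLabel values is "NoLabelFeedback", and to "Label" otherwise. In practice, I use %in% because for each subject either all or none of the StimLabel values will be "NoLabelFeedback", and I feel that this way R will have fewer tests to do half of the time, since it stops checking once the first test succeeds. If anyone knows how to make this better I'm all for it, but that's not the point of this question.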
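The per-subject rule described above can be sketched in base R, without dplyr, to make the logic explicit: compute one TRUE/FALSE per subject, repeat it over that subject's rows, and only then build the factor with fixed levels. This is an illustration on a hypothetical three-subject subset of the data; row_flag is an illustrative name, not part of the original code.

```r
# Hypothetical three-subject subset of the MWE data.
df <- data.frame(
  Subject   = c(1, 1, 3, 3, 5, 5),
  StimLabel = c("NoLabelFeedback", "NoLabelFeedback",
                "Saldie", "Gatoo",
                "NoLabelFeedback", "NoLabelFeedback"),
  stringsAsFactors = FALSE
)

# ave() applies any() within each subject and repeats the single
# TRUE/FALSE over all of that subject's rows.
row_flag <- ave(df$StimLabel == "NoLabelFeedback", df$Subject, FUN = any)

# Build the factor over the whole column, with both levels declared.
df$Condition <- factor(ifelse(row_flag, "NoLabel", "Label"),
                       levels = c("NoLabel", "Label"))
print(df)
```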

Answer 1

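The problem seems to be that you create a factor around the ifelse call without telling R which levels it should have. Because a grouped mutate evaluates the expression separately for each subject, factor() only sees that subject's single value: some groups produce a factor whose only level is "NoLabel", others one whose only level is "Label". When dplyr stitches the groups back together, the levels do not match, so it coerces the column to character and emits the warning. Declaring the same levels for every group fixes this. The following code works: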
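To see why the levels argument matters, the sketch below imitates what a grouped mutate does by evaluating the question's expression once per subject. It is a base-R simulation using split() and lapply(), not dplyr's actual implementation, on a hypothetical two-subject subset of the data.

```r
lv <- c("NoLabel", "Label")

# Hypothetical two-subject subset of the data.
stim <- c("NoLabelFeedback", "NoLabelFeedback", "Saldie", "Gatoo")
subj <- c(1, 1, 3, 3)
per_group <- split(stim, subj)

# The per-group expression from the question, without explicit levels...
no_lv <- lapply(per_group, function(s)
  factor(ifelse("NoLabelFeedback" %in% s, "NoLabel", "Label")))

# ...and with the levels declared up front, as in the answer.
with_lv <- lapply(per_group, function(s)
  factor(ifelse("NoLabelFeedback" %in% s, "NoLabel", "Label"), levels = lv))

# Without levels, each group's factor only knows the one value it saw.
str(lapply(no_lv, levels))
# With levels, every group's factor carries the same two levels.
str(lapply(with_lv, levels))
```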

library(dplyr)

df <- data.frame(
  Subject = c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8),
  StimLabel = factor(c("NoLabelFeedback","NoLabelFeedback",
                       "NoLabelFeedback","NoLabelFeedback",
                       "Saldie","Gatoo",
                       "Gatoo","Saldie",
                       "NoLabelFeedback","NoLabelFeedback",
                       "NoLabelFeedback","NoLabelFeedback",
                       "Saldie","Gatoo",
                       "Gatoo","Saldie"))
)

df2 <- df %>% group_by(Subject) %>%
    mutate(Condition = factor(ifelse("NoLabelFeedback" %in% StimLabel,
                                     "NoLabel","Label"),
                              levels = c("NoLabel","Label")))

Answer 2

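It is slightly more efficient to use if/else instead of ifelse, since "NoLabelFeedback" %in% StimLabel yields a single TRUE/FALSE per group: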

df %>%
   group_by(Subject) %>%
   mutate(Condition = factor(if("NoLabelFeedback" %in% StimLabel) "NoLabel" else "Label",
                                  levels = c("NoLabel", "Label")))
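The switch to if/else works because the test is a scalar. The sketch below, on one hypothetical subject's StimLabel values, only shows that both forms return the same value for a single logical; it makes no claim about timing.

```r
# StimLabel values of one hypothetical subject.
s <- c("Saldie", "Gatoo")

# A single string on the left of %in% gives exactly one TRUE/FALSE,
# so a plain if/else suffices; ifelse() is meant for vectorised tests.
test <- "NoLabelFeedback" %in% s
r_if     <- if (test) "NoLabel" else "Label"
r_ifelse <- ifelse(test, "NoLabel", "Label")

print(length(test))
print(identical(r_if, r_ifelse))
```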

However, if we go with data.table, it is faster still.

Benchmark

set.seed(24)
df <- data.frame(Subject = rep(1:1e5, each = 30),
               StimLabel = sample(c("NoLabelFeedback","Saldie","Gatoo"),
                      1e5*30, replace = TRUE))


system.time({
  r1 <- df %>%
    group_by(Subject) %>%
    mutate(Condition = factor(if ("NoLabelFeedback" %in% StimLabel) "NoLabel"
                              else "Label", levels = c("NoLabel", "Label")))
})
#    user  system elapsed
#    8.55    0.00    8.58

system.time({
  r2 <- df %>%
    group_by(Subject) %>%
    mutate(Condition = factor(ifelse("NoLabelFeedback" %in% StimLabel,
                                     "NoLabel", "Label"),
                              levels = c("NoLabel", "Label")))
})
#    user  system elapsed
#    9.46    0.00    9.62

Using data.table

library(data.table)
system.time({

     setDT(df)[, Condition := factor(if("NoLabelFeedback" %in% StimLabel) "NoLabel"
      else "Label", levels = c("NoLabel", "Label")), Subject]

})
# user  system elapsed 
#   1.48    0.02    1.50 

identical(df$Condition, r1$Condition)
#[1] TRUE
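In the data.table call above, the third argument inside [ ] (Subject) is the by argument, so the j expression runs once per subject and := adds the column by reference. The sketch below writes by = Subject explicitly on a hypothetical two-subject table; it assumes data.table is installed.

```r
library(data.table)

# Hypothetical two-subject subset of the data.
dt <- data.table(
  Subject   = c(1, 1, 3, 3),
  StimLabel = c("NoLabelFeedback", "NoLabelFeedback", "Saldie", "Gatoo")
)

# Same expression as above, with the grouping argument named explicitly.
# Each group returns a length-1 factor with identical levels, which :=
# recycles over that group's rows.
dt[, Condition := factor(if ("NoLabelFeedback" %in% StimLabel) "NoLabel" else "Label",
                         levels = c("NoLabel", "Label")),
   by = Subject]

print(dt[])
```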

R dplyr::mutate with ifelse and %in% fails to create a factor (In mutate_impl: binding character and factor vector, coercing into character vector)

