Question

I have the following data frame:

temp <- structure(list(ID = c("1234", "1223", "5555", "2344", "4567", "6543"), 
       Eat = structure(c(6L,1L, 5L, 2L, 3L, 4L), 
       .Label = c("", "Cabbage", "Carrot", "Lettuce", "Potato","Asparagus", "Mushroom", "Apple"), class = "factor")), 
      row.names = c(NA, 6L), class = "data.frame", .Names = c("ID", "Eat"))

I want to flag every row where nothing was eaten:

temp %>% mutate(Eat = ifelse(Eat != "" & !is.na(Eat), Eat, "Nothing!"))

However, the result shows the underlying integer codes of the Eat factor instead of its labels:

    ID      Eat
1 1234        6
2 1223 Nothing!
3 5555        5
4 2344        2
5 4567        3
6 6543        4

How can I get the .Label values instead, to produce:

    ID       Eat
1 1234 Asparagus
2 1223  Nothing!
3 5555    Potato
4 2344   Cabbage
5 4567    Carrot
6 6543   Lettuce

Answer 1

A tidy way to change factor levels is forcats::fct_recode, which keeps the factor type but changes any levels you specify:

library(dplyr)    # provides %>% and mutate()
library(forcats)  # provides fct_recode()

temp %>% mutate(Eat = fct_recode(Eat, 'Nothing!' = ''))

##     ID       Eat
## 1 1234 Asparagus
## 2 1223  Nothing!
## 3 5555    Potato
## 4 2344   Cabbage
## 5 4567    Carrot
## 6 6543   Lettuce
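
For context, here is a minimal base-R sketch of why the original ifelse call returned codes. It also shows two alternatives that do not need forcats. The variable names (eat, codes, labels, recoded) are illustrative and not from the question. ifelse() drops the attributes of its `yes` argument, so a factor passed there is reduced to its integer codes.

```r
# Rebuild the Eat column from the question as a standalone factor.
eat <- factor(c("Asparagus", "", "Potato", "Cabbage", "Carrot", "Lettuce"),
              levels = c("", "Cabbage", "Carrot", "Lettuce", "Potato",
                         "Asparagus", "Mushroom", "Apple"))

# ifelse() strips the factor attributes of `yes`, leaving only the codes.
codes <- ifelse(eat != "" & !is.na(eat), eat, "Nothing!")
print(codes)   # "6" "Nothing!" "5" "2" "3" "4"

# Option 1: convert to character first, so the labels are what gets copied.
labels <- ifelse(eat != "" & !is.na(eat), as.character(eat), "Nothing!")
print(labels)  # "Asparagus" "Nothing!" "Potato" "Cabbage" "Carrot" "Lettuce"

# Option 2: rename the empty level in place; the result is still a factor.
recoded <- eat
levels(recoded)[levels(recoded) == ""] <- "Nothing!"
print(recoded)
```

Option 1 returns a character vector. Option 2 keeps the factor, as fct_recode does.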

Answer 2

Try to avoid factor if your project does not need it. character is easier to work with, and it is as memory-efficient to store as factor. I only use factors when plotting, or when I need a specific sort order other than alphabetical.

"... R has a global string pool. This means that each unique string is only stored in one place, and therefore character vectors take up less memory than you might expect" (Hadley Wickham, Advanced R)

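This was different in the past, which explains why strings were coerced to factor and why that stayed the default for many functions (base R only changed the default in version 4.0.0). On older versions you have to call read.csv or data.frame with the explicit argument stringsAsFactors = FALSE to avoid this.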
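
As a rough illustration of that argument, the sketch below builds the same two-row data twice and checks the column class. The sample values and object names are made up for the example. Passing stringsAsFactors explicitly makes the behaviour the same on any R version.

```r
ids  <- c("1234", "1223")
eats <- c("Asparagus", "")

# Explicitly request the old behaviour: strings become factors.
df_factor <- data.frame(ID = ids, Eat = eats, stringsAsFactors = TRUE)
# Explicitly keep strings as character vectors.
df_char   <- data.frame(ID = ids, Eat = eats, stringsAsFactors = FALSE)

print(class(df_factor$Eat))  # "factor"
print(class(df_char$Eat))    # "character"

# The same argument works for read.csv (here reading from an inline string).
csv_char <- read.csv(text = "ID,Eat\n1234,Asparagus\n1223,\n",
                     stringsAsFactors = FALSE)
print(class(csv_char$Eat))   # "character"

# With a character column, a plain replacement gives the wanted labels.
df_char$Eat[df_char$Eat == ""] <- "Nothing!"
print(df_char)
```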

Recent R packages such as data.table, or the packages from Hadley's tidyverse (tibble), never coerce strings.

But if you do need factor, you can use Hadley's forcats package, as @alistaire suggests.
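
One case mentioned above where a factor is still useful is a sort order that is not alphabetical. A small base-R sketch, with made-up values and names:

```r
sizes <- c("medium", "small", "large", "small")

# A character vector sorts alphabetically.
print(sort(sizes))

# A factor sorts by the order of its levels, which you choose yourself.
size_factor <- factor(sizes, levels = c("small", "medium", "large"))
sorted_sizes <- as.character(sort(size_factor))
print(sorted_sizes)  # "small" "small" "medium" "large"
```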

dplyr mutate on a data frame returns the factor's integer codes, not the .Label values
