Question

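I've been using dplyr::recode() to recode some variables. I have a character variable containing some empty strings that I also want to recode. But if I refer to the empty string as an argument name in the function call, I get an error.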

# input
x <- c("a", "b", "", "x", "y", "z")
# desired output
c("Apple", "Banana", "Missing", "x", "y", "z")

dplyr::recode(x, "a"="Apple", "b"="Banana", ""="Missing")

Error: attempt to use zero-length variable name

If I try to treat the empty strings as missing values with .missing, the function leaves them as empty strings:

dplyr::recode(x, "a"="Apple", "b"="Banana", .missing="Missing")

[1] "Apple"  "Banana" ""       "x"      "y"      "z"

How can I recode the values to get the desired output?

Answer 1

Why not use base R factor()? (Using levels=x works here because every value in x is unique.)

myFac <- factor(x, levels=x, labels=c("Apple", "Banana", "Missing", "x", "y", "z"))
myFac
[1] Apple   Banana  Missing x       y       z      
Levels: Apple Banana Missing x y z

If needed, you can convert it back to a character vector:

as.character(myFac)
[1] "Apple"   "Banana"  "Missing" "x"       "y"       "z"

Answer 2

You can use na_if() to turn the empty strings into NA first, so that .missing works as intended:

x <- c("a", "b", "", "x", "y", "z")
dplyr::recode(dplyr::na_if(x, ""), "a"="Apple", "b"="Banana", .missing="Missing")

[1] "Apple"   "Banana"  "Missing" "x"       "y"       "z"
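The same na_if() + .missing idea also works inside mutate() on a data frame. A minimal sketch, assuming dplyr is installed; the data frame df and the column col1 are illustrative names, not from the original question:

```r
library(dplyr)

# Illustrative data frame; stringsAsFactors = FALSE keeps col1 a character
# column on R versions older than 4.0 as well
df <- data.frame(col1 = c("a", "b", "", "x", "y", "z"), stringsAsFactors = FALSE)

df_new <- df %>%
  mutate(col1 = recode(na_if(col1, ""),          # turn "" into NA first
                       "a" = "Apple", "b" = "Banana",
                       .missing = "Missing"))    # then recode the NAs

df_new$col1
#> [1] "Apple"   "Banana"  "Missing" "x"       "y"       "z"
```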

Answer 3

In these cases I use ifelse(). For your example: x <- ifelse(x == "", "Missing", x).

In a data.frame context, you can use it inside mutate():

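library(dplyr)  # provides mutate() and the %>% pipe

# stringsAsFactors = FALSE keeps col1 a character column (needed before R 4.0)
df_x <- data.frame(col1 = c("a", "b", "", "x", "y", "z"), stringsAsFactors = FALSE)
df_new <- df_x %>%
  mutate(col1 = ifelse(col1 == "", "Missing", col1))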
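The ifelse() call above only replaces the empty strings; the desired output also maps "a" and "b". One way to get the full result is to pass the ifelse() output on to dplyr::recode(). A minimal sketch, assuming dplyr is installed:

```r
x <- c("a", "b", "", "x", "y", "z")

# Step 1 (base R): replace the empty strings with "Missing"
# Step 2 (dplyr): recode "a" and "b"; values without a match
# ("x", "y", "z" and "Missing") are left unchanged
out <- dplyr::recode(ifelse(x == "", "Missing", x),
                     "a" = "Apple", "b" = "Banana")
out
#> [1] "Apple"   "Banana"  "Missing" "x"       "y"       "z"
```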

Recode a character vector containing some empty strings