Question

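I have a data frame, and for various reasons I need to turn one of its columns into a factor, keep the order of the levels, and replace the periods in the level names with spaces. Here is an example: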

library(tidyverse)
library(stringr)

sandwich <- c("bread", "mustard.sauce", "tuna.fish", "lettuce", "bread")

sandwich_df <- tibble(sandwich_str = sandwich) %>%
  mutate(sandwich_factor = factor(sandwich_str)) %>%
  # attempt 1: rename the levels of the factor (this is the part that returns NA)
  mutate(sandwich2 = factor(sandwich_factor,
    levels = str_replace_all(levels(sandwich_factor), "\\.", " "))) %>%
  # attempt 2: edit the character column directly
  mutate(sandwich3 = str_replace_all(sandwich_str, "\\.", " "))

print(sandwich_df)

# A tibble: 5 x 4
  sandwich_str  sandwich_factor sandwich2 sandwich3
  <chr>         <fctr>          <fctr>    <chr>
1 bread         bread           bread     bread
2 mustard.sauce mustard.sauce   <NA>      mustard sauce
3 tuna.fish     tuna.fish       <NA>      tuna fish
4 lettuce       lettuce         lettuce   lettuce
5 bread         bread           bread     bread

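So in this data frame:

sandwich_str is a character column.

sandwich_factor is a factor.

In sandwich2, I try to replace all the periods in the levels of sandwich_factor. For whatever reason, every value that contains a period comes back as NA.

In sandwich3, I take the simpler approach of replacing all the periods in the character strings with spaces. This works better.

So I'd like to know what isn't working in my sandwich2 attempt. I want it to look more like sandwich3. Any suggestions?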

Answer 1

Does this work for you?

library(tidyverse)
library(stringr)

# Data --------------------------------------------------------------------

sandwich <- 
  c("bread", "mustard.sauce", "tuna.fish", "lettuce", "bread")

df <- 
  tibble(sandwich_str = sandwich)

# Convert periods to spaces -----------------------------------------------

df$sandwich_str <-
  df$sandwich_str %>%
  as.character() %>%
  str_replace_all("\\.", " ") %>%  # replace every period, not only the first one
  as.factor()

# Print output ------------------------------------------------------------

df %>% 
  print()
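The answer above goes through a character vector and rebuilds the factor with as.factor(), so the levels come out in alphabetical order. If the factor already has a level order you want to keep, a minimal base-R sketch is to rename the levels in place with the levels<- replacement function. Here gsub(..., fixed = TRUE) stands in for stringr, and the non-alphabetical level order is an illustrative assumption, not something from the question. Inside a tidyverse pipeline, forcats::fct_relabel() applies a function to the levels in the same order-preserving way.

```r
# A factor with a deliberately non-alphabetical level order
# (this order is illustrative only, not taken from the question)
sandwich <- c("bread", "mustard.sauce", "tuna.fish", "lettuce", "bread")
sandwich_factor <- factor(
  sandwich,
  levels = c("bread", "tuna.fish", "lettuce", "mustard.sauce")
)

# Rename the levels in place: each level keeps its position in the level
# order and only its text changes. fixed = TRUE treats "." as a literal period.
renamed <- sandwich_factor
levels(renamed) <- gsub(".", " ", levels(renamed), fixed = TRUE)

print(levels(renamed))
print(as.character(renamed))
```

Because levels<- replaces the level names positionally, no value can fall out of the factor and become NA, which is the failure seen with the levels argument in the question.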

Answer 2

Thanks to @aosmith for posting this answer as a comment. I'm posting it here as an answer so that I can accept it and close the question.

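The problem was that the new level names have to be passed to factor() through the labels argument, not levels. The levels argument tells factor() which existing values to look for, so every value that is missing from the new list (here, everything that still contains a period) becomes NA. The labels argument instead renames the existing levels one-to-one, keeping their order. So the correct way to write my earlier code is: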

library(tidyverse)
library(stringr)

sandwich <- c("bread", "mustard.sauce", "tuna.fish", "lettuce", "bread")

sandwich_df <- tibble(sandwich_str = sandwich) %>%
  mutate(sandwich_factor = factor(sandwich_str)) %>%
  # labels (not levels) renames the existing levels, in the same order
  mutate(sandwich2 = factor(sandwich_factor,
    labels = str_replace_all(levels(sandwich_factor), "\\.", " "))) %>%
  mutate(sandwich3 = str_replace_all(sandwich_str, "\\.", " "))

print(sandwich_df)
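To see the difference between the two arguments without the data frame around it, a minimal base-R sketch compares them on a bare factor. gsub(..., fixed = TRUE) is used here in place of str_replace_all() only so the example needs no packages.

```r
sandwich <- c("bread", "mustard.sauce", "tuna.fish", "lettuce", "bread")
sandwich_factor <- factor(sandwich)

# The levels of sandwich_factor are sorted alphabetically:
# "bread" "lettuce" "mustard.sauce" "tuna.fish"
new_names <- gsub(".", " ", levels(sandwich_factor), fixed = TRUE)

# levels = new_names: factor() looks for these strings in the data.
# "mustard.sauce" and "tuna.fish" are not in new_names, so they become NA.
with_levels <- factor(sandwich_factor, levels = new_names)

# labels = new_names: the existing levels are kept and renamed by position.
with_labels <- factor(sandwich_factor, labels = new_names)

print(with_levels)
print(with_labels)
```

Since labels is matched to the levels by position, this renaming is only correct when the new names are derived from levels(x) in the same order, as they are here.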

Replace factors in a data frame with dplyr mutate
