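I have a data frame containing some questions. I want to remove the leading number and period from each question but keep everything else. I don't know regex well, but this seems like a perfect use for str_split(), especially in a dplyr pipeline. However, after splitting the string, I'm not sure how to get the second item. I tried accessing it by position, but that didn't work.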
x <- structure(list(question = c("01. I like my job.",
"02. I like my house.",
"03. I like my car.")), class = "data.frame", row.names = c(NA, -3L))
x %>%
mutate(words = str_split(question, "."))
This returns:
question words
01. I like my job. <chr [19]>
02. I like my house. <chr [21]>
03. I like my car. <chr [19]>
I would like it to look like this:
question words
01. I like my job. I like my job.
02. I like my house. I like my house.
03. I like my car. I like my car.
I also tried separate() and strsplit(), but I couldn't get either of them to work.
Answer 0 (score: 4)
I think you are looking for str_replace (or sub in base R):
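x %>% mutate(words = str_replace(question, "^\\d+\\.\\s*", ""))
# question words
#1 01. I like my job. I like my job.
#2 02. I like my house. I like my house.
#3 03. I like my car. I like my car.
Explanation:
^ is the start-of-string anchor.
\\d+\\. matches one or more digits followed by a literal period.
\\s* matches any whitespace after the period, so the result has no leading space.
You can also use str_split (together with map_chr from purrr) as follows:
x %>% mutate(words = map_chr(str_split(question, "\\. ", n = 2), 2))
which gives the same result. Here n = 2 limits the split to the first period-plus-space, so the sentence's final period stays intact.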
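For reference, here is a minimal sketch of the base R variant mentioned above, assuming the same data as in the question. The trailing \\s* is an addition that also drops the space after the period; leave it out if you want to keep that space.

```r
# Same data as in the question
x <- data.frame(question = c("01. I like my job.",
                             "02. I like my house.",
                             "03. I like my car."),
                stringsAsFactors = FALSE)

# sub() replaces only the first match; ^ anchors the match to the start of the string
x$words <- sub("^\\d+\\.\\s*", "", x$question)
x$words
```

Because sub() is vectorized over the column, no per-row indexing is needed.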
Answer 1 (score: 2)
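You can change the pattern to "\\. " (an escaped period followed by a space) and then take the second element of each split result. Note that indexing with [[1]][[2]] picks the second piece of the first row only and recycles it for every row, so extract the element per row with map_chr instead.
library(tidyverse)
x %>%
  mutate(words = map_chr(str_split(question, "\\. ", n = 2), 2))
# question words
# 1 01. I like my job. I like my job.
# 2 02. I like my house. I like my house.
# 3 03. I like my car. I like my car.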
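To see why the pattern in the question needs escaping: in a regex, an unescaped . matches any single character, so the original call split at every character. The sketch below illustrates this on one string; the variable name q is only for illustration.

```r
library(stringr)

q <- "01. I like my job."

# Unescaped "." matches any character, so every character is a split point
length(str_split(q, ".")[[1]])
# 19 pieces, matching the <chr [19]> shown in the question

# Escaped period plus a space; n = 2 stops after the first split
str_split(q, "\\. ", n = 2)[[1]]
# "01" and "I like my job."
```

The second call leaves the sentence's final period untouched because that period is not followed by a space.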