Question

I'm writing a function that is meant to trim common words:

library(stringr)  # provides the str_* functions and re-exports %>%

trim_common_name <- function (x) {
  v_replacements <- c(
    `[:punct:]` = ""
    # `data.*` = "dataset"
  )
  x %>%
    str_to_lower() %>%
    str_replace_all(., v_replacements) %>%
    str_replace_all(., "_+", "_") %>%
    str_trim()
}

In this context, I want to find the word "data" and replace it with the word "dataset", regardless of where "data" appears in the sentence. How can I do this?

For example:

'abc data' ---> 'dataset'

'data abc' ---> 'dataset'

'abc data&data' ---> 'dataset'

Answer 1

You can use a regular expression pattern for this.

The pattern is \bdata\b. A word boundary simply marks the start or end of a word, so you won't get incorrect matches such as datatype ---> datasettype.
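
As a rough sketch of how the word-boundary pattern behaves, the snippet below applies it with base R's gsub(). The sample vector and the name `replaced` are made up for illustration, and note that inside an R string the backslashes have to be doubled. This replaces only the word itself, not the whole string.

```r
# Illustrative input: the three examples from the question plus one
# string where "data" is only part of a longer word.
x <- c("abc data", "data abc", "abc data&data", "datatype")

# "\\b" is a word boundary; perl = TRUE selects the PCRE engine.
replaced <- gsub("\\bdata\\b", "dataset", x, perl = TRUE)

print(replaced)
```

Here "datatype" should come back unchanged, because there is no word boundary between "data" and "type".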

Answer 2

I'm not sure, is this what you want?

x <- c('abc data','data abc' ,'abc data&data')
x
[1] "abc data"      "data abc"      "abc data&data"
x[grep("data",x)]<-"dataset"
x
[1] "dataset" "dataset" "dataset"

See ?grep for details.
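
One caveat with the plain "data" pattern is that it also matches inside longer words such as "datatype". A possible variation, assuming you want to replace the whole element only when "data" occurs as a separate word, is to use grepl() with word boundaries. The extra "datatype" element and the name `has_word` are illustrative additions, not part of the original question.

```r
# Illustrative input: the question's examples plus a longer word.
x <- c("abc data", "data abc", "abc data&data", "datatype")

# TRUE for elements that contain "data" as a standalone word.
has_word <- grepl("\\bdata\\b", x, perl = TRUE)

# Overwrite the whole element, as in the question's examples.
x[has_word] <- "dataset"

print(x)
```

The first three elements become "dataset" while "datatype" is left alone.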

Replacing a string in a sentence that contains a specific word
