Question

Here is the example data:

example_sentences <- data.frame(doc_id = c(1,2,3),
                                sentence_id = c(1,2,3),
                                sentence = c("problem not fixed","i like your service and would tell others","peope are nice however the product is rubbish"))
matching_df <- data.frame(x = c("not","and","however"))

Created on 2019-01-07 by the reprex package (v0.2.1)

I want to add/insert a comma before a certain word in a string. For example, if my string is:

problem not fixed.

I want to convert it to

problem, not fixed.

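The other data frame, matching_df, contains the words to match (these are coordinating conjunctions). So if a word from column x of matching_df is found in a sentence, insert a comma + space before the detected word.

I have looked at a package for this, but I am not sure how to implement it.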

Best,

Answer 1

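The gsubfn function in the gsubfn package takes a regular expression as its first argument and a list (or certain other objects) as its second argument, where the names of the list are the strings to match and the values of the list are the replacement strings.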

library(gsubfn)

gsubfn("\\w+", as.list(setNames(paste0(matching_df$x, ","), matching_df$x)), 
  format(example_sentences$sentence))

giving:

[1] "problem not, fixed                            "
[2] "i like your service and, would tell others    "
[3] "peope are nice however, the product is rubbish"
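Note that the output above places the comma after each matched word and that format() pads the strings to a common width. If the comma should instead go before the word, as in the question's example ("problem, not fixed"), a minimal base R sketch could look like the following. It assumes the words in matching_df$x contain no regular-expression metacharacters; the names pattern and result are illustrative only.

```r
example_sentences <- data.frame(
  doc_id = c(1, 2, 3),
  sentence_id = c(1, 2, 3),
  sentence = c("problem not fixed",
               "i like your service and would tell others",
               "peope are nice however the product is rubbish"),
  stringsAsFactors = FALSE
)
matching_df <- data.frame(x = c("not", "and", "however"),
                          stringsAsFactors = FALSE)

# One alternation pattern: a space followed by any of the listed whole words
# (assumption: the words contain no regex metacharacters)
pattern <- paste0(" \\b(", paste(matching_df$x, collapse = "|"), ")\\b")

# Replace " word" with ", word", keeping the matched word via the \\1 backreference
result <- gsub(pattern, ", \\1", example_sentences$sentence, perl = TRUE)
result
```

Because the pattern requires a leading space, a listed word at the very start of a sentence is left untouched, which seems reasonable since a comma cannot sensibly precede the first word.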

Answer 2

I don't know what the data frame you are talking about looks like, but I made a simple data frame here that contains a few phrases:

df <- data.frame(strings = c("problems not fixed.","Help how are you"),stringsAsFactors = FALSE)

Then I made a vector of the words that should be followed by a comma:

words <- c("problems","no","whereas","however","but")

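Then I loop over the phrases in the data frame with a simple for loop and use gsub to replace each matched word with the word plus a comma:

for (i in seq_along(df$strings)) {
    # Words from `words` that occur as space-separated tokens in this phrase
    findWords <- intersect(unlist(strsplit(df$strings[i], " ")), words)
    # intersect() returns character(0) when nothing matches, so the loop is simply skipped
    for (j in findWords) {
        # Update the stored string in place so earlier replacements are not overwritten;
        # \\b restricts the match to whole words
        df$strings[i] <- gsub(paste0("\\b", j, "\\b"), paste0(j, ","), df$strings[i])
    }
}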

Output:

 df
               strings
1 problems, not fixed.
2     Help how are you
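Since gsub is vectorized over its input, the same result can probably be reached without an explicit loop. The following is only a sketch under the assumption that the entries of words contain no regular-expression metacharacters; the name pattern is illustrative.

```r
df <- data.frame(strings = c("problems not fixed.", "Help how are you"),
                 stringsAsFactors = FALSE)
words <- c("problems", "no", "whereas", "however", "but")

# Single pattern matching any listed word as a whole word
# (assumption: the words contain no regex metacharacters)
pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")

# Append a comma to every matched word in every phrase at once
df$strings <- gsub(pattern, "\\1,", df$strings, perl = TRUE)
df
```

The word boundaries keep "no" from matching inside "not" and "however" from matching "how". Unlike the loop, which compares space-separated tokens, this version would also match a listed word that is directly followed by punctuation.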

Insert commas in text strings after certain words in R
