Here is the sample data:
example_sentences <- data.frame(doc_id = c(1,2,3),
sentence_id = c(1,2,3),
sentence = c("problem not fixed","i like your service and would tell others","peope are nice however the product is rubbish"))
matching_df <- data.frame(x = c("not","and","however"))
Created on 2019-01-07 by the reprex package (v0.2.1)
I want to add/insert a comma before a certain word in a string. For example, if my string is:
problem not fixed.
I want to convert it to
problem, not fixed.
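A second data frame, matching_df, contains the words to match (these words are coordinating conjunctions). If a word from x in matching_df is found in a sentence, I want to insert a comma + space before the detected word.
I've looked at a package for this, but I'm not sure how to implement it.
Best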
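The "if found" condition above can be checked for every sentence at once with a whole-word regular expression. A minimal base-R sketch, assuming the question's data rebuilt with stringsAsFactors = FALSE and conjunctions that contain no regex metacharacters; the names hits and found are illustrative only:

```r
# Question's sample data; stringsAsFactors = FALSE keeps character columns
# on R versions before 4.0
example_sentences <- data.frame(
  doc_id = c(1, 2, 3),
  sentence_id = c(1, 2, 3),
  sentence = c("problem not fixed",
               "i like your service and would tell others",
               "peope are nice however the product is rubbish"),
  stringsAsFactors = FALSE
)
matching_df <- data.frame(x = c("not", "and", "however"),
                          stringsAsFactors = FALSE)

# One column per conjunction, one row per sentence: TRUE where the word
# occurs as a whole word (\\b stops "not" from matching inside "nothing")
hits <- sapply(matching_df$x, function(w)
  grepl(paste0("\\b", w, "\\b"), example_sentences$sentence, perl = TRUE))

# TRUE for every sentence that contains at least one conjunction
found <- rowSums(hits) > 0
hits
found
```

Each sentence here contains exactly one conjunction, so hits has TRUE only on its diagonal and found is TRUE for all three rows.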
Answer 0 (score: 2)
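The gsubfn function in the gsubfn package takes a regular expression as its first argument and a list (or some other object) as its second argument. The names of the list are the strings to be matched, and the values of the list are the replacement strings.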
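library(gsubfn)
# Each word matched by \\w+ is looked up in the named list; words that are
# not list names (e.g. "problem") are left unchanged.
# as.character() rather than format(): format() pads every string with
# trailing blanks to the width of the longest one.
gsubfn("\\w+", as.list(setNames(paste0(matching_df$x, ","), matching_df$x)),
       as.character(example_sentences$sentence))
Giving:
[1] "problem not, fixed"
[2] "i like your service and, would tell others"
[3] "peope are nice however, the product is rubbish"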
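The gsubfn call above appends the comma after each matched word ("not, fixed"), whereas the question asks for a comma and space before it ("problem, not fixed"). A minimal base-R sketch of the requested behaviour, assuming the conjunctions are plain words with no regex metacharacters (pattern and with_commas are illustrative names): consume the whitespace in front of a conjunction and replace it with ", " followed by the captured word.

```r
# Question's sample data, with character (not factor) columns
example_sentences <- data.frame(
  doc_id = c(1, 2, 3),
  sentence_id = c(1, 2, 3),
  sentence = c("problem not fixed",
               "i like your service and would tell others",
               "peope are nice however the product is rubbish"),
  stringsAsFactors = FALSE
)
matching_df <- data.frame(x = c("not", "and", "however"),
                          stringsAsFactors = FALSE)

# Builds "\\s+(not|and|however)\\b": the whitespace before the word is part
# of the match so it can be replaced, and \\b rejects partial words
pattern <- paste0("\\s+(", paste(matching_df$x, collapse = "|"), ")\\b")

# "\\1" writes the captured conjunction back after the new ", "
with_commas <- gsub(pattern, ", \\1", example_sentences$sentence, perl = TRUE)
with_commas
# [1] "problem, not fixed"
# [2] "i like your service, and would tell others"
# [3] "peope are nice, however the product is rubbish"
```

A conjunction at the very start of a sentence has no whitespace in front of it, so it is left alone rather than getting a leading comma.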
Answer 1 (score: 1)
I don't know what your actual data frame looks like, but I made a simple data frame here containing a few phrases:
df <- data.frame(strings = c("problems not fixed.","Help how are you"),stringsAsFactors = FALSE)
Then I made a vector of the words that should be followed by a comma:
words <- c("problems","no","whereas","however","but")
Then I used gsub inside a simple for loop over the phrases in the data frame, replacing each listed word with the same word followed by a comma:
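for (i in seq_along(df$strings)) {
  string <- df$strings[i]
  # listed words that occur as whole, space-separated tokens
  findWords <- intersect(unlist(strsplit(string, " ")), words)
  # intersect() returns character(0), never NULL, when nothing matches
  if (length(findWords) > 0) {
    for (j in findWords) {
      # update string itself so earlier replacements are not lost;
      # \\b stops "no" from also matching inside "not"
      string <- gsub(paste0("\\b", j, "\\b"), paste0(j, ","), string)
    }
    df$strings[i] <- string
  }
}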
Output:
df
strings
1 problems, not fixed.
2 Help how are you
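The same append-after behaviour can be sketched without the explicit loop, since gsub() is vectorised over a character column. This is an alternative to the loop above, not the answer's original code, and it assumes the entries of words contain no regex metacharacters: collapse them into one alternation anchored with \\b and substitute across the whole column in one call.

```r
# Data from the answer
df <- data.frame(strings = c("problems not fixed.", "Help how are you"),
                 stringsAsFactors = FALSE)
words <- c("problems", "no", "whereas", "however", "but")

# "\\b(problems|no|whereas|however|but)\\b" matches whole words only,
# so "no" is not found inside "not"
pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")

# "\\1," writes the matched word back followed by a comma
df$strings <- gsub(pattern, "\\1,", df$strings, perl = TRUE)
df
```

Unlike splitting on spaces, the \\b anchors also catch a listed word that is directly followed by punctuation, such as "but." at the end of a sentence.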