Question

I am editing some text and wondering whether I can search for certain words programmatically.

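The words are: almost, nearly, approaching, very, when they appear next to these words: certain, complete, dead, entire, essential and extinct.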

Let's say I have this character vector:

text <- c("R is a very essential tool for data analysis. While it is regarded as domain specific, it is a very complete programming language. Almost certainly, many people who would benefit from using R, do not use it")

Can I have R return a numeric vector giving the line numbers (or sentence numbers) where these words are next to each other?

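Notice that I used "certainly", so ideally I'd need R to search for words that contain "certain" (or any of the other words), rather than only the whole word "certain".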

Answer 1

Use grep after splitting the text at sentence boundaries with strsplit:

stext <- strsplit(text, split="\\.")[[1]]
grep("certain", stext)
[1] 3
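
The same split-then-grep idea can be extended to the adjacency the question asks about. What follows is only a minimal sketch, not part of the original answer: the modifiers and targets vectors are assumed from the word lists in the question, and the names pattern, hits and matched are illustrative. Because the target stems are not anchored at the end, "certainly" matches as well as "certain".

```r
# Text from the question
text <- c("R is a very essential tool for data analysis. While it is regarded as domain specific, it is a very complete programming language. Almost certainly, many people who would benefit from using R, do not use it")

# Split into sentences at each period
stext <- strsplit(text, split = "\\.")[[1]]

# Word lists assumed from the question; adjust as needed
modifiers <- c("almost", "nearly", "very")
targets   <- c("certain", "complete", "dead", "entire", "essential", "extinct")

# A whole-word modifier, whitespace, then a word that starts with a target stem
pattern <- paste0("\\b(", paste(modifiers, collapse = "|"), ")\\s+(",
                  paste(targets, collapse = "|"), ")")

# Indices of the sentences that contain such a pair
hits <- grep(pattern, stext, ignore.case = TRUE, perl = TRUE)
hits

# The first matching pair in each sentence that has one
matched <- regmatches(stext, regexpr(pattern, stext, ignore.case = TRUE, perl = TRUE))
matched
```

For the sample text every sentence contains one such pair ("very essential", "very complete", "Almost certain"), so hits contains 1, 2 and 3.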

Answer 2

Andrie's solution probably suits your needs better, but I'm offering a second solution for future searchers who want to parse transcripts.

library(qdap)
stext <- c("R is a very essential tool for data analysis. While it is regarded 
    as domain specific, it is a very complete programming language. Almost 
    certainly, many people who would benefit from using R, do not use it.")

dat <- sentSplit(data.frame(dialogue=stext), "dialogue")
with(dat, termco(dialogue, tot, "certain"))

##   tot word.count  certain
## 1 1.1          9        0
## 2 2.2         14        0
## 3 3.3         14 1(7.14%)

Note that punctuation matters here: I had to add the missing period at the end of the last sentence.

To get a vector of which sentences contain "certain":

which(with(dat, termco(dialogue, tot, "certain"))$raw$certain > 0)
## [1] 3
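
For readers who do not have qdap installed, a rough base R equivalent of the per-sentence count is sketched below. It is an assumption-laden approximation rather than a reimplementation of termco: it splits on periods only, and the name counts is illustrative.

```r
# Same text as above, with the final period included
stext <- "R is a very essential tool for data analysis. While it is regarded as domain specific, it is a very complete programming language. Almost certainly, many people who would benefit from using R, do not use it."

# Split into sentences at each period
sentences <- strsplit(stext, split = "\\.")[[1]]

# Count occurrences of the substring "certain" in each sentence;
# sentences without a match yield character(0), hence a count of 0
counts <- lengths(regmatches(sentences, gregexpr("certain", sentences, fixed = TRUE)))
counts

# Sentence numbers with at least one occurrence
which(counts > 0)
```

On the sample text, only the third sentence has a non-zero count, which agrees with the qdap result above.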

Finding word combinations using R
