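I'm editing some text and wondering whether I can search for certain words programmatically.
The words are almost, nearly, close to, and very, when they appear near any of these words: certain, complete, dead, entire, essential, and extinct.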
Say I have this character vector:
text <- c("R is a very essential tool for data analysis. While it is regarded as domain specific, it is a very complete programming language. Almost certainly, many people who would benefit from using R, do not use it")
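Can I get R to return a numeric vector giving the line numbers (or sentence numbers) where these words appear next to each other?
Note that the text uses "certainly", so ideally R would match words that contain "certain" (or one of the other target words), not only the exact whole word "certain".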
Answer 0 (score: 2)
Use grep after splitting the text at sentence boundaries with strsplit:
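# split the text into sentences at each period
stext <- strsplit(text, split="\\.")[[1]]
# indices of the sentences containing "certain" (this also matches "certainly")
grep("certain", stext)
[1] 3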
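The grep call above finds sentences that contain "certain", but it does not check whether one of the qualifiers sits right before it. A minimal sketch of that check, assuming "next to each other" means the qualifier is immediately followed by the target word with only whitespace between them, builds one case-insensitive regular expression from both word lists. The names qualifiers, targets, pattern, and hits are illustrative only.

```r
# Sketch: find sentences where a qualifier (almost, nearly, close to, very)
# is immediately followed by a word that starts with one of the target stems.
text <- c("R is a very essential tool for data analysis. While it is regarded as domain specific, it is a very complete programming language. Almost certainly, many people who would benefit from using R, do not use it")

# Split into sentences at each period, as in the answer above
stext <- strsplit(text, split = "\\.")[[1]]

qualifiers <- c("almost", "nearly", "close\\s+to", "very")
targets    <- c("certain", "complete", "dead", "entire", "essential", "extinct")

# \\b keeps "very" from matching inside "every"; the targets are only stems,
# so "certain" also matches "certainly"
pattern <- paste0("\\b(", paste(qualifiers, collapse = "|"), ")\\s+(",
                  paste(targets, collapse = "|"), ")")

hits <- grep(pattern, stext, ignore.case = TRUE, perl = TRUE)
hits
## [1] 1 2 3
```

Because the targets are matched as stems, "dead" would also match "deadline"; append a suffix list or a trailing \\b to a target if that matters.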
Answer 1 (score: 2)
Andrie's solution fits your needs better, but I'm offering a second solution for future searchers who want to parse transcripts.
library(qdap)
stext <- c("R is a very essential tool for data analysis. While it is regarded
as domain specific, it is a very complete programming language. Almost
certainly, many people who would benefit from using R, do not use it.")
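# split the dialogue column into one row per sentence
dat <- sentSplit(data.frame(dialogue=stext), "dialogue")
# count occurrences of "certain" in each sentence (tot = sentence id)
with(dat, termco(dialogue, tot, "certain"))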
## tot word.count certain
## 1 1.1 9 0
## 2 2.2 14 0
## 3 3.3 14 1(7.14%)
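Note that punctuation matters: I had to add the missing period at the end of the last sentence.
To get a vector of the sentences that contain "certain":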
which(with(dat, termco(dialogue, tot, "certain"))$raw$certain > 0)
## [1] 3
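termco counts how often a term occurs in each sentence; it does not check whether a qualifier appears near that term. If the words may be a little apart (for example "nearly always complete"), a base-R sketch without qdap can compare word positions inside each sentence. The helper near_hit and the window of 2 words are hypothetical choices, not part of qdap.

```r
# Sketch: flag sentences where a qualifier occurs at most `window` words
# before a word that starts with one of the target stems.
text <- "R is a very essential tool for data analysis. While it is regarded as domain specific, it is a very complete programming language. Almost certainly, many people who would benefit from using R, do not use it."

# Split at periods and trim surrounding whitespace
sentences <- trimws(strsplit(text, split = "\\.")[[1]])

qualifiers <- c("almost", "nearly", "closeto", "very")  # "close to" is joined into one token below
targets    <- c("certain", "complete", "dead", "entire", "essential", "extinct")
window     <- 2  # hypothetical: target may be 1 or 2 words after the qualifier

near_hit <- function(sentence) {
  # Lowercase and drop punctuation such as commas
  s <- tolower(gsub("[[:punct:]]", "", sentence))
  # Treat the two-word qualifier "close to" as a single token
  s <- gsub("\\bclose\\s+to\\b", "closeto", s, perl = TRUE)
  words <- strsplit(s, "\\s+")[[1]]
  q_pos <- which(words %in% qualifiers)
  # Stem match: "certain" also matches "certainly"
  t_pos <- which(vapply(words, function(w) any(startsWith(w, targets)),
                        logical(1), USE.NAMES = FALSE))
  # TRUE if some target follows some qualifier within `window` words
  any(outer(t_pos, q_pos, "-") %in% seq_len(window))
}

near_hits <- which(vapply(sentences, near_hit, logical(1), USE.NAMES = FALSE))
near_hits
## [1] 1 2 3
```

Raising window widens the match; setting it to 1 reduces this to the adjacent-word case.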