How to find the index or position of a word in a given string using R

Time: 2019-07-18 08:55:30

Tags: r

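How can I find the index or position of a word in a given string? The code below returns the word's starting character position and its length. Once I have the word's position, I want to extract the preceding and following words for my project.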

library(stringr)
Output_text <- c("applicable to any future potential contract termination disputes as the tepco dispute was somewhat unique")

word_pos <- regexpr('termination', Output_text)


Output:

[1] 45
attr(,"match.length")
[1] 11
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE

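45 is the starting position of "termination", counted in characters.

11 is the length of the match.

"termination" is the 7th word in the string. How can I find that word position in R?

Thanks for your help.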

3 answers:

Answer 0: (score: 2)

Here you go:

library(stringr)

Output_text <- c("applicable to any future potential contract termination disputes as the tepco dispute was somewhat unique")

words <- unlist(str_split(Output_text, " "))

which(words == "termination")
[1] 7
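
If you would rather start from the character offset that regexpr() returns in the question, one possible sketch is to count the spaces that come before the match. It assumes words are separated by single spaces and that the pattern is actually found (regexpr() returns -1 otherwise). The names char_pos, prefix, n_spaces and word_index are made up for illustration.

```r
# Sketch: turn the character offset from regexpr() into a word index.
# Assumption: words are separated by exactly one space.
Output_text <- "applicable to any future potential contract termination disputes as the tepco dispute was somewhat unique"

# Character position of the first match, without the extra attributes
char_pos <- as.integer(regexpr("termination", Output_text, fixed = TRUE))

# Everything before the match; each space in it ends one earlier word
prefix <- substr(Output_text, 1, char_pos - 1)
n_spaces <- nchar(gsub("[^ ]", "", prefix))

word_index <- n_spaces + 1
word_index
```

Note that regexpr() also matches inside longer words such as "terminations", whereas which(words == "termination") only matches the exact word.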

Edit:

To handle multiple occurrences of the word in the text and get the next and previous words for each:

# Adding a few random "termination" words to the string:

Output_text <- c("applicable to any future potential contract termination disputes as the tepco dispute was termination somewhat unique termination")

words <- unlist(str_split(Output_text, " "))

t1 <- which(words == "termination")
next_keyword <- words[t1+1]
previous_keywords <- words[t1-1]

> next_keyword
[1] "disputes" "somewhat" NA        
> previous_keywords
[1] "contract" "was"      "unique" 
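
One edge case to watch: if the word is the very first one, t1-1 is 0, and R silently drops a zero index, so previous_keywords ends up shorter than next_keyword. A possible way around this, sketched below with a hypothetical helper word_neighbours() (not part of stringr or base R), is to map out-of-range positions to NA explicitly.

```r
# Hypothetical helper: previous and following word for every occurrence of `target`.
# Assumption: words are separated by single spaces.
word_neighbours <- function(text, target) {
  words <- strsplit(text, " ", fixed = TRUE)[[1]]
  idx <- which(words == target)

  # Indexing with 0 drops the element, so replace out-of-range positions with NA
  prev_idx <- ifelse(idx > 1L, idx - 1L, NA_integer_)
  next_idx <- ifelse(idx < length(words), idx + 1L, NA_integer_)

  data.frame(
    position  = idx,
    previous  = words[prev_idx],
    following = words[next_idx],
    stringsAsFactors = FALSE
  )
}

# The target is both the first and the last word here
res <- word_neighbours(
  "termination disputes as the tepco dispute was termination",
  "termination"
)
res
```

Each row describes one occurrence, with NA wherever there is no neighbouring word.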

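Answer 1: (score: 0)

You can do this with a regular expression in base R, without any external packages and without having to deal with character indices.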

# replace whole string by the words preceding and following 'termination'
(words <- sub("[\\S\\s]+ (\\S+) termination (\\S+) [\\S\\s]+", "\\1 \\2", Output_text, perl = TRUE))
# [1] "contract disputes"

# Split the resulting string into two individual strings
(words <- unlist(strsplit(words, " ")))
# [1] "contract" "disputes"

Answer 2: (score: 0)

The simplest way is to match termination together with its surrounding words using str_extract, then drop "termination " with str_remove:

str_remove(str_extract(Output_text,"\\w+ termination \\w+"),"termination ")
[1] "contract disputes"
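
If you want the two words as separate values rather than one string, a variation is to put capture groups around them and use stringr's str_match(), which returns a matrix whose first column is the full match and whose remaining columns are the groups. This is only a sketch; the names m, previous_word and next_word are illustrative.

```r
library(stringr)

Output_text <- "applicable to any future potential contract termination disputes as the tepco dispute was somewhat unique"

# Column 1: full match, column 2: first group, column 3: second group
m <- str_match(Output_text, "(\\w+) termination (\\w+)")

previous_word <- m[1, 2]
next_word <- m[1, 3]

previous_word
next_word
```

When there is no match, str_match() returns a row of NA values, so both variables would be NA.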