Question

I have a text file containing one million words. I need to know how to use R to find the words that lead and trail a given word.

For example, suppose I want to find the words that occur before and after the word "error". The leading words might be:

"typo error"
"manual error"
"system error"

and the trailing words might be:

"error corrected"
"error found"
"error occurred"

Any idea how to do this? Thanks in advance for your input.

Answer 1

For the words that appear before "error":

x <- "no error and no error and some error" # input

library(gsubfn)
rx <- "(\\w+) error"
table(strapplyc(x, rx)[[1]])

which gives:

  no some 
   2    1

For the words that appear after "error", replace rx with:

rx <- "error (\\w+)"
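If installing a package is not an option, the same counts can be sketched in base R by splitting the text into tokens and looking at the neighbours of each hit. This is only a minimal sketch: `neighbours` is a hypothetical helper name, and it assumes that lowercasing the text and treating every run of non-word characters as a separator is acceptable. Because punctuation is discarded, a word at the start of the next sentence is counted as a trailing word too.

```r
# Hypothetical helper: tabulate the words right before and right after `word`.
neighbours <- function(text, word) {
  # Split on runs of non-word characters and drop empty tokens.
  tokens <- strsplit(tolower(text), "\\W+")[[1]]
  tokens <- tokens[nzchar(tokens)]
  hits <- which(tokens == tolower(word))
  # Skip a hit at the very start (no leading word) or very end (no trailing word).
  before <- tokens[hits[hits > 1] - 1]
  after <- tokens[hits[hits < length(tokens)] + 1]
  list(leading = table(before), trailing = table(after))
}

x <- "no error and no error and some error"
res <- neighbours(x, "error")
res$leading   # counts of the words seen before "error"
res$trailing  # counts of the words seen after "error"
```

For a file, the text could be read first with something like `paste(readLines("corpus.txt"), collapse = " ")`, where the filename is a placeholder.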

Answer 2

My solution uses str_match_all:

library(stringr)
txt <- "system error corrected hardcore error detected wtf error holymoly"
str_match_all(txt, "\\s*(\\w+)\\serror\\s*(\\w+)")

[[1]]
     [,1]                       [,2]       [,3]       
[1,] "system error corrected"   "system"   "corrected"
[2,] " hardcore error detected" "hardcore" "detected" 
[3,] " wtf error holymoly"      "wtf"      "holymoly" 
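One thing to watch: the pattern above only matches when "error" has a word on both sides, so an occurrence at the very start or end of the text is skipped. A possible workaround, sketched here in base R with Perl-style lookarounds, is to collect the two sides independently. The helper names `words_before` and `words_after` are hypothetical, and the sketch assumes `word` contains only word characters (no regex metacharacters) and is followed by a single whitespace character.

```r
# Hypothetical helpers; perl = TRUE enables lookahead and lookbehind.
words_before <- function(text, word) {
  # A word that is followed by whitespace and then the target word.
  rx <- paste0("\\w+(?=\\s+", word, "\\b)")
  regmatches(text, gregexpr(rx, text, perl = TRUE))[[1]]
}

words_after <- function(text, word) {
  # A word that is preceded by the target word and one whitespace character.
  rx <- paste0("(?<=\\b", word, "\\s)\\w+")
  regmatches(text, gregexpr(rx, text, perl = TRUE))[[1]]
}

txt <- "system error corrected hardcore error detected wtf error holymoly"
lead_words <- words_before(txt, "error")
trail_words <- words_after(txt, "error")

# Occurrences at the edges of the text are still picked up on the side that exists.
txt2 <- "error found then system error"
lead_edge <- words_before(txt2, "error")
trail_edge <- words_after(txt2, "error")
```

Wrapping either result in `table()` gives frequency counts, which is probably what you want on a million-word file.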

Answer 3

How about this:

