Question

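How can I extract the words/sentence next to a specific word? Example:

"On June 28, Jane went to the cinema and ate popcorn"

I would like to choose 'Jane' and get [-2,2], meaning:

"June 28, Jane went to"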

Answer 1

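Here is an example extended to handle multiple occurrences. Basically, split on whitespace, find the word, expand the indices, then collect the results in a list.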

s <- "On June 28, Jane went to the cinema and ate popcorn. The next day, Jane hiked on a trail."
words <- strsplit(s, '\\s+')[[1]]
inds <- grep('Jane', words)
lapply(inds, FUN = function(i) {
  paste(words[max(1, i-2):min(length(words), i+2)], collapse = ' ')
})
#> [[1]]
#> [1] "June 28, Jane went to"
#> 
#> [[2]]
#> [1] "next day, Jane hiked on"

Created on 2019-09-17 by the reprex package (v0.3.0)

Answer 2

We can write a helper function. This may make it a bit more dynamic.

library(tidyverse)

txt <- "On June 28, Jane went to the cinema and ate popcorn"

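grab_text <- function(text, target, before, after){
  # Split once on whitespace and reuse the result
  words <- str_split(text, "\\s+")[[1]]
  # Position of the first word containing the target
  pos <- which(grepl(target, words))[1]
  # Clamp the window so it never runs past either end of the sentence
  start <- max(1, pos - before)
  end <- min(length(words), pos + after)

  paste(words[start:end], collapse = " ")
}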

grab_text(text = txt, target = "Jane", before = 2, after  = 2)
#> [1] "June 28, Jane went to"

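First we split the sentence, then find the position of the target, then grab the words before and after it (as many as specified in the function arguments), and finally collapse the words back into one string.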
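The same steps can be applied to every occurrence of the target rather than a single one. The following is only a sketch in base R: `grab_all` is a hypothetical helper name, and stripping punctuation before an exact comparison (so that "Janet" is not treated as "Jane") is an assumption about what counts as a match, not part of the answer above.

```r
# Hypothetical helper: one window per exact occurrence of `target`.
grab_all <- function(text, target, before = 2, after = 2) {
  words <- strsplit(text, "\\s+")[[1]]
  # Assumption: compare whole words with punctuation removed
  hits <- which(gsub("[[:punct:]]", "", words) == target)
  vapply(hits, function(i) {
    # Clamp the window to the bounds of the sentence
    paste(words[max(1, i - before):min(length(words), i + after)], collapse = " ")
  }, character(1))
}

s <- "Jane went to the cinema. The next day, Janet and Jane hiked."
grab_all(s, "Jane")
#> [1] "Jane went to"          "Janet and Jane hiked."
```

The first window is shorter because there are no words before the first "Jane"; the clamping with `max` and `min` keeps the indices valid instead of raising an error.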

Answer 3

A shorter version using str_extract from stringr:

library(stringr)
txt <- "On June 28, Jane went to the cinema and ate popcorn"
str_extract(txt,"([^\\s]+\\s+){2}Jane(\\s+[^\\s]+){2}")

[1] "June 28, Jane went to"

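The function str_extract extracts a pattern from a string. The regex \\s matches whitespace and [^\\s] is its negation, so it matches anything except whitespace. The whole pattern is therefore Jane preceded and followed by two words, where a word is any run of non-whitespace characters separated from its neighbors by whitespace.

The advantage is that it is already vectorized, and if you have a vector of texts you can use str_extract_all: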

s <- c("On June 28, Jane went to the cinema and ate popcorn. 
          The next day, Jane hiked on a trail.",
       "an indeed Jane loved it a lot")

str_extract_all(s,"([^\\s]+\\s+){2}Jane(\\s+[^\\s]+){2}")

[[1]]
[1] "June 28, Jane went to"   "next day, Jane hiked on"

[[2]]
[1] "an indeed Jane loved it"
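
Because the pattern uses `{2}`, it needs exactly two words on each side, so a target with fewer neighbors (for example at the very start of the string) does not match at all. As a rough sketch of one way around that, the pattern can be built for any window size with a `{0,n}` quantifier. The helper names `window_pattern` and `extract_window` are hypothetical, and base R's `regmatches()` and `gregexpr()` are swapped in here instead of stringr so that the example has no package dependency.

```r
# Hypothetical helper: build the window pattern for a word and window size n.
window_pattern <- function(word, n) {
  # \\S is shorthand for [^\\s]; {0,n} also accepts fewer than n neighbors
  sprintf("(\\S+\\s+){0,%d}%s(\\s+\\S+){0,%d}", n, word, n)
}

# Hypothetical helper: returns a list with one character vector per input string.
extract_window <- function(text, word, n) {
  regmatches(text, gregexpr(window_pattern(word, n), text, perl = TRUE))
}

txt <- "On June 28, Jane went to the cinema and ate popcorn"
extract_window(txt, "Jane", 2)
#> [[1]]
#> [1] "June 28, Jane went to"

# Target at the start of the string: still matched, with a shorter window
extract_window("Jane went home", "Jane", 2)
#> [[1]]
#> [1] "Jane went home"
```

Note that `word` is pasted into the regex as-is, so a target containing regex metacharacters would need escaping first.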

Answer 4

This should work:

PIE
