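How can I extract the words (or part of the sentence) next to a specific word? Example:
"On June 28, Jane went to the cinema and ate popcorn"
I would like to choose 'Jane' and get [-2,2], meaning the two words before it through the two words after it:
"June 28, Jane went to"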
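Answer 0 (score: 2)
Here is an example that is extended to handle multiple occurrences. Basically: split on whitespace, find the word, expand the indices around each match, then return the results as a list.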
s <- "On June 28, Jane went to the cinema and ate popcorn. The next day, Jane hiked on a trail."
words <- strsplit(s, '\\s+')[[1]]
inds <- grep('Jane', words)
lapply(inds, FUN = function(i) {
  paste(words[max(1, i-2):min(length(words), i+2)], collapse = ' ')
})
#> [[1]]
#> [1] "June 28, Jane went to"
#>
#> [[2]]
#> [1] "next day, Jane hiked on"
Created on 2019-09-17 by the reprex package (v0.3.0)
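The steps above can also be wrapped in a reusable function. The following is only a sketch under stated assumptions: the name `context_window` and its arguments are hypothetical, and it swaps the substring match done by `grep` for an exact word comparison after stripping punctuation, so that "Jane," counts as a match while "Janet" does not.

```r
# Hypothetical helper (not from the answer above): return the words around
# every exact occurrence of `word` in the string `s`.
context_window <- function(s, word, before = 2, after = 2) {
  words <- strsplit(s, "\\s+")[[1]]
  # Assumption: punctuation attached to a word should be ignored when matching
  bare <- gsub("[[:punct:]]", "", words)
  inds <- which(bare == word)
  # Clamp each window to the bounds of the sentence
  vapply(inds, function(i) {
    paste(words[max(1, i - before):min(length(words), i + after)], collapse = " ")
  }, character(1))
}

s <- "On June 28, Jane went to the cinema. The next day, Janet and Jane hiked."
context_window(s, "Jane")
#> [1] "June 28, Jane went to" "Janet and Jane hiked."
```

Returning a character vector rather than a list keeps the result easy to pass on to other string functions; when the word does not occur, the result is an empty character vector.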
Answer 1 (score: 1)
We can write a helper function for this, which might make it a bit more dynamic.
library(tidyverse)
txt <- "On June 28, Jane went to the cinema and ate popcorn"
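grab_text <- function(text, target, before, after){
  # Split once on whitespace and reuse the result
  words <- str_split(text, "\\s+")[[1]]
  # Position of the first word containing the target
  pos <- which(str_detect(words, target))[1]
  # Clamp the window so it never runs past either end of the sentence
  start <- max(1, pos - before)
  end <- min(length(words), pos + after)
  paste(words[start:end], collapse = " ")
}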
grab_text(text = txt, target = "Jane", before = 2, after = 2)
#> [1] "June 28, Jane went to"
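First we split the sentence into words, then we find the position of the target, then we grab the requested number of words before and after it (the numbers passed to the function), and finally we collapse the words back into a single string.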
Answer 2 (score: 1)
Here is a shorter version using str_extract from stringr:
library(stringr)
txt <- "On June 28, Jane went to the cinema and ate popcorn"
str_extract(txt,"([^\\s]+\\s+){2}Jane(\\s+[^\\s]+){2}")
[1] "June 28, Jane went to"
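The function str_extract extracts a pattern from a string. The regex \\s matches whitespace, and [^\\s] is its negation, so it matches anything except whitespace. The whole pattern is therefore Jane, preceded by two runs of non-whitespace characters followed by whitespace, and followed by two runs of whitespace followed by non-whitespace characters.
The advantage is that it is already vectorized, and if you have a vector of texts you can use str_extract_all: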
s <- c("On June 28, Jane went to the cinema and ate popcorn.
The next day, Jane hiked on a trail.",
"an indeed Jane loved it a lot")
str_extract_all(s,"([^\\s]+\\s+){2}Jane(\\s+[^\\s]+){2}")
[[1]]
[1] "June 28, Jane went to" "next day, Jane hiked on"
[[2]]
[1] "an indeed Jane loved it"
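The pattern above requires exactly two words on each side, so it finds nothing when Jane sits closer than that to either end of the string. One possible variation, shown here only as a sketch, builds the pattern from parameters and uses {0,n} quantifiers so that shorter windows are accepted. The helper name `window_pattern` is hypothetical, \\S is used as shorthand for [^\\s], and the word is inserted into the regex unescaped, so it is assumed to contain no regex metacharacters.

```r
library(stringr)

# Hypothetical helper: build a regex that captures up to `before` words ahead
# of `word` and up to `after` words behind it.
window_pattern <- function(word, before = 2, after = 2) {
  # Assumption: `word` contains no regex metacharacters
  sprintf("(\\S+\\s+){0,%d}%s(\\s+\\S+){0,%d}", before, word, after)
}

txt <- "On June 28, Jane went to the cinema and ate popcorn"
str_extract(txt, window_pattern("Jane"))
#> [1] "June 28, Jane went to"

# Fewer than two words before the target still produces a match
str_extract("Jane went home", window_pattern("Jane"))
#> [1] "Jane went home"
```

The same pattern can be passed to str_extract_all for a vector of texts. Note that, like the original pattern, it also matches Jane inside a longer word such as Janet.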
Answer 3 (score: -1)
This should work:
PIE