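In RStudio, I followed the approach from "R code to search a word in a paragraph and copy the sentence in a variable" to identify the sentence that contains a keyword I need (for example, pollination below).
However, I would also like to extract the sentence before and the sentence after the one that contains my keyword.
Desired output for the input below: They range much further north than honey bees, and colonies can be found on Ellesmere Island in northern Canada, only 880 km from the north pole! With the recent popularity of using bumblebees in glasshouse pollination they will probably be found in most parts of the world before long (see below), especially Bombus terrestris which seems to be the most popular species sold for this purpose. Recently there have been proposals to introduce bumblebees into Australia to pollinate crops in glasshouses.
If the word pollination occurs many times, how can I get this for every occurrence with a loop?
Here is my R code so far: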
text <- "Bumblebees are found mainly in northern temperate regions, though there are a few native South American species and New Zealand has some naturalised species that were introduced around 100 years ago to pollinate red clover. They range much further north than honey bees, and colonies can be found on Ellesmere Island in northern Canada, only 880 km from the north pole!
With the recent popularity of using bumblebees in glasshouse pollination they will probably be found in most parts of the world before long (see below), especially Bombus terrestris which seems to be the most popular species sold for this purpose. Recently there have been proposals to introduce bumblebees into Australia to pollinate crops in glasshouses. Now, though I dearly love bumblebees, I do think that this might not be a very good idea. No matter what security measures are taken, mated queens WILL escape eventually and that will probably lead to their establishment in the wild. And yet another non-native invasion of a country that has suffered more than most from such things. This invasion may or may not be benign, but isn't it better to err on the side of caution? Apparently there are already colonies of Bombus terrestris on Tasmania, so I suppose it is now only a matter of time before they reach the mainland."
#end
library(qdap)
sent_detect(text)
##There are NINE sentences in text
##Output
[1] "Bumblebees are found mainly in northern temperate regions, though there are a few native South American species and New Zealand has some naturalised species that were introduced around 100 years ago to pollinate red clover."
[2] "They range much further north than honey bees, and colonies can be found on Ellesmere Island in northern Canada, only 880 km from the north pole!"
[3] "With the recent popularity of using bumblebees in glasshouse pollination they will probably be found in most parts of the world before long, especially Bombus terrestris which seems to be the most popular species sold for this purpose."
[4] "Recently there have been proposals to introduce bumblebees into Australia to pollinate crops in glasshouses."
[5] "Now, though I dearly love bumblebees, I do think that this might not be a very good idea."
[6] "No matter what security measures are taken, mated queens WILL escape eventually and that will probably lead to their establishment in the wild."
[7] "And yet another non-native invasion of a country that has suffered more than most from such things."
[8] "This invasion may or may not be benign, but isn't it better to err on the side of caution?"
[9] "Apparently there are already colonies of Bombus terrestris on Tasmania, so I suppose it is now only a matter of time before they reach the mainland."
#End
Using the quanteda package, I confirm that there are nine sentences and then tokenise the text:
library(quanteda)
nsentence(text)
# [1] 9
##Searching for word pollination - it finds the first occurrence only
dat <- data.frame(text=sent_detect(text), stringsAsFactors = FALSE)
Search(dat, "pollination")
[1] "With the recent popularity of using bumblebees in glasshouse pollination they will probably be found in most parts of the world before long, especially Bombus terrestris which seems to be the most popular species sold for this purpose."
#End
Answer 0 (score: 1)
You can use base R pattern-matching functions:
d <- sent_detect(text)
# find the index of the sentence that contains the keyword:
n <- which(grepl('pollination', d))
# 3
# get context of +-1
d[(n - 1):(n + 1)]
# [1] "They range much further north than honey bees, and colonies can be found on Ellesmere Island in northern Canada, only 880 km from the north pole!"
# [2] "With the recent popularity of using bumblebees in glasshouse pollination they will probably be found in most parts of the world before long, especially Bombus terrestris which seems to be the most popular species sold for this purpose."
# [3] "Recently there have been proposals to introduce bumblebees into Australia to pollinate crops in glasshouses."
# nice output:
cat(d[(n - 1):(n + 1)])
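# if there are multiple sentences with the keyword:
invisible(lapply(which(grepl('pollination', d)), function(n) {
  # keep the window inside 1..length(d) for matches in the first or last sentence
  idx <- max(n - 1, 1):min(n + 1, length(d))
  cat(d[idx], sep = "\n")
}))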
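The same idea can be wrapped in a small helper that returns the sentences instead of printing them, and that also copes with a match in the first or last sentence. This is only a sketch: `get_context` and its `window` argument are hypothetical names (not part of qdap or base R), and the short `sentences` vector merely stands in for the output of `sent_detect(text)`.

```r
# Hypothetical helper: return each matching sentence together with its neighbours.
get_context <- function(sentences, pattern, window = 1, ignore.case = TRUE) {
  hits <- grep(pattern, sentences, ignore.case = ignore.case)
  lapply(hits, function(n) {
    # clamp the window so it never runs off either end of the vector
    idx <- seq(max(n - window, 1), min(n + window, length(sentences)))
    sentences[idx]
  })
}

# Stand-in for sent_detect(text); any character vector of sentences works.
sentences <- c("Bees pollinate clover.",
               "They range far north.",
               "Glasshouse pollination is popular.",
               "Australia may import them.",
               "Pollination matters.")

ctx <- get_context(sentences, "pollination")
length(ctx)   # one list element per matching sentence
ctx[[1]]      # sentences 2 to 4
ctx[[2]]      # sentences 4 and 5 (the window is cut off at the end)
```

Returning a list keeps each match's context separate, so it can be printed with `cat()` or stored in a variable as the question asks.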
Answer 1 (score: 0)
Here is a fairly direct way:
dat[c(inds <- grep("[Pp]ollination", dat[[1]]) + 1, inds - 2),]
## [1] "Recently there have been proposals to introduce bumblebees into Australia to pollinate crops in glasshouses."
## [2] "They range much further north than honey bees, and colonies can be found on Ellesmere Island in northern Canada, only 880 km from the north pole!"
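If the rows should come back in document order with the matching sentence included, the neighbouring indices can be combined, de-duplicated and sorted before subsetting. A minimal sketch, under assumptions: `context_rows` is a hypothetical name, and the small `dat` below only stands in for `data.frame(text = sent_detect(text))` from the question.

```r
# Hypothetical helper: rows of `dat` within `window` rows of any match, in order.
context_rows <- function(dat, pattern, window = 1) {
  hits <- grep(pattern, dat[[1]], ignore.case = TRUE)
  # collect every index in each match's window
  idx <- as.integer(unlist(lapply(hits, function(n) (n - window):(n + window))))
  # drop out-of-range indices, then merge overlapping windows
  idx <- sort(unique(idx[idx >= 1 & idx <= nrow(dat)]))
  dat[idx, , drop = FALSE]
}

# Stand-in data; the first column holds one sentence per row.
dat <- data.frame(text = c("A first sentence.",
                           "About pollination.",
                           "A third sentence.",
                           "A fourth sentence.",
                           "A fifth sentence."),
                  stringsAsFactors = FALSE)

res <- context_rows(dat, "pollination")
res$text   # rows 1 to 3, in their original order
```

Because overlapping windows are merged, a sentence that sits between two matches appears only once in the result.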