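Please look at the five sentences below and at the word "further". I need to build logic that picks the two words before "further" and the two words after "further".
For example, looking at the five sentences below:
Sentence 1: I need to pick the two words before "further", "to" and "advance", because there is no text after "further".
Sentence 2: I need to pick "one" and "morning", because there is no text before "further".
Sentence 3: I need to pick "Then", and "morning" and "Mills", because there is only one word before "further" but there are two words after it.
Sentence 4: I need to pick "mount" and "refused", and "advance", because there are two words before "further" and one word after it.
Sentence 5: I need to pick "morning", "Mills" and "refused", "to", as there are two words before and two words after "further".
Any help is appreciated. The logic I am looking for should be written in R.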
1)Then one morning Mills refused to mount refused to advance further
2)further one morning Mills refused to mount refused to advance
3)Then further morning Mills refused to mount refused to advance
4)Then one morning Mills refused to mount refused further advance
5)Then one morning Mills further refused to mount refused to advance
Answer 0 (score: 2)
Here is one way with stringr and dplyr:
library(stringr)
library(dplyr)
x %>%
str_extract(regex('(?:[^ ]+ ){0,2}further(?: [^ ]+){0,2}', ignore_case = TRUE)) %>%
str_remove(regex("further", ignore_case = TRUE)) %>%
str_squish()
[1] "to advance" "one morning" "Then morning Mills"
[4] "mount refused advance" "morning Mills refused to"
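The pattern above matches "further" anywhere, so it would also fire inside a longer word such as "furthermore". A minimal sketch of a whole-word variant follows; the helper name `context_words` and its parameters `word` and `n` are hypothetical and not part of the original answer, and `word` is assumed to contain no regex metacharacters.

```r
library(stringr)

# Hypothetical helper (not from the original answer): return up to `n` words
# on each side of a whole-word, case-insensitive match of `word`.
# Assumption: `word` contains no regex metacharacters.
context_words <- function(x, word = "further", n = 2) {
  # \\b anchors the target so that "furthermore" is not treated as a match
  pattern <- sprintf("(?:\\S+\\s+){0,%d}\\b%s\\b(?:\\s+\\S+){0,%d}", n, word, n)
  hit <- str_extract(x, regex(pattern, ignore_case = TRUE))
  # Remove the target word itself, then collapse the leftover whitespace
  hit <- str_remove(hit, regex(sprintf("\\b%s\\b", word), ignore_case = TRUE))
  str_squish(hit)
}

sentences <- c("Then one morning Mills refused to mount refused to advance further",
               "Then one morning Mills further refused to mount refused to advance",
               "furthermore we waited")
context_words(sentences)
```

Sentences without a whole-word match come back as `NA`, which makes them easy to filter out afterwards.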
Data:
x <- c("Then one morning Mills refused to mount refused to advance further",
"further one morning Mills refused to mount refused to advance",
"Then further morning Mills refused to mount refused to advance",
"Then one morning Mills refused to mount refused further advance",
"Then one morning Mills further refused to mount refused to advance")
Answer 1 (score: 1)
There are many ways to do this; here is one possibility in base R:
# Your sample strings
ss <- c("Then one morning Mills refused to mount refused to advance further",
"further one morning Mills refused to mount refused to advance",
"Then further morning Mills refused to mount refused to advance",
"Then one morning Mills refused to mount refused further advance",
"Then one morning Mills further refused to mount refused to advance")
sapply(ss, function(x) {
v <- unlist(strsplit(x, " "));
idx <- grep("further", v);
idx <- c(idx - 2, idx - 1, idx + 1, idx + 2);
idx <- idx[idx > 0 & idx <= length(v)];
return(v[idx]);
})
#$`Then one morning Mills refused to mount refused to advance further`
#[1] "to" "advance"
#
#$`further one morning Mills refused to mount refused to advance`
#[1] "one" "morning"
#
#$`Then further morning Mills refused to mount refused to advance`
#[1] "Then" "morning" "Mills"
#
#$`Then one morning Mills refused to mount refused further advance`
#[1] "mount" "refused" "advance"
#
#$`Then one morning Mills further refused to mount refused to advance`
#[1] "morning" "Mills" "refused" "to"
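Explanation: strsplit splits every sentence into words; grep finds the position of "further"; we then select and return the two preceding and the two following words (where they exist); sapply applies the whole procedure to every sentence.
Or, if the word "further" should be included in the output: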
sapply(ss, function(x) {
v <- unlist(strsplit(x, " "));
idx <- grep("further", v);
idx <- c(idx - 2, idx - 1, idx, idx + 1, idx + 2);
idx <- idx[idx > 0 & idx <= length(v)];
return(v[idx]);
})
#$`Then one morning Mills refused to mount refused to advance further`
#[1] "to" "advance" "further"
#
#$`further one morning Mills refused to mount refused to advance`
#[1] "further" "one" "morning"
#
#$`Then further morning Mills refused to mount refused to advance`
#[1] "Then" "further" "morning" "Mills"
#
#$`Then one morning Mills refused to mount refused further advance`
#[1] "mount" "refused" "further" "advance"
#
#$`Then one morning Mills further refused to mount refused to advance`
#[1] "morning" "Mills" "further" "refused" "to"
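The split, locate, window and filter steps used in this answer can also be wrapped in a small reusable function. This is only a sketch under stated assumptions: the name `words_around` and the parameters `word` and `n` are hypothetical, it compares whole words exactly (case-insensitively) instead of using grep, and it considers only the first occurrence of the target word.

```r
# Hypothetical helper (not from the original answer): return up to `n` words
# on each side of the first exact occurrence of `word` in `sentence`.
words_around <- function(sentence, word = "further", n = 2) {
  v <- strsplit(sentence, " ", fixed = TRUE)[[1]]   # split into words
  pos <- which(tolower(v) == tolower(word))[1]      # first exact match, NA if none
  if (is.na(pos)) return(character(0))
  idx <- setdiff(seq(pos - n, pos + n), pos)        # window without the target
  v[idx[idx >= 1 & idx <= length(v)]]               # keep only valid positions
}

words_around("Then further morning Mills refused to mount refused to advance")
words_around("Then one morning Mills refused to mount refused to advance further", n = 1)
```

Exact comparison avoids partial hits such as "furthermore", at the cost of missing words with attached punctuation (for example "further,").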
Answer 2 (score: 1)
Another way:
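library(stringr)
# The sample sentences (`text` was not defined in the original snippet)
text <- c("Then one morning Mills refused to mount refused to advance further",
          "further one morning Mills refused to mount refused to advance",
          "Then further morning Mills refused to mount refused to advance",
          "Then one morning Mills refused to mount refused further advance",
          "Then one morning Mills further refused to mount refused to advance")
get_values <- function(str) {
  # Up to two words before and up to two words after "further"
  val <- str_extract(str, "([^\\s]+\\s){0,2}further(\\s[^\\s]+){0,2}")
  # Drop "further" and collapse the double space it leaves behind
  str_squish(gsub(pattern = "further", replacement = "", x = val))
}
# you can further unlist this to get the answer as a vector instead of a list
answer <- lapply(text, get_values)
answer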
[[1]]
[1] "to advance"
[[2]]
[1] "one morning"
[[3]]
[1] "Then morning Mills"
[[4]]
[1] "mount refused advance"
[[5]]
[1] "morning Mills refused to"
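As the comment in the code hints, the list can be flattened into a character vector. A minimal sketch of two ways to do that is shown below; `sentences` is a hypothetical variable name, and `get_values` is restated here (using `str_squish`) only so that the snippet runs on its own.

```r
library(stringr)

# Hypothetical sample input for this sketch
sentences <- c("Then one morning Mills refused to mount refused to advance further",
               "Then further morning Mills refused to mount refused to advance")

# Restated so the snippet is self-contained
get_values <- function(str) {
  val <- str_extract(str, "([^\\s]+\\s){0,2}further(\\s[^\\s]+){0,2}")
  str_squish(gsub(pattern = "further", replacement = "", x = val))
}

# Option 1: vapply guarantees exactly one string per sentence
answer_vec <- vapply(sentences, get_values, character(1), USE.NAMES = FALSE)

# Option 2: every step inside get_values is vectorised, so a direct call works too
answer_direct <- get_values(sentences)

identical(answer_vec, answer_direct)
```

Compared with `unlist(lapply(...))`, `vapply` fails loudly if the function ever returns something other than a single string per input.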