Question

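I'm new to library(stringr). In my df I have a column named Sentences, where each row contains one sentence. I want to find the position of a given word, along with the 3 words before and the 3 words after it.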

For example:

string <- "We have a three step process to validate 
           claims data we use in our analysis."

If we search for the word validate, it should return 8 (the word's position) and the words 'step' 'process' 'to' 'claims' 'data' 'we'. I tried str_match and str_extract.

Answer 1

Using strsplit and grep:

myString <- "We have a three step process to validate claims data we use in our analysis."

# Split the string into individual words
splitString <- strsplit(myString, " ")[[1]]

# Find the location of the word of interest
loc <- grep("validate", splitString)

# Subset as you normally would
splitString[(loc-3):(loc+3)]
# [1] "step"     "process"  "to"       "validate" "claims"   "data"     "we"
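
The question also asks for the position itself (8) and for the neighbouring words without the matched word. Below is a minimal base-R sketch of that, assuming words are separated by whitespace. The helper name `word_context` and its arguments are made up for illustration; they are not part of stringr or base R. Unlike the `grep` call above, it matches the whole word after stripping punctuation, so "Validated?" would not count as a match for "validate".

```r
# Hypothetical helper: position of the first exact match of `target`,
# plus up to `n` words on each side of it
word_context <- function(sentence, target, n = 3) {
  # Split on runs of whitespace so line breaks inside the string are handled
  words <- strsplit(trimws(sentence), "[[:space:]]+")[[1]]
  # Compare case-insensitively after stripping punctuation
  cleaned <- tolower(gsub("[[:punct:]]", "", words))
  loc <- which(cleaned == tolower(target))[1]
  # No match: return an empty result instead of failing
  if (is.na(loc)) {
    return(list(position = NA_integer_, before = character(0), after = character(0)))
  }
  list(
    position = loc,
    before   = tail(words[seq_len(loc - 1)], n),  # at most n words before the match
    after    = head(words[-seq_len(loc)], n)      # at most n words after the match
  )
}

string <- "We have a three step process to validate
           claims data we use in our analysis."
res <- word_context(string, "validate")
res$position  # 8
res$before    # "step" "process" "to"
res$after     # "claims" "data" "we"
```

Using `head` and `tail` keeps the window inside the sentence, so a match near the start or end simply returns fewer words.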

Update

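If the vector contains multiple strings, you can try the following. I've modified it slightly to be on the safe side, so that it never tries to extract positions that don't exist.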

words <- c("How data is Validated?", 
           "We have a three step process to validate claims data we use in our analysis.",
           "Sample Validate: Since No One vendor can provide the total population of claims in a given geographic region")

x <- strsplit(words, " ")
lapply(x, function(y) {
  # Use the first match only; return an empty vector if there is none
  loc <- grep("validate", y, ignore.case = TRUE)[1]
  if (is.na(loc)) return(character(0))
  # Clamp the window to the bounds of the sentence
  from <- max(loc - 3, 1)
  to <- min(loc + 3, length(y))
  y[from:to]
})
# [[1]]
# [1] "How"        "data"       "is"         "Validated?"
# 
# [[2]]
# [1] "step"     "process"  "to"       "validate" "claims"   "data"     "we"      
# 
# [[3]]
# [1] "Sample"    "Validate:" "Since"     "No"        "One"

As you can see, the result is a list of character vectors, one per input string.
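
Since the question keeps the sentences in a data frame column, the same idea can be applied to that column and the result stored next to it. This is a sketch only: it assumes a data frame `df` with a character column `Sentences` as described in the question. The sample rows, the helper `get_window`, and the column names `context` and `context_text` are illustrative.

```r
# Illustrative data frame; the question only says it has a Sentences column
df <- data.frame(
  Sentences = c("How data is Validated?",
                "We have a three step process to validate claims data we use in our analysis."),
  stringsAsFactors = FALSE
)

# Window of up to n words on each side of the first match in one sentence
get_window <- function(sentence, pattern = "validate", n = 3) {
  y <- strsplit(sentence, " ")[[1]]
  loc <- grep(pattern, y, ignore.case = TRUE)[1]
  if (is.na(loc)) return(character(0))
  y[max(loc - n, 1):min(loc + n, length(y))]
}

# Store the result as a list column, one character vector per row
df$context <- lapply(df$Sentences, get_window)

# Collapse to one string per row if a plain character column is preferred
df$context_text <- vapply(df$context, paste, character(1), collapse = " ")
```

A list column keeps the words separate for further processing, while the collapsed column is easier to print or export.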
