How to search previous strings in a list until a match is found

Posted: 2019-01-16 10:19:36

Tags: r

This question follows on from a previous question I asked here, but I have changed the input list and the problem is now different:

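In short, I am trying to extract the presence of an "event" from an event list and, once one is detected, look up the event's location from a location list. I look first in the sentence where the event occurs and then in the previous sentence. I want to find the location that is written nearest to the event in the original report, i.e. before the event in the text.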

The problem is that the location may be two or three sentences before the sentence containing the event, so I want to detect those as well.

My input nested list is:

text <- list(
  c("Oesophagus irregular z-line as previously.", " quad biopsies at ,,,m"),
  c("Normal examination", "cardia mild inflammation."),
  c("stomach normal", "No problems here",
    "Everything  normal", "Small polyp EMR and completely removed", "GOJ normal",
    "Nodule seen which was normal", "This was removed by EMR",
    "All other sites normal  normal", " A small area of residual stomach was removed by APC ")
)

Event list:

EventList<-c("RFA","EMR","APC")

Location list:

LocationList<-function(){

  tofind <-paste(c("Stomach","Antrum","Duodenum","Oesophagus","GOJ"),collapse = "|")

  return(tofind)

}
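A quick sketch of what `LocationList()` returns and why the case of the pattern matters, assuming the stringr package is loaded (the example sentence is made up for illustration):

```r
library(stringr)

LocationList <- function() {
  paste(c("Stomach", "Antrum", "Duodenum", "Oesophagus", "GOJ"), collapse = "|")
}

pattern <- LocationList()
print(pattern)  # "Stomach|Antrum|Duodenum|Oesophagus|GOJ"

sentence <- "stomach normal; GOJ normal"

# Case-sensitive by default: "Stomach" in the pattern does not match "stomach"
case_sensitive <- str_extract_all(sentence, pattern)[[1]]
print(case_sensitive)  # "GOJ"

# regex(..., ignore_case = TRUE) matches either case without calling tolower()
case_insensitive <- str_extract_all(sentence, regex(pattern, ignore_case = TRUE))[[1]]
print(case_insensitive)  # "stomach" "GOJ"
```

The matched text keeps the case it has in the report, so `tolower()` is still needed on the result if the output should read `goj` rather than `GOJ`.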

The output I want is:

""  
""   
"stomach:EMR, goj:EMR, stomach:APC"

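One way to get that output while staying at the sentence level is to walk backwards from each event, first through the part of its own sentence that precedes it and then through earlier sentences, until a location is found. A minimal sketch, assuming stringr; `event_locations()` is a hypothetical helper name, and the `\\b` word boundaries are an addition so that an event or location name inside a longer word is not matched:

```r
library(stringr)

# Input and helpers from the question
text <- list(
  c("Oesophagus irregular z-line as previously.", " quad biopsies at ,,,m"),
  c("Normal examination", "cardia mild inflammation."),
  c("stomach normal", "No problems here",
    "Everything  normal", "Small polyp EMR and completely removed", "GOJ normal",
    "Nodule seen which was normal", "This was removed by EMR",
    "All other sites normal  normal", " A small area of residual stomach was removed by APC ")
)
EventList <- c("RFA", "EMR", "APC")
LocationList <- function() {
  paste(c("Stomach", "Antrum", "Duodenum", "Oesophagus", "GOJ"), collapse = "|")
}

# Hypothetical helper: pair every event in one report (a vector of sentences)
# with the nearest location written before it, walking back sentence by sentence.
event_locations <- function(sentences) {
  ev_pat  <- regex(paste0("\\b(", paste(EventList, collapse = "|"), ")\\b"), ignore_case = TRUE)
  loc_pat <- regex(paste0("\\b(", LocationList(), ")\\b"), ignore_case = TRUE)
  out <- character(0)
  for (i in seq_along(sentences)) {
    ev_pos <- str_locate_all(sentences[i], ev_pat)[[1]]
    for (k in seq_len(nrow(ev_pos))) {
      event <- str_sub(sentences[i], ev_pos[k, "start"], ev_pos[k, "end"])
      # Earlier sentences, plus the part of sentence i before this event
      candidates <- c(sentences[seq_len(i - 1)],
                      str_sub(sentences[i], 1, ev_pos[k, "start"] - 1))
      # Search from the closest candidate backwards; stop at the first hit
      for (s in rev(candidates)) {
        locs <- str_extract_all(s, loc_pat)[[1]]
        if (length(locs) > 0) {
          out <- c(out, paste0(tolower(locs[length(locs)]), ":", toupper(event)))
          break
        }
      }
    }
  }
  paste(out, collapse = ", ")
}

result <- vapply(text, event_locations, character(1))
print(result)
```

For the input above this gives one string per report: two empty strings and `"stomach:EMR, goj:EMR, stomach:APC"`. Because the search stops at the first sentence that names a location, it does not matter how many sentences back that location is.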
Attempt 1

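@akrun very kindly helped me build a solution (which works as long as the location only needs to be searched for in the immediately preceding sentence of the list), as follows: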

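library(stringr)

sapply(text, function(x) {
  # events mentioned in each sentence of the report
  x1 <- str_extract_all(tolower(x), tolower(paste(EventList, collapse = "|")))
  i1 <- which(lengths(x1) > 0)
  if (length(i1) > 0) {
    # locations in the sentence before each event sentence, then in the event sentence itself
    paste(unlist(Map(c,
                     str_extract_all(tolower(x[i1 - 1]), tolower(LocationList())),
                     str_extract_all(tolower(x[i1]), tolower(LocationList())))),
          toupper(x1[i1]), sep = ":", collapse = ", ")
  } else ""
})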
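To see why this only works one sentence back, here is a small sketch, using the third report from the question, of which sentences `x[i1 - 1]` actually inspects (stringr assumed):

```r
library(stringr)

report <- c("stomach normal", "No problems here",
            "Everything  normal", "Small polyp EMR and completely removed", "GOJ normal",
            "Nodule seen which was normal", "This was removed by EMR",
            "All other sites normal  normal", " A small area of residual stomach was removed by APC ")

# Sentences that mention an event, found the same way as in Attempt 1
i1 <- which(lengths(str_extract_all(tolower(report), "rfa|emr|apc")) > 0)
print(i1)  # 4 7 9

# The only other sentences Attempt 1 searches for a location
previous <- report[i1 - 1]
print(previous)

# None of them names a location
found <- str_detect(previous, regex("stomach|antrum|duodenum|oesophagus|goj", ignore_case = TRUE))
print(found)  # FALSE FALSE FALSE
```

So the "stomach" in sentence 1 and the "GOJ" in sentence 5 are never reached; the only location found is the "stomach" inside the APC sentence itself.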

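It seems I don't actually need to keep the input as a nested list; keeping each report as full text may be easier, since I could then run a regex over the whole thing. So the function above could be redefined like this (with pseudocode for the part I'm having trouble constructing):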

library(stringr)

# collapse each report into a single string, sentences separated by ";"
text <- lapply(text, function(x) paste(x, collapse = ";"))
text <- unlist(text)

sapply(text, function(x) {
  x1 <- str_extract_all(tolower(x), tolower(paste(EventList, collapse = "|")))
  i1 <- which(lengths(x1) > 0)
  if (length(i1) > 0) {
    # How to iterate through all the events found in x1 and then search for the
    # nearest location (from the location list) before each one in each report?
  } else ""
})
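With each report collapsed into one string, the missing step can work on character positions instead of sentences: locate every event and every location, then pair each event with the last location that ends before the event starts. A minimal sketch, assuming stringr; `nearest_preceding()` is a hypothetical name and the `\\b` word boundaries are an addition:

```r
library(stringr)

text <- list(
  c("Oesophagus irregular z-line as previously.", " quad biopsies at ,,,m"),
  c("Normal examination", "cardia mild inflammation."),
  c("stomach normal", "No problems here",
    "Everything  normal", "Small polyp EMR and completely removed", "GOJ normal",
    "Nodule seen which was normal", "This was removed by EMR",
    "All other sites normal  normal", " A small area of residual stomach was removed by APC ")
)
EventList <- c("RFA", "EMR", "APC")
LocationList <- function() {
  paste(c("Stomach", "Antrum", "Duodenum", "Oesophagus", "GOJ"), collapse = "|")
}

# Hypothetical helper working on one collapsed report string
nearest_preceding <- function(report) {
  ev_pat  <- regex(paste0("\\b(", paste(EventList, collapse = "|"), ")\\b"), ignore_case = TRUE)
  loc_pat <- regex(paste0("\\b(", LocationList(), ")\\b"), ignore_case = TRUE)
  ev  <- str_locate_all(report, ev_pat)[[1]]   # start/end of every event
  loc <- str_locate_all(report, loc_pat)[[1]]  # start/end of every location
  pairs <- character(0)
  for (k in seq_len(nrow(ev))) {
    # locations that end before this event starts
    before <- which(loc[, "end"] < ev[k, "start"])
    if (length(before) == 0) next
    j <- max(before)  # matches come back in text order, so the last one is nearest
    pairs <- c(pairs, paste0(
      tolower(str_sub(report, loc[j, "start"], loc[j, "end"])), ":",
      toupper(str_sub(report, ev[k, "start"], ev[k, "end"]))))
  }
  paste(pairs, collapse = ", ")
}

reports <- vapply(text, paste, character(1), collapse = ";")
result <- vapply(reports, nearest_preceding, character(1), USE.NAMES = FALSE)
print(result)
```

Since only positions are compared, the `;` separator plays no role in the matching; it just keeps sentences apart when the text is read back.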

Answer (score: 1):

Check out my solution:

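library(tidyverse)  # stringr, purrr and the %>% pipe
library(wrapr)      # for the %.>% dot pipe

tofind <- paste(c("Stomach", "Antrum", "Duodenum", "Oesophagus", "GOJ"), collapse = "|")

EventList <- c("RFA", "EMR", "APC")

# Flatten all reports into one vector of words
# (YOURS_LIST is the nested input list from the question)
words <-
  YOURS_LIST %>%
  unlist() %>%
  str_replace_na() %>%
  str_c(collapse = ' ') %>%
  str_split(' ') %>%
  `[[`(1)

EventList %>%
  map(
    # for each event, find every word that contains it...
    ~words %>%
      str_which(paste0('^.*', .x)) %>%
      map_chr(
        # ...and keep the last location in all the text up to that word
        ~words[1:.x] %>%
          str_c(collapse = ' ') %>%
          str_extract_all(regex(tofind, ignore_case = TRUE)) %>%
          `[[`(1) %.>%
          .[length(.)]
      ) %>%
      paste0(':', .x)
  ) %>%
  unlist() %>%
  # drop entries such as ":RFA" for events that never occur
  str_subset('.+:')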
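The answer flattens every report into a single word vector, so it returns one combined vector for the whole list rather than one string per report. A hedged sketch of the same word-based idea applied report by report with `purrr::map_chr()`; `answer_per_report()` is a hypothetical name, and the `if (length(locs) == 0)` guard is an addition so that an event with no earlier location yields nothing instead of an error:

```r
library(stringr)
library(purrr)

text <- list(
  c("Oesophagus irregular z-line as previously.", " quad biopsies at ,,,m"),
  c("Normal examination", "cardia mild inflammation."),
  c("stomach normal", "No problems here",
    "Everything  normal", "Small polyp EMR and completely removed", "GOJ normal",
    "Nodule seen which was normal", "This was removed by EMR",
    "All other sites normal  normal", " A small area of residual stomach was removed by APC ")
)
tofind <- paste(c("Stomach", "Antrum", "Duodenum", "Oesophagus", "GOJ"), collapse = "|")
EventList <- c("RFA", "EMR", "APC")

# Hypothetical wrapper: the answer's word-based search, run on one report at a time
answer_per_report <- function(report) {
  words <- str_split(str_c(report, collapse = " "), " ")[[1]]
  hits <- map(EventList, function(ev) {
    # every word containing the event, as in the answer
    map_chr(str_which(words, ev), function(i) {
      locs <- str_extract_all(str_c(words[1:i], collapse = " "),
                              regex(tofind, ignore_case = TRUE))[[1]]
      # last location up to and including word i; "" if there is none
      if (length(locs) == 0) "" else paste0(tolower(locs[length(locs)]), ":", ev)
    })
  })
  hits <- unlist(hits)
  paste(hits[hits != ""], collapse = ", ")
}

result <- map_chr(text, answer_per_report)
print(result)
```

As in the answer, results are grouped by event type (all EMRs before any APC) rather than strictly in text order; for this input the two orders happen to coincide.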