Extract the sentences of a character vector that satisfy two conditions in R

Time: 2017-08-09 16:51:23

Tags: r grep paste

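Suppose we have loaded a full-text file into R as a character vector. I am looking for code that pulls out all the text between two periods ("."), provided that the text between those two periods contains the word "and" and at least one "%".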

character <- as.character("Walmart stocks remained the same.  Sony reported an increase, and the percent was posted at 1.0%. And the google also remained the same.  And the percent of increase for Best Buy was 2.5%.")

Looking at this short example, I would expect output along the lines of:

[1] Sony reported an increase, and the percent was posted at 1.0%.
[2] And the percent of increase for Best Buy was 2.5%.

1 answer:

Answer 0: (score: 1)

A quick solution:

library(magrittr)
"Walmart stocks remained the same.  Sony reported an increase, and the percent was posted at 1.0%. And the google also remained the same.  And the percent of increase for Best Buy was 2.5%." %>%
  ## split the string at the sentence boundaries
  gsub("\\.\\s", "\\.\t", .) %>%
  strsplit("\\t") %>% unlist() %>%
  ## keep only sentences that contain "and the" (irrespective of case)
  grep("and the", x = ., value = TRUE, ignore.case = TRUE) %>%
  ## keep only the sentences that end with %.
  grep("%\\.$", x = ., value = TRUE) %>%
  ## remove any leading whitespace left over from the split
  sub("^\\s+", "", x = .)
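
Note that the pipeline above filters on "and the" and on sentences that end in "%.", which is a little narrower than what the question asks for (the word "and" plus at least one "%" anywhere in the sentence). As a rough sketch of the looser condition in base R, one could try something like the following. The function name `extract_sentences` is hypothetical (it is not part of the original answer), and the sketch assumes that a sentence ends at a period followed by whitespace, so the period inside "1.0%" does not trigger a split.

```r
# Hypothetical helper, not from the original answer: returns the sentences
# that contain the word "and" (any case) and at least one "%".
extract_sentences <- function(text) {
  # Assumption: a sentence boundary is whitespace that follows a period.
  # The lookbehind keeps the period attached to its sentence.
  sentences <- unlist(strsplit(text, "(?<=\\.)\\s+", perl = TRUE))

  # Condition 1: the whole word "and", irrespective of case
  has_and <- grepl("\\band\\b", sentences, ignore.case = TRUE, perl = TRUE)
  # Condition 2: at least one literal percent sign
  has_pct <- grepl("%", sentences, fixed = TRUE)

  sentences[has_and & has_pct]
}

txt <- "Walmart stocks remained the same.  Sony reported an increase, and the percent was posted at 1.0%. And the google also remained the same.  And the percent of increase for Best Buy was 2.5%."
extract_sentences(txt)
```

On the example string this should return the two sentences listed in the question. Abbreviations such as "Inc. " would also be treated as sentence boundaries under this assumption, so real text may need a more careful splitting rule.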