Question

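Suppose we have a full text file loaded into R as a character vector. I'm looking for code that will pull out all of the text between two periods ("."), as long as the text between those two periods contains "and the" and at least one "%".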

txt <- "Walmart stocks remained the same.  Sony reported an increase, and the percent was posted at 1.0%. And the google also remained the same.  And the percent of increase for Best Buy was 2.5%."

Given this short example, I would like output along the lines of:

[1] Sony reported an increase, and the percent was posted at 1.0%.
[2] And the percent of increase for Best Buy was 2.5%.
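The two conditions can be expressed directly in base R. This is a minimal sketch, assuming a sentence ends at a period followed by whitespace (so the period inside "1.0%" is not treated as a boundary). `extract_sentences()` is a hypothetical helper name, and the phrase is matched case-insensitively as a whole word:

```r
# Hypothetical helper: split text into sentences and keep those that contain
# `phrase` (case-insensitive, whole word) and at least one "%".
extract_sentences <- function(txt, phrase = "and the") {
  # Split after a period followed by whitespace; the period in "1.0%" is
  # followed by a digit, so it is not a split point.
  sentences <- strsplit(txt, "(?<=\\.)\\s+", perl = TRUE)[[1]]
  # `phrase` is inserted into the regex unescaped, so keep it to plain words.
  has_phrase <- grepl(paste0("\\b", phrase, "\\b"), sentences,
                      ignore.case = TRUE, perl = TRUE)
  # Literal "%" anywhere in the sentence
  has_percent <- grepl("%", sentences, fixed = TRUE)
  sentences[has_phrase & has_percent]
}

txt <- paste("Walmart stocks remained the same.  Sony reported an increase,",
             "and the percent was posted at 1.0%. And the google also remained",
             "the same.  And the percent of increase for Best Buy was 2.5%.")
result <- extract_sentences(txt)
print(result)
# [1] "Sony reported an increase, and the percent was posted at 1.0%."
# [2] "And the percent of increase for Best Buy was 2.5%."
```

The "Google" sentence is dropped because it contains "And the" but no "%", and the "Walmart" sentence fails both tests.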

Answer 1

A quick solution:

library(magrittr)
"Walmart stocks remained the same.  Sony reported an increase, and the percent was posted at 1.0%. And the google also remained the same.  And the percent of increase for Best Buy was 2.5%." %>%
  ## split the string at the sentence boundaries (a period followed by whitespace)
  gsub("\\.\\s+", "\\.\t", .) %>%
  strsplit("\\t") %>% unlist() %>%
  ## keep only sentences that contain "and the" (irrespective of case)
  grep("and the", x = ., value = TRUE, ignore.case = TRUE) %>%
  ## keep only the sentences that end with %.
  grep("%\\.$", x = ., value = TRUE) %>%
  ## remove any leading white space
  sub("^\\s+", "", x = .)
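The pipeline above works on a single string and keeps only sentences ending in "%.", which is stricter than the question's "at least one %". Since the question starts from a full text file, here is a base-R sketch of getting from a file to the same result with that filter relaxed. The temporary file, wrapped over three lines, is an assumption made only for illustration:

```r
# Stand-in for the question's "full text file": write the example text to a
# temporary file, wrapped over three lines (illustration only).
path <- tempfile(fileext = ".txt")
writeLines(c("Walmart stocks remained the same.  Sony reported an increase,",
             "and the percent was posted at 1.0%. And the google also remained",
             "the same.  And the percent of increase for Best Buy was 2.5%."),
           path)

# readLines() returns one element per line; join them so that sentences
# spanning a line break are not cut in half.
txt <- paste(readLines(path), collapse = " ")

# Same splitting trick as the answer: mark each period that is followed by
# whitespace with a tab, then split on the tabs.
sentences <- unlist(strsplit(gsub("\\.\\s+", ".\t", txt), "\t", fixed = TRUE))

# Keep sentences that contain "and the" (any case) and at least one "%"
# anywhere, rather than only those ending in "%.".
keep <- grepl("and the", sentences, ignore.case = TRUE) &
  grepl("%", sentences, fixed = TRUE)
result <- sentences[keep]
print(result)
```

Joining with `paste(collapse = " ")` matters because `readLines()` splits on line breaks, not sentence boundaries.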

Extract sentences from a character vector that meet two conditions in R
