Splitting text into paragraphs in R

Time: 2017-09-28 20:31:25

Tags: r regex

I am trying to split a document into paragraphs in R:
test.text <- c("First paragraph.  Second sentence of 1st paragraph.

           Second paragraph.")
# When we run the below, we see separation of \n\n between the 2nd and 3rd sentences
test.text

# This outputs the desired 2 blank lines in the console
writeLines("\n\n")

a <- strsplit(test.text, "\\n\\n")

It does not split correctly.

1 answer:

Answer 0 (score: 2)

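The output of strsplit is a list. In addition, there are spaces after the \n\n. So we need to account for that in the pattern, and convert the result to a vector with [[ or unlist: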
a <- strsplit(test.text, "\n+\\s+")[[1]]
a
#[1] "First paragraph.  Second sentence of 1st paragraph." "Second paragraph."
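
To make the two points above concrete, here is a minimal sketch. It assumes the blank line in the input contains no characters other than the newline, and the names res, by_index, by_unlist, paras and wrapped are illustrative only, not from the original post. It also shows one possible variant that splits only on a blank line and trims afterwards, since "\n+\\s+" would also split at a single line break followed by indentation.

```r
# Same text as in the question, written with explicit escapes
test.text <- "First paragraph.  Second sentence of 1st paragraph.\n\n           Second paragraph."

# strsplit() returns a list with one element per input string
res <- strsplit(test.text, "\n+\\s+")
is.list(res)

# Two equivalent ways to get a plain character vector for a single input
by_index  <- res[[1]]
by_unlist <- unlist(res)
identical(by_index, by_unlist)

# Variant (an assumption, not the answer's method): split only where a
# blank line occurs, then strip leading/trailing whitespace with trimws()
paras <- trimws(strsplit(test.text, "\n\\s*\n")[[1]])
paras

# Hypothetical input with an indented line break inside a paragraph
wrapped <- "Line one\n  continued.\n\nNext."
length(strsplit(wrapped, "\n+\\s+")[[1]])    # splits at the indented break too
length(strsplit(wrapped, "\n\\s*\n")[[1]])   # splits only at the blank line
```

For a single string, [[1]] and unlist give the same vector. With several input strings, unlist flattens all pieces into one vector, whereas [[ picks the pieces of one string, so the choice depends on whether the per-document grouping should be kept.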