R - Iteratively combine consecutive elements of a character vector until an empty-string element is reached

Time: 2016-10-04 14:40:54

Tags: r text text-mining

I have a character vector made up of long strings (alphanumeric plus special characters), as shown below.

txt <- c(
         "Spicy jalapeno bacon ipsum dolor amet", 
         "tenderloin. pariatur quis",
         "",
         "consequat pancetta jerky", 
         "porchetta non chuck exercitation",
         "laborum labore ball tip.",
         "",
         "",
         "Duis swine turkey kielbasa. Strip ",
         "steak ribeye laboris,"
        )

The desired output is

> txt
[1] "Spicy jalapeno bacon ipsum dolor amet tenderloin. pariatur quis"
[2] "consequat pancetta jerky porchetta non chuck exercitation laborum labore ball tip."
[3] "Duis swine turkey kielbasa. Strip steak ribeye laboris,"

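Things to consider:
1. Empty-string elements act as line breaks. There may be more than one in a row.
2. When two elements are joined, a space needs to be added between them.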

1 answer:

Answer 0: (score: 2)

One of many ways to do this:

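library(dplyr)
library(purrr)

# Note: data_frame() is deprecated in newer tibble releases in favor of tibble().
# Each "" bumps the cumulative sum, so every block of text gets its own group id.
data_frame(txt=txt, grp=cumsum(txt=="")) %>% 
  group_by(grp) %>% 
  # Join the lines of each group with a single space.
  do(data_frame(joined=paste0(.$txt, collapse=" "))) %>% 
  # The "" that opens a group leaves a leading space; strip it.
  mutate(joined=trimws(joined)) %>% 
  # Consecutive empty strings produce empty groups; drop them.
  filter(joined != "") %>% 
  ungroup() %>% 
  select(joined) %>% 
  # Turn the one-column result back into a plain character vector.
  flatten_chr()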
## [1] "Spicy jalapeno bacon ipsum dolor amet tenderloin. pariatur quis"                   
## [2] "consequat pancetta jerky porchetta non chuck exercitation laborum labore ball tip."
## [3] "Duis swine turkey kielbasa. Strip  steak ribeye laboris,"
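
The third element of the output above still has a double space in "Strip  steak", because the input element "Duis swine turkey kielbasa. Strip " ends with a trailing space before the joining space is added. The following is a minimal base-R sketch of the same grouping idea that also collapses runs of whitespace. It is not part of the original answer: the helper name `join_paragraphs` is hypothetical, and it assumes that squeezing every run of whitespace into one space is acceptable for the data.

```r
txt <- c(
  "Spicy jalapeno bacon ipsum dolor amet",
  "tenderloin. pariatur quis",
  "",
  "consequat pancetta jerky",
  "porchetta non chuck exercitation",
  "laborum labore ball tip.",
  "",
  "",
  "Duis swine turkey kielbasa. Strip ",
  "steak ribeye laboris,"
)

# Hypothetical helper (illustrative name, not from the answer above).
join_paragraphs <- function(x) {
  # Each "" increments the running count, giving every block its own group id.
  grp <- cumsum(x == "")
  # Drop the separator elements before grouping.
  keep <- x != ""
  # Paste the lines of each group together with a single space.
  joined <- vapply(split(x[keep], grp[keep]), paste, character(1), collapse = " ")
  # Assumption: any run of whitespace may be squeezed into one space.
  joined <- trimws(gsub("\\s+", " ", joined))
  unname(joined)
}

result <- join_paragraphs(txt)
print(result)
```

Because the separators are removed before `split()`, repeated empty strings never create empty groups, so no filtering step is needed afterwards.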