How do I split a cell every N words?

Time: 2018-03-22 10:12:16

Tags: r dplyr strsplit

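I have a data frame with a column of long text that I want to split every 30 words, creating new rows as needed, with the contents of the other columns repeated unchanged. The character-based solution (splitting every N characters) does not do what I need, which is why I am posting this as a separate question.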

df1<-data_frame(V1=c(1, 2, 3), V2=c('Red', 'Blue', 'Red'), text=c('Folly words widow one downs few age every seven. If miss part by fact he park just shew. Discovered had get considered projection who favourable. Necessary up knowledge it tolerably. Unwilling departure education is be dashwoods or an. Use off agreeable law unwilling sir deficient curiosity instantly. Easy mind life fact with see has bore ten. Parish any chatty can elinor direct for former. Up as meant widow equal an share least', 'Bringing unlocked me an striking ye perceive. Mr by wound hours oh happy. Me in resolution pianoforte continuing we. Most my no spot felt by no. He he in forfeited furniture sweetness he arranging. Me tedious so to behaved written account ferrars moments. Too objection for elsewhere her preferred allowance her. Marianne shutters mr steepest to me. Up mr ignorant produced distance although is sociable blessing. Ham whom call all lain like.', 'Did shy say mention enabled through elderly improve. As at so believe account evening behaved hearted is. House is tiled we aware. It ye greatest removing concerns an overcame appetite. Manner result square father boy behind its his. Their above spoke match ye mr right oh as first. Be my depending to believing perfectly concealed household. Point could to built no hours smile sense.Breakfast agreeable incommode departure it an. By ignorant at on wondered relation. Enough at tastes really so cousin am of. Extensive therefore supported by extremity of contented. Is pursuit compact demesne invited elderly be. View him she roof tell her case has sigh. Moreover is possible he admitted sociable concerns. By in cold no less been sent hard hill.' ))

I tried the following:

df <- df1%>%
      mutate(text = strsplit(as.character(text), "\\W+{30}")) %>%
      unnest(text)

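But it does not work.

2 answers:

Answer 0: (score: 1)

Here is one option: split the text into words with separate_rows, then paste them back together in groups of 30.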

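library(dplyr)
library(tidyr)

df1 %>%
   separate_rows(text) %>%   # one row per word, other columns repeated
   group_by(V1) %>%
   # grp numbers each run of 30 words within V1; `.add` replaced `add` in dplyr 1.0.0
   group_by(V2, grp = ((row_number() - 1) %/% 30) + 1, .add = TRUE) %>%
   summarise(text = paste(text, collapse = ' ')) %>%
   ungroup() %>%
   select(-grp)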
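
The grouping arithmetic in this answer can be seen in isolation with a small base R sketch. This is only an illustration under some assumptions, not the answerer's code: `chunk_words` is a hypothetical helper, words are split on whitespace only, and the chunk size is 3 instead of 30 so the result is easy to read.

```r
# Hypothetical helper (not from the answer): split one string into
# chunks of at most n whitespace-separated words.
chunk_words <- function(x, n) {
  words <- strsplit(trimws(x), "\\s+")[[1]]
  # Same idea as (row_number() - 1) %/% 30 + 1: words 1..n get group 1,
  # words n+1..2n get group 2, and so on.
  grp <- (seq_along(words) - 1) %/% n + 1
  unname(vapply(split(words, grp), paste, character(1), collapse = " "))
}

chunk_words("a b c d e f g", 3)
# "a b c" "d e f" "g"

# Toy data frame standing in for df1 (assumed column names V1, V2, text).
df <- data.frame(V1 = c(1, 2), V2 = c("Red", "Blue"),
                 text = c("a b c d e f g", "h i"),
                 stringsAsFactors = FALSE)

# Chunk every row, then repeat the other columns once per chunk.
chunks <- lapply(df$text, chunk_words, n = 3)
out <- data.frame(V1 = rep(df$V1, lengths(chunks)),
                  V2 = rep(df$V2, lengths(chunks)),
                  text = unlist(chunks),
                  stringsAsFactors = FALSE)
```

Here `out` has four rows: three for the first input row and one for the second, with V1 and V2 repeated alongside each chunk.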

Answer 1: (score: 0)

Try this; it worked for me.

library(stringr)
str_match_all(df1$text, "(?:\\w+\\W*){30}")
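
One thing to watch with this pattern: the quantifier `{30}` asks for exactly 30 repetitions, so a tail of fewer than 30 words is not guaranteed to come back as a chunk. The sketch below shows this on a toy string with 3 in place of 30. It uses base R's `gregexpr` and `regmatches` (swapped in for `str_match_all` so it runs without extra packages), and the second pattern, with `\\b` and `{1,3}`, is an assumed variant, not part of the answer.

```r
x <- "one two three four five six ab"

# Pattern in the style of the answer, with 3 in place of 30:
# only runs of exactly 3 repetitions match, so the tail "ab" is lost here.
full_only <- regmatches(x, gregexpr("(?:\\w+\\W*){3}", x, perl = TRUE))[[1]]

# Assumed variant: \\b stops a repetition from starting in the middle of a
# word, and {1,3} lets the last chunk hold fewer than 3 words.
with_tail <- regmatches(x, gregexpr("(?:\\b\\w+\\W*){1,3}", x, perl = TRUE))[[1]]

trimws(full_only)  # "one two three" "four five six"
trimws(with_tail)  # "one two three" "four five six" "ab"
```

Either way the result is a set of chunks per input string, so it still has to be expanded into rows (for example with `unnest`, as in the question) to get the other columns repeated.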