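How do I split a data frame and concatenate character vectors?

Asked: 2019-04-11 12:26:15

Tags: r dataframe

I have a data frame like the one below, built from text extracted from a PDF with pdftools::pdf_text: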

page_id <- c("7", "7", "7", "8", "8")
element_id <- c("1", "2", "3", "1", "2")
text <- c("One morning,", "when Gregor Samsa woke from troubled dreams,", "he found himself transformed in his bed into a horrible vermin.", "He lay on his armour-like back, and if he lifted his head a little he could see his brown belly,", "slightly domed and divided by arches into stiff sections.")
sample_data <- data.frame(page_id, element_id, text, stringsAsFactors = FALSE)

  page_id element_id                                                                                             text
1       7          1                                                                                     One morning,
2       7          2                                                     when Gregor Samsa woke from troubled dreams,
3       7          3                                  he found himself transformed in his bed into a horrible vermin.
4       8          1 He lay on his armour-like back, and if he lifted his head a little he could see his brown belly,
5       8          2                                        slightly domed and divided by arches into stiff sections.

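The problem is that, for later text processing, I need a data frame with two columns: page_id and the complete content of each page (text). I split the data frame like this:

splitted_sampledata <- split(sample_data, sample_data$page_id)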

$`7`
  page_id element_id                                                            text
1       7          1                                                    One morning,
2       7          2                    when Gregor Samsa woke from troubled dreams,
3       7          3 he found himself transformed in his bed into a horrible vermin.

$`8`
  page_id element_id                                                                                             text
4       8          1 He lay on his armour-like back, and if he lifted his head a little he could see his brown belly,
5       8          2                                        slightly domed and divided by arches into stiff sections.

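However, this leaves me with a list of data frames, which is not what I originally wanted. To get the desired result, I have to concatenate the character vectors in the text column, right? How do I get a data frame in which each row holds the "mini document" for one page? Any help is much appreciated!

Expected output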

  page_id                                                                                                                                                       text
1       7                                  One morning, when Gregor Samsa woke from troubled dreams, he found himself transformed in his bed into a horrible vermin.
2       8 He lay on his armour-like back, and if he lifted his head a little he could see his brown belly, slightly domed and divided by arches into stiff sections.
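
One possible way to get from the sample data to the expected output above is to paste the text values together within each page_id. The sketch below is only an illustration in base R, not a confirmed answer: the data.frame() call is an assumption about how sample_data was built, and the names pages and pages_from_split are made up for this example. It shows two variants, one using aggregate() directly and one that continues from the split() list in the question.

```r
# Sample data from the question. The data.frame() call is an assumption,
# since the question only shows the three vectors and the printed result.
page_id <- c("7", "7", "7", "8", "8")
element_id <- c("1", "2", "3", "1", "2")
text <- c("One morning,",
          "when Gregor Samsa woke from troubled dreams,",
          "he found himself transformed in his bed into a horrible vermin.",
          "He lay on his armour-like back, and if he lifted his head a little he could see his brown belly,",
          "slightly domed and divided by arches into stiff sections.")
sample_data <- data.frame(page_id, element_id, text, stringsAsFactors = FALSE)

# Variant 1: group by page_id and collapse each group's text into one
# string, separated by single spaces. `collapse` is passed on to paste().
pages <- aggregate(text ~ page_id, data = sample_data,
                   FUN = paste, collapse = " ")

# Variant 2: continue from the list produced by split() and build one
# row per list element. `pages_from_split` is a hypothetical name.
splitted_sampledata <- split(sample_data, sample_data$page_id)
pages_from_split <- data.frame(
  page_id = names(splitted_sampledata),
  text = vapply(splitted_sampledata,
                function(d) paste(d$text, collapse = " "),
                character(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)

print(pages)
```

Both variants keep the rows of each page in their original order, so the elements are joined in the order given by element_id as long as the input is already sorted that way. Because page_id is stored as character here, the pages are ordered as strings (for example "10" would sort before "7"); converting page_id to integer first would avoid that if it matters.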

0 answers:

No answers yet