Question

I am trying to parse a file in R that contains data in the following format:

Author:              Books:
Jane Austen          Sense and Sensibility 
Justin Bieber        NA
Shakespeare          The Taming of the Shrew | Much Ado About Nothing

It has a one-to-many structure. What I would like to get is a long-format data frame like this:

Author:         Books:
Jane Austen     Sense and Sensibility
Shakespeare     The Taming of the Shrew
Shakespeare     Much Ado About Nothing

This is more convenient if you want to get all the books by one author, or to find the person who wrote a particular book.

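More generally, how do I convert a data frame in (string, list of values) format into (string1, value1); (string1, value2); (string2, value3) format? I know how to use strsplit, but the data frame manipulation is not clear to me here.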

Bonus points: I would like something efficient (my real-life dataset is large).

I was thinking of building an empty data frame of the right size (given by sum(sapply(df$colWithListOfStrings, length))) and filling it by iterating with a for loop.
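As a minimal sketch of the size calculation described above, assuming a hypothetical data frame `df` whose column `colWithListOfStrings` still holds `|`-separated strings (the `key` column and the toy values are illustrative and not part of the real dataset), the per-row counts and the total can be computed in base R like this:

```r
# Hypothetical toy data; only the column name comes from the question.
df <- data.frame(
  key = c("a", "b"),
  colWithListOfStrings = c("x | y | z", "w"),
  stringsAsFactors = FALSE
)

# Split each string on a literal "|" (fixed = TRUE avoids regex alternation).
pieces <- strsplit(df$colWithListOfStrings, "|", fixed = TRUE)

# Number of values per original row, and the row count of the long format.
n_per_row <- lengths(pieces)
n_total <- sum(n_per_row)
```

With the per-row counts available, rep(df$key, n_per_row) repeats each key the right number of times, so the for loop over a preallocated data frame may not be needed at all.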

PS: Let's assume that a book has only one author.

Answer 1

You can use cSplit from the splitstackshape package (a very nice tool from Ananda Mahto):

library(splitstackshape)
cSplit(data, splitCols=2, sep = "|", direction = "long")[!is.na(Books)]
#                   Author                   Books
#1:            Jane Austen   Sense and Sensibility
#2:            Shakespeare The Taming of the Shrew
#3:            Shakespeare  Much Ado About Nothing

dput(data)

structure(list(Author = c("Jane Austen", "           Justin Bieber", 
                              "           Shakespeare"), Books = c("          Sense and Sensibility ", 
                                                                   "        NA", "          The Taming of the Shrew | Much Ado About Nothing"
                              )), .Names = c("Author", "Books"), class = "data.frame", row.names = c(NA, 
                                                                                                     -3L))
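
For comparison, the same reshaping can be sketched in base R with strsplit and rep, without extra packages. This is an illustrative alternative and not part of the answer above. It assumes the missing entry is a real NA instead of the padded string "NA" shown in the dput output, and it uses a whitespace-trimmed copy of the sample data; the names has_books, pieces, and long are made up for this example.

```r
# Sample data mirroring the question (padding removed, NA as a real missing value).
data <- data.frame(
  Author = c("Jane Austen", "Justin Bieber", "Shakespeare"),
  Books = c("Sense and Sensibility", NA,
            "The Taming of the Shrew | Much Ado About Nothing"),
  stringsAsFactors = FALSE
)

# Keep only authors that have books, then split on a literal "|".
has_books <- !is.na(data$Books)
pieces <- strsplit(data$Books[has_books], "|", fixed = TRUE)

# Repeat each author once per book and flatten the list of titles.
long <- data.frame(
  Author = rep(data$Author[has_books], lengths(pieces)),
  Books = trimws(unlist(pieces)),
  stringsAsFactors = FALSE
)

print(long)
```

Both rep and unlist are vectorized, so this avoids growing or filling a data frame row by row. Whether it is fast enough for a large dataset would need to be measured against cSplit.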

R: Parsing data with a one-to-many relationship
