Question

I have a data frame that can be created with the following code:

input <- data.frame( 'ID'=c(1:3), 
                      Destination=c("A\r\nB", "C", "D\r\nE\r\nF"), 
                      Topic=c("W", "X", "Y\r\nZ") )

It looks like this:

  ID Destination  Topic
1  1      A\r\nB      W
2  2           C      X
3  3 D\r\nE\r\nF Y\r\nZ

I would like to create an output data frame that looks like this:

desiredOutput <- data.frame( 
   ID = c(1,1,1,2,2,3,3,3,3,3) , 
   name=c( "Destination", "Destination", "Topic", "Destination", "Topic",
           "Destination", "Destination", "Destination" , "Topic", "Topic"), 
   value=c("A","B", "W", "C", "X", "D", "E", "F", "Y", "Z") )

   ID        name value
1   1 Destination     A
2   1 Destination     B
3   1       Topic     W
4   2 Destination     C
5   2       Topic     X
6   3 Destination     D
7   3 Destination     E
8   3 Destination     F
9   3       Topic     Y
10  3       Topic     Z

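Wherever the delimiter \r\n appears, I want to split the contents into separate rows, each carrying the correct ID, the name of the column it came from, and the corresponding value.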

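I can use strsplit to split a single column into a list, but short of writing a loop I don't know how to get the contents into a data frame. I am hoping the tidyr package can help.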

strsplit(input$Destination, split = "\r\n")

How can I do this, ideally without a loop?

Answer 1

With tidyr, use gather to reshape to long form, then separate_rows to split the concatenated elements:

library(tidyr)

input %>% gather(name, value, -ID) %>% separate_rows(value)
##    ID        name value
## 1   1 Destination     A
## 2   1 Destination     B
## 3   2 Destination     C
## 4   3 Destination     D
## 5   3 Destination     E
## 6   3 Destination     F
## 7   1       Topic     W
## 8   2       Topic     X
## 9   3       Topic     Y
## 10  3       Topic     Z
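In tidyr 1.0.0 and later, gather is superseded by pivot_longer. The following is a minimal sketch of the same idea with that function, not part of the original answer. It assumes the columns are character (hence stringsAsFactors = FALSE) and passes the delimiter explicitly through sep instead of relying on the default pattern. Because pivot_longer emits rows in input row order, the result should already be grouped by ID as in the desired output.

```r
library(tidyr)

input <- data.frame(ID = 1:3,
                    Destination = c("A\r\nB", "C", "D\r\nE\r\nF"),
                    Topic = c("W", "X", "Y\r\nZ"),
                    stringsAsFactors = FALSE)

# reshape to long form: one row per (ID, original column)
long <- pivot_longer(input, cols = -ID, names_to = "name", values_to = "value")

# split each value on the CRLF delimiter, producing one row per piece
result <- separate_rows(long, value, sep = "\r\n")
result
```

Note that the result is a tibble rather than a plain data.frame; wrap it in as.data.frame() if that matters.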

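Note: if your columns are factors rather than character, tidyr will warn you, because it coerces them to character when reshaping. It will work anyway, but if you dislike warnings, coerce to character manually before reshaping.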
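As a sketch of that manual coercion, assuming a copy of the data built with factor columns (the name input_f is made up for this illustration), every factor column can be converted in one step before the reshape:

```r
library(tidyr)

# stringsAsFactors = TRUE reproduces the factor case
# (this was the data.frame default before R 4.0)
input_f <- data.frame(ID = 1:3,
                      Destination = c("A\r\nB", "C", "D\r\nE\r\nF"),
                      Topic = c("W", "X", "Y\r\nZ"),
                      stringsAsFactors = TRUE)

# find the factor columns and convert them to character
is_fct <- vapply(input_f, is.factor, logical(1))
input_f[is_fct] <- lapply(input_f[is_fct], as.character)

# same reshape as in the answer, now on character columns
out <- separate_rows(gather(input_f, name, value, -ID), value)
out
```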

Answer 2

Here is an option using data.table:

library(data.table)
melt(setDT(input), id.var = "ID", variable.name = "name")[,
      .(value = unlist(strsplit(value, "\\s+"))), .(ID, name)][order(ID)]
#     ID        name value
#1:  1 Destination     A
#2:  1 Destination     B
#3:  1       Topic     W
#4:  2 Destination     C
#5:  2       Topic     X
#6:  3 Destination     D
#7:  3 Destination     E
#8:  3 Destination     F
#9:  3       Topic     Y
#10: 3       Topic     Z
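The one-liner above can be broken into steps to show what each part does. This is a sketch rather than the answer's exact code: it uses as.data.table instead of setDT so the original input is not modified by reference, splits on the literal "\r\n" instead of the whitespace pattern, and sorts with setorder on both ID and name. The intermediate names long and res are made up for the example.

```r
library(data.table)

input <- data.frame(ID = 1:3,
                    Destination = c("A\r\nB", "C", "D\r\nE\r\nF"),
                    Topic = c("W", "X", "Y\r\nZ"),
                    stringsAsFactors = FALSE)

# step 1: melt to long form, one row per (ID, original column)
long <- melt(as.data.table(input), id.vars = "ID", variable.name = "name")

# step 2: within each (ID, name) group, split the value on the literal CRLF
# and expand the pieces into separate rows
res <- long[, .(value = unlist(strsplit(value, "\r\n", fixed = TRUE))),
            by = .(ID, name)]

# step 3: sort by ID, then by column name (name is a factor, so
# Destination sorts before Topic)
setorder(res, ID, name)
res
```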

Edit: @DavidArenburg posted a similar solution as a comment on the other answer (which I had not seen earlier).

