Converting a data frame containing a list column to long format

Time: 2018-03-08 11:54:54

Tags: r dataframe

Question

I have a data frame that looks like this:

  ID    User.Food matched.indexes
1  1         milk        2, 8, 15
2  2       apples                
3  3        bread            4, 6
4  4    ice cream               5
5  5 boxed fruits  

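The matched.indexes column contains vectors of integers. I want to convert the data frame to long format, so that each matched index is on its own row: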

  ID    User.Food matched.indexes
1  1         milk               2
2  1         milk               8
3  1         milk              15
4  2       apples              NA     
5  3        bread               4
6  3        bread               6
7  4    ice cream               5
8  5 boxed fruits              NA

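All the questions and tutorials I have seen focus either on reshaping a wide data frame with multiple named columns into long format (melt, gather, etc.) or on splitting a single cell string such as "2, 8, 15" into multiple named columns. With this data, though, it is not clear to me how to unpack the vectors in the matched.indexes column.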

Data

This data frame is the result of using agrep to get possible matches from a data frame of food groups. The code to reproduce it is:

df1 <- structure(list(ID = 1:5, 
                 User.Food = c("milk", "apples", "bread", "ice cream",  
                               "boxed fruits"), 
                 matched.indexes = list(c(2, 8, 15), NA, c(4,6), c(5),
                                        NA)), 
                 .Names = c("ID", "User.Food", "matched.indexes"), 
                 class = "data.frame", 
                 row.names = c("1", "2", "3", "4", "5"))

2 answers:

Answer 0: (score: 0)

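We can use separate_rows with convert = TRUE, which changes the class of the column from character to a numeric type, so that the blanks ("") become NA. Note that this assumes matched.indexes is stored as a comma-separated character column, as in the data at the end of this answer.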

library(tidyr)
separate_rows(df1, matched.indexes, convert = TRUE)
#   ID    User.Food matched.indexes
#1  1         milk               2
#2  1         milk               8
#3  1         milk              15
#4  2       apples              NA
#5  3        bread               4
#6  3        bread               6
#7  4    ice cream               5
#8  5 boxed fruits              NA

Data

df1 <- structure(list(ID = 1:5, User.Food = c("milk", "apples", "bread", 
"ice cream", "boxed fruits"), matched.indexes = c("2, 8, 15", 
"", "4, 6", "5", "")), .Names = c("ID", "User.Food", "matched.indexes"
 ), class = "data.frame", row.names = c("1", "2", "3", "4", "5"))
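
The data above stores matched.indexes as character strings, whereas the structure() call in the question builds a true list column. For that case, one possible approach is tidyr::unnest. The following is only a minimal sketch, assuming tidyr 1.0.0 or later (where the column to unnest is named explicitly); the names df_list and long_list are made up for illustration.

```r
library(tidyr)

# Rebuild the question's data with matched.indexes as a list column
# (df_list is a hypothetical name, chosen to avoid clashing with df1)
df_list <- data.frame(
  ID = 1:5,
  User.Food = c("milk", "apples", "bread", "ice cream", "boxed fruits"),
  stringsAsFactors = FALSE
)
df_list$matched.indexes <- list(c(2, 8, 15), NA, c(4, 6), 5, NA)

# unnest() emits one row per element of each list entry;
# the length-one NA entries each become a single NA row
long_list <- unnest(df_list, matched.indexes)
long_list
```

Because the unmatched rows hold NA rather than an empty vector, they should survive the unnesting as NA rows; entries of length zero would need keep_empty = TRUE to be retained.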

Answer 1: (score: 0)

Using the tidyverse:

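library(tidyverse)

# Assumes matched.indexes is a character column such as "2, 8, 15"
df1 %>% 
  # split on commas, dropping the space that follows each one
  mutate(matched.indexes = str_split(matched.indexes, ",\\s*")) %>% 
  # name the column explicitly, as newer tidyr versions expect
  unnest(matched.indexes) %>% 
  # turn the empty strings into NA
  mutate(matched.indexes = na_if(matched.indexes, ""))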

Output

  ID    User.Food matched.indexes
1  1         milk               2
2  1         milk               8
3  1         milk              15
4  2       apples            <NA>
5  3        bread               4
6  3        bread               6
7  4    ice cream               5
8  5 boxed fruits            <NA>
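
If matched.indexes is a real list column, as in the question's structure() call, the same long format can also be sketched in base R without any packages: repeat each row's other columns once per list element, then flatten the list. This is a minimal sketch under that assumption; df_list and long_base are hypothetical names used only here.

```r
# Rebuild the question's data with matched.indexes as a list column
# (df_list is a hypothetical name, chosen to avoid clashing with df1)
df_list <- data.frame(
  ID = 1:5,
  User.Food = c("milk", "apples", "bread", "ice cream", "boxed fruits"),
  stringsAsFactors = FALSE
)
df_list$matched.indexes <- list(c(2, 8, 15), NA, c(4, 6), 5, NA)

# Number of elements held in each list entry
n <- lengths(df_list$matched.indexes)

# Repeat ID and User.Food once per element, then flatten the list column
long_base <- data.frame(
  ID = rep(df_list$ID, times = n),
  User.Food = rep(df_list$User.Food, times = n),
  matched.indexes = unlist(df_list$matched.indexes),
  stringsAsFactors = FALSE
)
long_base
```

One caveat of this sketch: a list entry of length zero (for example integer(0)) would make its row disappear, because rep() would repeat it zero times. The NA entries used here have length one, so they are kept.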