I have a data frame that looks like this:
ID User.Food matched.indexes
1 1 milk 2, 8, 15
2 2 apples
3 3 bread 4, 6
4 4 ice cream 5
5 5 boxed fruits
The matched.indexes column contains vectors of integers. I want to convert it to long format, so that each matched index is on its own row:
ID User.Food matched.indexes
1 1 milk 2
2 1 milk 8
3 1 milk 15
4 2 apples NA
5 3 bread 4
6 3 bread 6
7 4 ice cream 5
8 5 boxed fruits NA
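All the questions and tutorials I have seen focus either on reshaping a wide data frame with multiple named columns into long format (melt, gather, etc.) or on splitting a cell string such as "2, 8, 15" into multiple named columns. In this case, it is not clear to me how to unpack the vectors in the matched.indexes column.
This data frame comes from using agrep to find possible matches in a food-group data frame. The code to reproduce it is: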
df1 <- structure(list(ID = 1:5,
User.Food = c("milk", "apples", "bread", "ice cream",
"boxed fruits"),
matched.indexes = list(c(2, 8, 15), NA, c(4,6), c(5),
NA)),
.Names = c("ID", "User.Food", "matched.indexes"),
class = "data.frame",
row.names = c("1", "2", "3", "4", "5"))
Answer 0 (score: 0)
We can use separate_rows with convert = TRUE to change the class from character to numeric, which also replaces the blanks ("") with NA:
library(tidyr)
separate_rows(df1, matched.indexes, convert = TRUE)
# ID User.Food matched.indexes
#1 1 milk 2
#2 1 milk 8
#3 1 milk 15
#4 2 apples NA
#5 3 bread 4
#6 3 bread 6
#7 4 ice cream 5
#8 5 boxed fruits NA
Data (here matched.indexes is stored as character strings rather than as a list column):
df1 <- structure(list(ID = 1:5, User.Food = c("milk", "apples", "bread",
"ice cream", "boxed fruits"), matched.indexes = c("2, 8, 15",
"", "4, 6", "5", "")), .Names = c("ID", "User.Food", "matched.indexes"
), class = "data.frame", row.names = c("1", "2", "3", "4", "5"))
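If matched.indexes is kept as a list column, as in the question, a base R approach is one option. The following is a minimal sketch, not part of the original answer; the name `long` is only illustrative. It repeats each row once per element of its vector and then flattens the list with unlist. It assumes every element has length of at least 1, which holds here because the unmatched rows contain NA.

```r
# Rebuild the question's data frame with a list column
df1 <- data.frame(
  ID = 1:5,
  User.Food = c("milk", "apples", "bread", "ice cream", "boxed fruits"),
  stringsAsFactors = FALSE
)
df1$matched.indexes <- list(c(2, 8, 15), NA, c(4, 6), 5, NA)

# Number of elements in each row's vector (NA counts as one element)
lens <- lengths(df1$matched.indexes)

# Repeat ID and User.Food once per element, then flatten the list column
long <- data.frame(
  ID = rep(df1$ID, lens),
  User.Food = rep(df1$User.Food, lens),
  matched.indexes = unlist(df1$matched.indexes),
  stringsAsFactors = FALSE
)

print(long)
```

A zero-length element such as NULL or integer(0) would contribute no rows under this approach, so those would need to be replaced with NA first.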
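Answer 1 (score: 0)
Using the tidyverse (this appears to assume the character version of df1 shown in the previous answer, since str_split works on strings):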
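library(tidyverse)
df1 %>%
  # split each comma-separated string into a character vector (list column);
  # the pattern also drops the space that follows each comma
  mutate(matched.indexes = str_split(matched.indexes, ",\\s*")) %>%
  # expand the list column so each element gets its own row
  unnest(matched.indexes) %>%
  # turn empty strings into NA, applied to the column rather than the whole data frame
  mutate(matched.indexes = na_if(matched.indexes, ""))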
Output
ID User.Food matched.indexes
1 1 milk 2
2 1 milk 8
3 1 milk 15
4 2 apples <NA>
5 3 bread 4
6 3 bread 6
7 4 ice cream 5
8 5 boxed fruits <NA>
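Both answers above start from a character column, whereas the question's data frame stores numeric vectors in a list column. As a hedged sketch (not taken from either answer), tidyr's unnest can be applied to the list column directly, with no string splitting. The variable name `long` is illustrative.

```r
library(tidyr)

# The question's data frame, with matched.indexes as a list column
df1 <- data.frame(
  ID = 1:5,
  User.Food = c("milk", "apples", "bread", "ice cream", "boxed fruits"),
  stringsAsFactors = FALSE
)
df1$matched.indexes <- list(c(2, 8, 15), NA, c(4, 6), 5, NA)

# One row per element of each vector; the other columns are repeated
long <- unnest(df1, matched.indexes)

print(long)
```

Here the unmatched rows hold NA, so they survive as single rows. If they held NULL or zero-length vectors instead, unnest would drop them unless keep_empty = TRUE is passed.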