I have a list of 3 lists that classify things as fruits, vehicles and flowers.
category <-
structure(
list(
fruits = c("apple", "banana", "pear", "lemon", "kiwi", "orange"),
vehicles = c("car", "bike", "motorbike", "train", "plane"),
flowers <- list("rose", "tulip", "sunflower")
),
.Names = c(
"fruits", "vehicles", "flowers"
)
)
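Then I have a data frame with 2 vectors containing elements from those lists. Vector a can hold any number of objects per cell, while vector b holds exactly one element per cell.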
a <- I(list(c("apple", "car"),
c("motorbike", "banana", "tulip"),
c("rose", "kiwi", "apple"),
c("bike", "sunflower", "lemon"),
c("orange"),
c("tulip", "pear")))
b <- c("motorbike", "pear", "sunflower", "orange", "car", "apple")
funnydata <- data.frame(a, b)
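I want to create a third vector that gives the element of vector a that is in the same list/category as the element in vector b. So the desired result would be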
a b c
1 apple, car motorbike car
2 motorbik.... pear banana
3 rose, ki.... sunflower rose
4 bike, su.... orange lemon
5 orange car NA
6 tulip, pear apple pear
As long as I fix the list, I manage to get the elements of the vector that belong to that particular list:
funnydata$c <- sapply(funnydata$a, function(x) intersect(category$fruits, unlist(x))) # fixed list
funnydata$c
[[1]]
[1] "apple"
[[2]]
[1] "banana"
[[3]]
[1] "apple" "kiwi"
[[4]]
[1] "lemon"
[[5]]
[1] "orange"
[[6]]
[1] "pear"
I can also determine which list b belongs to:
sapply(funnydata$b, function(y) names(category[grep(y, category) ]))
[1] "vehicles" "fruits" "flowers" "fruits" "vehicles" "fruits"
But I am stuck on combining the two. If I try the following, I get character(0) for every row:
funnydata$c <- sapply(funnydata$a, function(x) intersect(sapply(funnydata$b, function(y)
category[grep(y, category) ]), unlist(x)))
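Can anyone help?
Edit
I found an error in the original post: the objects in category should all be of the same type (vector or list, whichever suits the task better). So it should be: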
category <-
structure(
list(
fruits = c("apple", "banana", "pear", "lemon", "kiwi", "orange"),
vehicles = c("car", "bike", "motorbike", "train", "plane"),
flowers = c("rose", "tulip", "sunflower")
),
.Names = c(
"fruits", "vehicles", "flowers"
)
)
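I am not sure whether this changes the existing answers. I am still trying to wrap my head around them. Apologies if this copy-paste error made things more complicated than they needed to be.
Answer 0 (score: 2)
We can do this with a join
library(tidyverse)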
dat <- rownames_to_column(funnydata, 'rn')
catdat <- stack(category)
dat %>%
unnest(a) %>%
left_join(catdat, by = c(a = "values")) %>%
left_join(catdat, by = c(b = "values")) %>%
filter(ind.x == ind.y) %>%
select(rn, c=a) %>%
right_join(dat, by = "rn") %>%
select(names(funnydata), c)
# a b c
#1 apple, car motorbike car
#2 motorbik.... pear banana
#3 rose, ki.... sunflower rose
#4 bike, su.... orange lemon
#5 orange car <NA>
#6 tulip, pear apple pear
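For comparison, here is a minimal base-R sketch of the same idea without a join. It is not part of the answer above. It assumes the corrected category list from the question's edit, and the names lookup, same_category and c_col are made up for illustration. Rather than calling sapply over a and b separately, it walks both in parallel with mapply, which is the step the attempt in the question was missing.

```r
# Data as defined in the question (corrected version of category).
category <- list(
  fruits   = c("apple", "banana", "pear", "lemon", "kiwi", "orange"),
  vehicles = c("car", "bike", "motorbike", "train", "plane"),
  flowers  = c("rose", "tulip", "sunflower")
)
a <- list(c("apple", "car"),
          c("motorbike", "banana", "tulip"),
          c("rose", "kiwi", "apple"),
          c("bike", "sunflower", "lemon"),
          c("orange"),
          c("tulip", "pear"))
b <- c("motorbike", "pear", "sunflower", "orange", "car", "apple")

# Hypothetical helper: named vector mapping each member to its category name.
lookup <- setNames(rep(names(category), times = lengths(category)),
                   unlist(category, use.names = FALSE))

# Keep the elements of x whose category equals the category of y.
# Returns the first match, or NA when nothing in x shares y's category.
same_category <- function(x, y) {
  target <- unname(lookup[as.character(y)])
  hit <- x[which(lookup[x] == target)]
  if (length(hit) == 0) NA_character_ else hit[1]
}

# Iterate over a and b row by row, in parallel.
c_col <- unname(mapply(same_category, a, b))
c_col
```

Taking only the first match is a design choice that happens to fit the sample data; if a row can contain several elements from b's category, return hit as a whole and keep the result as a list column.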
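Answer 1 (score: 2)
Most problems involving data.frames with list columns can be solved by converting those list columns into "flat" vectors.
So we convert the two original objects (the category list and funnydata) into longer versions: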
category_df <- data.frame(
group = rep(names(category), times = lengths(category)),
member = unlist(category)
)
category_df
# group member
# fruits1 fruits apple
# fruits2 fruits banana
# fruits3 fruits pear
# fruits4 fruits lemon
# fruits5 fruits kiwi
# fruits6 fruits orange
# vehicles1 vehicles car
# vehicles2 vehicles bike
# vehicles3 vehicles motorbike
# vehicles4 vehicles train
# vehicles5 vehicles plane
# flowers1 flowers rose
# flowers2 flowers tulip
# flowers3 flowers sunflower
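As a side note, the long table above can probably also be produced with base R's stack(), which is what the other answer uses. The sketch below only checks that the two constructions hold the same content; the variable name stacked is made up here, and the corrected category list from the question's edit is assumed.

```r
category <- list(
  fruits   = c("apple", "banana", "pear", "lemon", "kiwi", "orange"),
  vehicles = c("car", "bike", "motorbike", "train", "plane"),
  flowers  = c("rose", "tulip", "sunflower")
)

# Construction used in this answer.
category_df <- data.frame(
  group = rep(names(category), times = lengths(category)),
  member = unlist(category)
)

# stack() returns the members in column "values" and the list names
# in column "ind" (a factor).
stacked <- stack(category)

# Same content; only column names, row names and the type of "ind" differ.
same_members <- all(stacked$values == category_df$member)
same_groups  <- all(as.character(stacked$ind) == category_df$group)
c(same_members, same_groups)
```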
funnydata[["index"]] <- seq_len(nrow(funnydata))
funny_flat <- data.frame(
a = unlist(funnydata[["a"]]),
b = rep(funnydata[["b"]], times = lengths(funnydata[["a"]])),
index = rep(funnydata[["index"]], times = lengths(funnydata[["a"]]))
)
funny_flat
# a b index
# 1 apple motorbike 1
# 2 car motorbike 1
# 3 motorbike pear 2
# 4 banana pear 2
# 5 tulip pear 2
# 6 rose sunflower 3
# 7 kiwi sunflower 3
# 8 apple sunflower 3
# 9 bike orange 4
# 10 sunflower orange 4
# 11 lemon orange 4
# 12 orange car 5
# 13 tulip apple 6
# 14 pear apple 6
I also added an index so we know which values came from which original rows. From here it only takes a couple of simple merges and some renaming.
funny_flat <- merge(funny_flat, category_df, by.x = "a", by.y = "member")
names(funny_flat)[names(funny_flat) == "group"] <- "group_a"
funny_flat <- merge(funny_flat, category_df, by.x = "b", by.y = "member")
names(funny_flat)[names(funny_flat) == "group"] <- "group_b"
funny_flat
# b a index group_a group_b
# 1 apple pear 6 fruits fruits
# 2 apple tulip 6 flowers fruits
# 3 car orange 5 fruits vehicles
# 4 motorbike apple 1 fruits vehicles
# 5 motorbike car 1 vehicles vehicles
# 6 orange bike 4 vehicles fruits
# 7 orange lemon 4 fruits fruits
# 8 orange sunflower 4 flowers fruits
# 9 pear motorbike 2 vehicles fruits
# 10 pear banana 2 fruits fruits
# 11 pear tulip 2 flowers fruits
# 12 sunflower apple 3 fruits flowers
# 13 sunflower rose 3 flowers flowers
# 14 sunflower kiwi 3 fruits flowers
Now we encode your original goal: find the values where a and b share a category. c will be the value of a, so that is again just a rename.
funny_matching <- funny_flat[funny_flat[["group_a"]] == funny_flat[["group_b"]], ]
names(funny_matching)[names(funny_matching) == "a"] <- "c"
funny_matching
# b c index group_a group_b
# 1 apple pear 6 fruits fruits
# 5 motorbike car 1 vehicles vehicles
# 7 orange lemon 4 fruits fruits
# 10 pear banana 2 fruits fruits
# 13 sunflower rose 3 flowers flowers
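One caveat: with the sample data each index matches at most once. If a row of a contained several elements from b's category, funny_matching would hold several rows for that index, and a merge on index would then duplicate the original row. A sketch of one way to handle that, assuming you want the matches pasted into a single string. The extra row for index 4 is invented purely for illustration, and collapsed is a made-up name.

```r
# funny_matching as printed above (columns b, c, index only), plus one
# hypothetical extra match for index 4 to show the multi-match case.
funny_matching <- data.frame(
  b     = c("apple", "motorbike", "orange", "pear", "sunflower", "orange"),
  c     = c("pear", "car", "lemon", "banana", "rose", "orange"),
  index = c(6, 1, 4, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Collapse all matches that belong to the same original row into one string,
# so that each index appears exactly once before merging back.
collapsed <- aggregate(c ~ index, data = funny_matching,
                       FUN = paste, collapse = ", ")
collapsed
```

Merging collapsed instead of funny_matching then keeps one row per index; rows without any match still come out as NA through all.x = TRUE.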
Finally, merge back onto the original data using the index from before.
merge(
funnydata,
funny_matching[, c("c", "index")],
by = "index",
all.x = TRUE
)
# index a b c
# 1 1 apple, car motorbike car
# 2 2 motorbik.... pear banana
# 3 3 rose, ki.... sunflower rose
# 4 4 bike, su.... orange lemon
# 5 5 orange car <NA>
# 6 6 tulip, pear apple pear