I have a nested list like this:
> ex <- list(list(c("This", "is", "an", "example", "."), c("I", "really", "hate", "examples", ".")), list(c("How", "do", "you", "feel", "about", "examples", "?")))
> ex
[[1]]
[[1]][[1]]
[1] "This" "is" "an" "example" "."
[[1]][[2]]
[1] "I" "really" "hate" "examples" "."
[[2]]
[[2]][[1]]
[1] "How" "do" "you" "feel" "about" "examples" "?"
I want to convert it into a tibble like this:
> tibble(d_id = as.integer(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2)),
+ s_id = as.integer(c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1)),
+ t_id = as.integer(c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7)),
+ token = c("This", "is", "an", "example", ".", "I", "really",
+ "hate", "examples", ".", "How", "do", "you", "feel", "about", "examples", "?"))
# A tibble: 17 x 4
d_id s_id t_id token
<int> <int> <int> <chr>
1 1 1 1 This
2 1 1 2 is
3 1 1 3 an
4 1 1 4 example
5 1 1 5 .
6 1 2 1 I
7 1 2 2 really
8 1 2 3 hate
9 1 2 4 examples
10 1 2 5 .
11 2 1 1 How
12 2 1 2 do
13 2 1 3 you
14 2 1 4 feel
15 2 1 5 about
16 2 1 6 examples
17 2 1 7 ?
What is the most efficient way to do this, preferably using tidyverse functions?
Answer 0 (score: 5)
We can do
library(tidyverse)

ex %>%
  set_names(seq_along(ex)) %>%
  map(~ set_names(.x, seq_along(.x)) %>%
        stack) %>%
  bind_rows(.id = 'd_id') %>%
  group_by(d_id, s_id = ind) %>%
  mutate(t_id = row_number()) %>%
  select(d_id, s_id, t_id, token = values)
# A tibble: 17 x 4
# Groups: d_id, s_id [3]
# d_id s_id t_id token
# <chr> <chr> <int> <chr>
# 1 1 1 1 This
# 2 1 1 2 is
# 3 1 1 3 an
# 4 1 1 4 example
# 5 1 1 5 .
# 6 1 2 1 I
# 7 1 2 2 really
# 8 1 2 3 hate
# 9 1 2 4 examples
#10 1 2 5 .
#11 2 1 1 How
#12 2 1 2 do
#13 2 1 3 you
#14 2 1 4 feel
#15 2 1 5 about
#16 2 1 6 examples
#17 2 1 7 ?
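The result above is still grouped and its id columns are not integers, while the question asks for integer ids. The following is a minimal sketch of one way to finish the pipeline, assuming dplyr and purrr are installed. The name `res` and the final conversion step are illustrative additions, not part of the original answer.

```r
library(dplyr)
library(purrr)

ex <- list(
  list(c("This", "is", "an", "example", "."),
       c("I", "really", "hate", "examples", ".")),
  list(c("How", "do", "you", "feel", "about", "examples", "?"))
)

res <- ex %>%
  set_names(seq_along(ex)) %>%
  map(~ stack(set_names(.x, seq_along(.x)))) %>%  # one data frame per document
  bind_rows(.id = "d_id") %>%
  group_by(d_id, s_id = ind) %>%
  mutate(t_id = row_number()) %>%
  ungroup() %>%                                   # drop the grouping
  mutate(
    d_id = as.integer(d_id),
    s_id = as.integer(as.character(s_id))         # safe for factor or character
  ) %>%
  select(d_id, s_id, t_id, token = values)
```

Converting through `as.character()` first avoids picking up factor level codes if `s_id` happens to be a factor.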
Answer 1 (score: 5)
Time to put some sequences to work; this should be very efficient:
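d_id <- rep(seq_along(ex), lengths(ex))         # document index, one per sentence
s_id <- sequence(lengths(ex))                   # sentence index within each document
t_id <- lengths(unlist(ex, recursive = FALSE))  # number of tokens in each sentence
data.frame(
  d_id  = rep(d_id, t_id),
  s_id  = rep(s_id, t_id),
  t_id  = sequence(t_id),
  token = unlist(ex)
)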
# d_id s_id t_id token
#1 1 1 1 This
#2 1 1 2 is
#3 1 1 3 an
#4 1 1 4 example
#5 1 1 5 .
#6 1 2 1 I
#7 1 2 2 really
#8 1 2 3 hate
#9 1 2 4 examples
#10 1 2 5 .
#11 2 1 1 How
#12 2 1 2 do
#13 2 1 3 you
#14 2 1 4 feel
#15 2 1 5 about
#16 2 1 6 examples
#17 2 1 7 ?
For a 500K-sample version of the ex list, this runs in about 2 seconds. I suspect it would be hard to beat in terms of efficiency.
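To make the sequence logic easier to reuse and to time, it can be wrapped in a function. This is only a sketch under assumptions: the helper name `flatten_tokens` and the test size of 10,000 copies are hypothetical choices, not from the answer, and the timing will vary by machine.

```r
ex <- list(
  list(c("This", "is", "an", "example", "."),
       c("I", "really", "hate", "examples", ".")),
  list(c("How", "do", "you", "feel", "about", "examples", "?"))
)

# Hypothetical helper wrapping the sequence-based approach.
flatten_tokens <- function(x) {
  sents  <- unlist(x, recursive = FALSE)  # flat list of sentences
  n_sent <- lengths(x)                    # sentences per document (for ex: 2 1)
  n_tok  <- lengths(sents)                # tokens per sentence (for ex: 5 5 7)
  data.frame(
    d_id  = rep(rep(seq_along(x), n_sent), n_tok),
    s_id  = rep(sequence(n_sent), n_tok),
    t_id  = sequence(n_tok),
    token = unlist(sents),
    stringsAsFactors = FALSE
  )
}

res <- flatten_tokens(ex)

# Rough timing on a larger input built by repeating ex (size is arbitrary).
big <- rep(ex, 1e4)
system.time(flatten_tokens(big))
```

Only vectorised base functions (`rep`, `sequence`, `lengths`, `unlist`) are involved, which is why the approach tends to scale well.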
Answer 2 (score: 4)
You can use melt from the reshape2 package:
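library(reshape2)    # provides melt() for lists (melt.list)
library(data.table)
# Call reshape2::melt explicitly, since data.table also exports a melt()
setDT(reshape2::melt(ex))[, .(d_id = L1, s_id = L2, t_id = rowid(L1, L2), token = value)]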
d_id s_id t_id token
1: 1 1 1 This
2: 1 1 2 is
3: 1 1 3 an
4: 1 1 4 example
5: 1 1 5 .
6: 1 2 1 I
7: 1 2 2 really
8: 1 2 3 hate
9: 1 2 4 examples
10: 1 2 5 .
11: 2 1 1 How
12: 2 1 2 do
13: 2 1 3 you
14: 2 1 4 feel
15: 2 1 5 about
16: 2 1 6 examples
17: 2 1 7 ?
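I show it here with data.table because I know how to do the column selection and renaming in one step from there (though dplyr should have no trouble with it either). The melt.list function comes from reshape2.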
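For readers who prefer dplyr, the same melt-based idea might look like the sketch below. It assumes reshape2 and dplyr are installed; the name `res` and the `as.character()` conversion are illustrative additions rather than part of the answer.

```r
library(reshape2)
library(dplyr)

ex <- list(
  list(c("This", "is", "an", "example", "."),
       c("I", "really", "hate", "examples", ".")),
  list(c("How", "do", "you", "feel", "about", "examples", "?"))
)

res <- reshape2::melt(ex) %>%          # columns: value, L2, L1
  group_by(L1, L2) %>%
  mutate(
    t_id  = row_number(),              # token position within each sentence
    token = as.character(value)        # guard against factor values in older R
  ) %>%
  ungroup() %>%
  select(d_id = L1, s_id = L2, t_id, token)
```

Here `L1` is the outer (document) index and `L2` the inner (sentence) index, matching the data.table version.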
Answer 3 (score: 1)
Another tidyverse solution:
library(tidyverse)
ex %>%
  modify_depth(-1, ~ tibble(token = .x) %>% rowid_to_column("t_id")) %>%
  map(~ map_dfr(.x, identity, .id = "s_id")) %>%
  map_dfr(identity, .id = "d_id")
# # A tibble: 17 x 4
# d_id s_id t_id token
# <chr> <chr> <int> <chr>
# 1 1 1 1 This
# 2 1 1 2 is
# 3 1 1 3 an
# 4 1 1 4 example
# 5 1 1 5 .
# 6 1 2 1 I
# 7 1 2 2 really
# 8 1 2 3 hate
# 9 1 2 4 examples
# 10 1 2 5 .
# 11 2 1 1 How
# 12 2 1 2 do
# 13 2 1 3 you
# 14 2 1 4 feel
# 15 2 1 5 about
# 16 2 1 6 examples
# 17 2 1 7 ?
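Note that `d_id` and `s_id` come out as character here because they are created through `.id`. As an alternative sketch (not the answer's method), `purrr::imap_dfr()` passes the integer position of each element of an unnamed list, so the ids stay integer. This assumes purrr, tibble and dplyr are installed; `imap_dfr()` is marked as superseded in recent purrr releases but is still available.

```r
library(purrr)
library(tibble)

ex <- list(
  list(c("This", "is", "an", "example", "."),
       c("I", "really", "hate", "examples", ".")),
  list(c("How", "do", "you", "feel", "about", "examples", "?"))
)

# d and s are the integer positions supplied by imap for unnamed lists.
res <- imap_dfr(ex, function(doc, d) {
  imap_dfr(doc, function(sent, s) {
    tibble(d_id = d, s_id = s, t_id = seq_along(sent), token = sent)
  })
})
```

The nesting of the two `imap_dfr()` calls mirrors the nesting of the list: documents on the outside, sentences on the inside.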