Convert a list of named elements to a data frame or data table

Time: 2015-07-07 15:03:55

Tags: r dataframe

I have a list of named elements (testlist) in which some of the names are repeated:

$x
[1] "one"

$x
[1] "two"

$y
[1] "three"

$y
[1] "four"

I am trying to end up with a data table in which elements that share a name are combined into the same column:

     x     y
1: one three
2: two  four

I tried

testdf <- do.call(cbind, lapply(testlist, data.table))

but that only gives me one column per element:

   x.V1 x.V1  y.V1 y.V1
1:  one  two three four

Any suggestions? Thanks for the help!

2 answers:

Answer 0: (score: 8)

Try

library(data.table)#v1.9.5+
dcast(setDT(stack(testlist))[, N:= 1:.N, ind],
                  N~ind, value.var='values')[,N:=NULL][]
#    x     y
#1: one three
#2: two  four

A base R approach:

unstack(stack(testlist),values~ind)
#   x     y
#1 one three
#2 two  four
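
To see why the base R one-liner works, it can help to look at the long-format intermediate that stack() builds. This is only a minimal sketch: the way testlist is constructed here (via setNames) is an assumption, since the question shows only the printed list.

```r
# Assumed reconstruction of the question's list (names x, x, y, y)
testlist <- as.list(setNames(c("one", "two", "three", "four"),
                             c("x", "x", "y", "y")))

# stack() returns a two-column data frame: `values` holds the elements,
# `ind` is a factor holding the name each element came from
long <- stack(testlist)
print(long)

# unstack() groups `values` by `ind`; every group has the same length here,
# so the groups become the columns of a data frame
wide <- unstack(long, values ~ ind)
print(wide)
```

If the groups had different lengths, unstack() would return a plain list instead of a data frame, so this route fits best when every name occurs the same number of times.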

Answer 1: (score: 6)

A more efficient base R alternative might be:

data.frame(split(unlist(L, use.names = FALSE), names(L)))
#     x     y
# 1 one three
# 2 two  four

Sample data:

L <- as.list(setNames(c("one", "two", "three", "four"), c("x", "x", "y", "y")))
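
Broken into steps, the split() one-liner might look like the sketch below. The intermediate names (vals, groups, res, ord, res_ordered) are hypothetical and used only for illustration.

```r
L <- as.list(setNames(c("one", "two", "three", "four"), c("x", "x", "y", "y")))

# Flatten the list into a plain character vector, dropping the names
vals <- unlist(L, use.names = FALSE)

# Group the values by element name; split() orders the groups by sorted name
groups <- split(vals, names(L))
str(groups)

# Equal-length groups become the columns of a data frame
res <- data.frame(groups, stringsAsFactors = FALSE)
print(res)

# Optional: keep the columns in order of first appearance instead of sorted order
ord <- factor(names(L), levels = unique(names(L)))
res_ordered <- data.frame(split(vals, ord), stringsAsFactors = FALSE)
print(res_ordered)
```

Note that data.frame() stops with an error when the groups differ in length, so this approach also assumes each name occurs equally often.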

Also, in "data.table", it would be more efficient to create the data.table manually rather than using stack:

library(data.table) # V1.9.4
dcast.data.table(
  data.table(val = unlist(L, use.names = FALSE), var = names(L))[
    , rn := seq(.N), by = var], rn ~ var, value.var = "val")[, rn := NULL][]

Benchmark:

# Required packages
library(stringi)
library(microbenchmark)
library(data.table)

# Sample data
set.seed(1)   # for reproducible data
nr = 10000    # final number of rows expected
nc = 100      # final number of columns expected
L <- as.list(setNames(sample(100, nc*nr, TRUE), rep(stri_rand_strings(nc, 7), nr)))

# Functions to benchmark
funak_b <- function() unstack(stack(L),values~ind)
funak_dt <- function() {
  dcast.data.table(setDT(stack(L))[, N:= 1:.N, ind],
                   N ~ ind, value.var = 'values')[, N := NULL][]
}
funam_b <- function() data.frame(split(unlist(L, use.names = FALSE), names(L)))
funam_dt <- function() {
  dcast.data.table(
    data.table(val = unlist(L, use.names = FALSE), var = names(L))[
      , rn := seq(.N), by = var], rn ~ var, value.var = "val")[, rn := NULL][]
}

# Results
microbenchmark(funak_b(), funak_dt(), funam_b(), funam_dt(), times = 20)
# Unit: milliseconds
#        expr        min         lq      mean    median        uq       max neval
#   funak_b() 2171.53485 2292.55003 2434.8899 2463.1977 2546.4671 2687.5924    20
#  funak_dt() 2364.68148 2598.00309 2646.6790 2643.5328 2694.8609 2902.6150    20
#   funam_b()   91.88414   93.09794  104.0179   96.4256  100.4168  204.0342    20
#  funam_dt()  238.17656  249.59135  344.9249  310.8694  423.6861  508.1844    20

I think I'll stick with base R :-)
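
Before reading too much into timings like these, it may be worth confirming that the two base R approaches agree on a small input. The sketch below uses a made-up list (L_small) whose names are deliberately out of alphabetical order; all names in it are illustrative.

```r
# Small deterministic list; names deliberately not in alphabetical order
L_small <- as.list(setNames(1:6, rep(c("b", "a", "c"), 2)))

# The two base R approaches from the answers above
res_stack <- unstack(stack(L_small), values ~ ind)
res_split <- data.frame(split(unlist(L_small, use.names = FALSE), names(L_small)))

# Column order can differ between the two, so align by name before comparing
same <- isTRUE(all.equal(res_stack[, sort(names(res_stack))],
                         res_split[, sort(names(res_split))]))
print(same)
```

The split() version sorts columns by name, while the stack() version follows the order in which names appear, which is why the comparison reorders the columns first.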