I have a list of named elements (testlist), some of whose names are duplicated:
$x
[1] "one"
$x
[1] "two"
$y
[1] "three"
$y
[1] "four"
I'm trying to end up with a data.table that combines the elements sharing a name into the same column:
x y
1: one three
2: two four
I tried
testdf <- do.call(cbind, lapply(testlist, data.table))
but all I end up with is:
x.V1 x.V1 y.V1 y.V1
1: one two three four
Any suggestions? Thanks for the help!
Answer 0 (score: 8)
Try
library(data.table)#v1.9.5+
dcast(setDT(stack(testlist))[, N:= 1:.N, ind],
N~ind, value.var='values')[,N:=NULL][]
# x y
#1: one three
#2: two four
Or a base R approach:
unstack(stack(testlist),values~ind)
# x y
#1 one three
#2 two four
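To see why the stack/unstack route works, it can help to look at the intermediate long-format table. The following is a minimal sketch, assuming testlist was built as shown (the question does not include its construction, so that line is a guess):

```r
# Assumed construction of the question's list (not shown in the original post)
testlist <- list(x = "one", x = "two", y = "three", y = "four")

# stack() produces a long data.frame: one row per element, with the
# element names stored in the factor column `ind`
long <- stack(testlist)
print(long)

# unstack() splits `values` by `ind` and binds the groups as columns
wide <- unstack(long, values ~ ind)
print(wide)
```

The data.table version above does the same reshaping, but it needs the explicit per-group counter N so that dcast has a row identifier to cast on.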
Answer 1 (score: 6)
A more efficient base R alternative might be:
data.frame(split(unlist(L, use.names = FALSE), names(L)))
# x y
# 1 one three
# 2 two four
Sample data:
L <- as.list(setNames(c("one", "two", "three", "four"), c("x", "x", "y", "y")))
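Note that data.frame(split(...)) relies on every name occurring the same number of times; with unequal group sizes data.frame() stops with a "differing number of rows" error. If that case can occur, one possible workaround is to pad the shorter groups with NA first. A rough sketch, where L2 is a hypothetical list made up for illustration (it is not part of the question):

```r
# Hypothetical list in which "x" occurs three times and "y" only twice
L2 <- as.list(setNames(c("one", "two", "five", "three", "four"),
                       c("x", "x", "x", "y", "y")))

# Group the values by name, as in the one-liner above
groups <- split(unlist(L2, use.names = FALSE), names(L2))

# Extending a vector with `length<-` pads it with NA
n <- max(lengths(groups))
padded <- lapply(groups, `length<-`, n)

res <- data.frame(padded, stringsAsFactors = FALSE)
print(res)
```

Whether NA padding is the right behaviour depends on the data; it is only one way to make the columns line up.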
Also, with "data.table", creating the data.table manually instead of using stack would be more efficient:
library(data.table) # V1.9.4
dcast.data.table(
data.table(val = unlist(L, use.names = FALSE), var = names(L))[
, rn := seq(.N), by = var], rn ~ var, value.var = "val")[, rn := NULL][]
Benchmarks:
# Required packages
library(stringi)
library(microbenchmark)
library(data.table)
# Sample data
set.seed(1) # for reproducible data
nr = 10000 # final number of rows expected
nc = 100 # final number of columns expected
L <- as.list(setNames(sample(100, nc*nr, TRUE), rep(stri_rand_strings(nc, 7), nr)))
# Functions to benchmark
funak_b <- function() unstack(stack(L),values~ind)
funak_dt <- function() {
dcast.data.table(setDT(stack(L))[, N:= 1:.N, ind],
N ~ ind, value.var = 'values')[, N := NULL][]
}
funam_b <- function() data.frame(split(unlist(L, use.names = FALSE), names(L)))
funam_dt <- function() {
dcast.data.table(
data.table(val = unlist(L, use.names = FALSE), var = names(L))[
, rn := seq(.N), by = var], rn ~ var, value.var = "val")[, rn := NULL][]
}
# Results
microbenchmark(funak_b(), funak_dt(), funam_b(), funam_dt(), times = 20)
# Unit: milliseconds
# expr min lq mean median uq max neval
# funak_b() 2171.53485 2292.55003 2434.8899 2463.1977 2546.4671 2687.5924 20
# funak_dt() 2364.68148 2598.00309 2646.6790 2643.5328 2694.8609 2902.6150 20
# funam_b() 91.88414 93.09794 104.0179 96.4256 100.4168 204.0342 20
# funam_dt() 238.17656 249.59135 344.9249 310.8694 423.6861 508.1844 20
I think I'll stick with base R :-)