Convert variables stored in a list to a list of character vectors in R

Time: 2014-07-16 02:33:00

Tags: r vector dataframe type-conversion element

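I have a subset of data derived from a very large dataset. I have split this subset into a list of data frames so that each case/id is a separate element of the list. Each element is named by its case/id. I then removed every variable from each data frame element except one, called "state". At present it is a factor with 7 levels.

I am trying to convert this list of 'state' elements into a list of character vectors. Below is the first element of the list, including its row numbers (which come from the larger original dataset).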

[[1]]
        state
104246 active
104247   rest
104248 active
104249 active
.
.
.
104315 active
104316 active
104317   rest
104318   rest

I am trying to turn it into a character vector that looks like this:

[1] "active" "rest" "active" "active" ........... "active" "active" "rest" "rest"

It seems simple. I have tried things like this (where 'temp' is the name of the list):

as.vector(as.matrix(temp))   

which returns something like this:

         [,1]  
    id1  List,1
    id2  List,1
    id3  List,1
    id4  List,1

When I look at the individual elements of this result, they still appear to be essentially in long form.

Alternatively, I tried converting directly to character:

as.vector(as.character(temp))

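However, this does not return the desired format either, although I suppose I could hack it by converting the factor level numbers back to words. (Note that in the large dataset the factor 'state' has 7 levels.)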

[1] "list(state = c(1, 4, 1, 1, 1, 1, 1, 4, 4, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 4, 1, 6, 1, 4, 4, 1, 1, 1, 4,     1, 1, 1, 6, 4, 1, 1, 1, 1, 1, 4, 4, 1, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 4, 4, 4, 1, 1, 1, 1, 4, 4, 1, 1, 1, 1,     1, 1, 1, 4, 4))"

I also tried converting the variable 'state' from a factor to a character variable before the conversion, but that did not help.

Here is the data for a reproducible example. It contains just two elements of the 'temp' list:

temp<-list(structure(list(state = structure(c(1L, 4L, 1L, 1L, 1L, 1L, 
                                           1L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 1L, 
                                           6L, 1L, 4L, 4L, 1L, 1L, 1L, 4L, 1L, 1L, 1L, 6L, 4L, 1L, 1L, 1L, 
                                           1L, 1L, 4L, 4L, 1L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                                           4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 
                                           1L, 4L, 4L), .Label = c("active", "active2", "active3", "rest", "rest2", 
                                                                   "stop", "stop2"), class = "factor")), .Names = "state", row.names = 104246:104318, class = "data.frame"), 
        structure(list(state = structure(c(1L, 4L, 4L, 4L, 1L, 1L, 
                                           1L, 4L, 4L, 4L, 4L, 1L, 4L, 4L, 4L, 1L, 1L, 6L, 4L, 1L, 4L, 
                                           4L, 4L, 1L, 4L, 1L, 1L, 1L), .Label = c("active", "active2", 
                                                                                   "active3", "rest", "rest2", "stop", "stop2"), class = "factor")), .Names = "state", row.names = 950:977, class = "data.frame"))



str(temp)

3 answers:

Answer 0 (score: 2)

This might be a good opportunity to use rapply:

x <- rapply(temp, as.character, how = "replace")
str(x)
# List of 2
#  $ :List of 1
#   ..$ state: chr [1:73] "active" "rest" "active" "active" ...
#  $ :List of 1
#   ..$ state: chr [1:28] "active" "rest" "rest" "rest" ...

If you want to flatten it further, you can use unlist(..., recursive = FALSE):

str(unlist(rapply(temp, as.character, how = "replace"), recursive=FALSE))
# List of 2
#  $ state: chr [1:73] "active" "rest" "active" "active" ...
#  $ state: chr [1:28] "active" "rest" "rest" "rest" ...
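The question says the real list is named by case/id, in which case the flattened elements would pick up combined names rather than the original ids. The following is a minimal sketch, assuming a small made-up stand-in for `temp` with hypothetical names `id1` and `id2`, of how the original names could be restored after flattening:

```r
# Made-up stand-in for `temp`; the names id1/id2 and the data are assumptions
temp_named <- list(
  id1 = data.frame(state = factor(c("active", "rest", "active"))),
  id2 = data.frame(state = factor(c("rest", "stop")))
)

# Convert every factor leaf to character, then drop one level of nesting
flat <- unlist(rapply(temp_named, as.character, how = "replace"),
               recursive = FALSE)

# Put the original case ids back as the element names
names(flat) <- names(temp_named)
str(flat)
```

After this, `flat$id1` and `flat$id2` should each be a plain character vector.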

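The second approach gives the same result as @Vlo's approach, but it is more efficient because it calls unlist only once. To see how much of a difference that makes, here is a benchmark on a larger list: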

x <- replicate(1000, temp)   ## A larger list

## Vlo's approach
fun1 <- function() {
  lapply(x, function(y) as.character(unlist(y, use.names = FALSE)))
} 

## My approach
fun2 <- function() {
  unlist(rapply(x, as.character, how = "replace"), 
         recursive=FALSE, use.names=FALSE)
} 

## Benchmarking
library(microbenchmark)
microbenchmark(fun1(), fun2(), times = 50)
# Unit: milliseconds
#    expr       min        lq    median        uq       max neval
#  fun1() 435.84992 475.17146 497.63325 533.68488 1570.6814    50
#  fun2()  50.90449  55.79023  63.85908  70.78956  111.0357    50

## Comparison of results
all.equal(fun1(), fun2(), check.attributes=FALSE)
# [1] TRUE

Answer 1 (score: 0)

Try this code:

as.vector(unlist(temp[[1]]))
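The line above converts only the first element of the list. As a rough sketch, assuming a small made-up stand-in for `temp` (the data and the name `temp_small` are illustrative only), the same conversion could be applied to every element with lapply:

```r
# Made-up stand-in for `temp` (illustrative data only)
temp_small <- list(
  data.frame(state = factor(c("active", "rest"))),
  data.frame(state = factor(c("stop", "active", "rest")))
)

# Apply the same conversion to each data frame in the list;
# as.vector() on a factor returns its labels as character
res <- lapply(temp_small, function(d) as.vector(unlist(d)))
str(res)
```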

Answer 2 (score: 0)

L = lapply(temp, function(x) as.character(unlist(x)))

The vectors are then simply L[[1]] and L[[2]].
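Because each data frame holds a single known column, a variation on this idea is to pull the column out by name instead of calling unlist. This is only a sketch of that alternative, not the answerer's code, and it assumes a small made-up list with hypothetical names `id1` and `id2`:

```r
# Made-up stand-in for `temp`; names and data are assumptions
temp_ids <- list(
  id1 = data.frame(state = factor(c("active", "rest", "rest"))),
  id2 = data.frame(state = factor(c("stop", "active")))
)

# Extract the 'state' column of each data frame and convert it to character;
# lapply() keeps the list names, so the case ids carry over
L_by_name <- lapply(temp_ids, function(d) as.character(d[["state"]]))
str(L_by_name)
```

The result can then be indexed by position (`L_by_name[[1]]`) or by id (`L_by_name$id1`).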