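Merging list members with different numbers of rows

Time: 2014-05-14 15:02:07

Tags: r merge data.table cbind

Here is a list you can run in your console (let me know if it is too long for an example and I will trim it):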

my_list = list(structure(list(PX_LAST = c(0.398, 0.457, 0.4, 0.159, 0.126, 
0.108, 0.26, 0.239, 0.222, 0.191, 0.184)), .Names = "PX_LAST", row.names = c("2014-04-28 00:00:00", 
"2014-04-29 00:00:00", "2014-04-30 00:00:00", "2014-05-02 00:00:00", 
"2014-05-05 00:00:00", "2014-05-06 00:00:00", "2014-05-07 00:00:00", 
"2014-05-08 00:00:00", "2014-05-09 00:00:00", "2014-05-12 00:00:00", 
"2014-05-13 00:00:00"), class = "data.frame"), structure(list(
    PX_LAST = c(1.731, 1.706, 1.7095, 1.69, 1.713, 1.711, 1.724, 
    1.699, 1.702, 1.705, 1.649, 1.611)), .Names = "PX_LAST", row.names = c("2014-04-29 00:00:00", 
"2014-04-30 00:00:00", "2014-05-01 00:00:00", "2014-05-02 00:00:00", 
"2014-05-05 00:00:00", "2014-05-06 00:00:00", "2014-05-07 00:00:00", 
"2014-05-08 00:00:00", "2014-05-09 00:00:00", "2014-05-12 00:00:00", 
"2014-05-13 00:00:00", "2014-05-14 00:00:00"), class = "data.frame"), 
    structure(list(PX_LAST = c(0.481, 0.456, 0.448, 0.439, 0.436, 
    0.448, 0.458, 0.466, 0.432, 0.437, 0.441, 0.417, 0.4035)), .Names = "PX_LAST", row.names = c("2014-04-28 00:00:00", 
    "2014-04-29 00:00:00", "2014-04-30 00:00:00", "2014-05-01 00:00:00", 
    "2014-05-02 00:00:00", "2014-05-05 00:00:00", "2014-05-06 00:00:00", 
    "2014-05-07 00:00:00", "2014-05-08 00:00:00", "2014-05-09 00:00:00", 
    "2014-05-12 00:00:00", "2014-05-13 00:00:00", "2014-05-14 00:00:00"
    ), class = "data.frame"), structure(list(PX_LAST = c(1.65, 
    1.65, 1.64, 1.65, 1.662, 1.6595, 1.665, 1.6595, 1.6625, 1.652, 
    1.645, 1.6245, 1.627, 1.633)), .Names = "PX_LAST", row.names = c("2014-04-25 00:00:00", 
    "2014-04-28 00:00:00", "2014-04-29 00:00:00", "2014-04-30 00:00:00", 
    "2014-05-01 00:00:00", "2014-05-02 00:00:00", "2014-05-05 00:00:00", 
    "2014-05-06 00:00:00", "2014-05-07 00:00:00", "2014-05-08 00:00:00", 
    "2014-05-09 00:00:00", "2014-05-12 00:00:00", "2014-05-13 00:00:00", 
    "2014-05-14 00:00:00"), class = "data.frame"))

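My question is: how can I use do.call() on this list to merge all the data by date?

I have not been able to get this working, since both merge and cbind return errors: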

> do.call(what = merge, args = my_list)
Error in fix.by(by.x, x) : 
'by' must specify column(s) as numbers, names or logical

> do.call(what = cbind, args = my_list)
Error in data.frame(..., check.names = FALSE) : 
arguments imply differing number of rows: 11, 12, 13, 14

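I would like to get a single data matrix (with missing or non-matching data replaced by NA), equivalent to the effect of merge() applied to the elements of the list.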

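2 Answers:

Answer 0 (score: 2)

This would be a bit easier if you were not merging by row names, but you can do it with the Reduce function, which applies a function successively over a list of values (in this case, data.frames). Try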

Reduce(function(x,y) {
    dd<-merge(x,y,by=0); rownames(dd)<-dd$Row.names; dd[-1]
}, my_list)

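This merges only the rows that match. If you like, you can also add all=TRUE to the merge() call to keep unmatched rows, or customize it as you would any regular merge().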
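Since the question asks for unmatched dates to be filled with NA, here is a minimal sketch of the same Reduce pattern with all = TRUE. The list toy_list and the helper name merge_by_rownames are made up for illustration and are not part of the original answer; the values are not taken from the question's data.

```r
# Toy stand-in for my_list: two small data frames keyed by row names.
# (Hypothetical values, chosen only to keep the example short.)
toy_list <- list(
  data.frame(PX_LAST = c(1, 2, 3),
             row.names = c("2014-04-28", "2014-04-29", "2014-04-30")),
  data.frame(PX_LAST = c(10, 20),
             row.names = c("2014-04-29", "2014-05-01"))
)

# Same idea as the Reduce call above, but all = TRUE keeps rows that
# appear in only one of the two data frames and fills the gaps with NA.
merge_by_rownames <- function(x, y) {
  dd <- merge(x, y, by = 0, all = TRUE)  # by = 0 means "merge on row names"
  rownames(dd) <- dd$Row.names           # restore the dates as row names
  dd[-1]                                 # drop the helper Row.names column
}

merged <- Reduce(merge_by_rownames, toy_list)
print(merged)
```

With only two data frames the value columns come out as PX_LAST.x and PX_LAST.y; with more inputs the naming issue described next applies.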

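You will get warnings about the column names: every column has the same name, so once you merge more than a few of them, merge does not know what to call them. You can rename them with something like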
my_new_list <- Map(
    function(x,n) {
        names(x)<-n; x
    }, 
    my_list, 
    paste("PX_LAST",1:length(my_list), sep="_")
)

Then

 Reduce(function(x,y) {
    dd<-merge(x,y,by=0); rownames(dd)<-dd$Row.names; dd[-1]
}, my_new_list)

will not complain.
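The remark at the top of this answer, that things are easier when you do not merge by row names, can be sketched as follows. This is only an illustration under assumptions: toy_list and its values are hypothetical, and the column name date is my own choice rather than anything from the question.

```r
# Toy stand-in for my_list (hypothetical values).
toy_list <- list(
  data.frame(PX_LAST = c(1, 2),
             row.names = c("2014-04-28", "2014-04-29")),
  data.frame(PX_LAST = c(10, 20),
             row.names = c("2014-04-29", "2014-04-30")),
  data.frame(PX_LAST = c(100, 200, 300),
             row.names = c("2014-04-28", "2014-04-30", "2014-05-01"))
)

# Move the row names into an ordinary 'date' column and give each
# value column a unique name, so merge() has a real key to work with.
with_dates <- lapply(seq_along(toy_list), function(i) {
  x <- toy_list[[i]]
  out <- data.frame(date = rownames(x), value = x$PX_LAST,
                    stringsAsFactors = FALSE)
  names(out)[2] <- paste("PX_LAST", i, sep = "_")
  out
})

# A plain merge on the 'date' column; all = TRUE keeps unmatched dates as NA.
merged <- Reduce(function(x, y) merge(x, y, by = "date", all = TRUE),
                 with_dates)
print(merged)
```

Because the key is a regular column, there is no Row.names bookkeeping inside the reducing function, and the unique column names avoid the naming warning.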

Answer 1 (score: 1)

Here is a solution using data.table and reshape2:

# Load libraries
library(data.table)
library(reshape2)

# Setup new list object 
my_list.2 <- vector(length(my_list), mode="list")

# Add time stamps as variable and add ID variable
for (i in seq_along(my_list)) {
  # rep() takes 'times', not 'id'; the original 'id=' argument was silently ignored
  my_list.2[[i]] <- cbind(time = rownames(my_list[[i]]), my_list[[i]],
                          id = rep(paste0("list_", i), times = nrow(my_list[[i]])))
}

# Collapse all lists in one data table
d.temp <- rbindlist(my_list.2)

# Transform the data
d.final <- dcast(time~id, value.var="PX_LAST", data=d.temp)


# > d.final
#                   time list_1 list_2 list_3 list_4
# 1  2014-04-28 00:00:00  0.398     NA 0.4810 1.6500
# 2  2014-04-29 00:00:00  0.457 1.7310 0.4560 1.6400
# 3  2014-04-30 00:00:00  0.400 1.7060 0.4480 1.6500
# 4  2014-05-02 00:00:00  0.159 1.6900 0.4360 1.6595
# 5  2014-05-05 00:00:00  0.126 1.7130 0.4480 1.6650
# 6  2014-05-06 00:00:00  0.108 1.7110 0.4580 1.6595
# 7  2014-05-07 00:00:00  0.260 1.7240 0.4660 1.6625
# 8  2014-05-08 00:00:00  0.239 1.6990 0.4320 1.6520
# 9  2014-05-09 00:00:00  0.222 1.7020 0.4370 1.6450
# 10 2014-05-12 00:00:00  0.191 1.7050 0.4410 1.6245
# 11 2014-05-13 00:00:00  0.184 1.6490 0.4170 1.6270
# 12 2014-05-01 00:00:00     NA 1.7095 0.4390 1.6620
# 13 2014-05-14 00:00:00     NA 1.6110 0.4035 1.6330
# 14 2014-04-25 00:00:00     NA     NA     NA 1.6500
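
The same long-then-wide idea can probably be written with data.table alone. The sketch below assumes a reasonably recent data.table release, one where rbindlist() accepts idcol and the package ships its own dcast() method; toy_list and its values are hypothetical.

```r
library(data.table)

# Toy stand-in for my_list (hypothetical values), named so that the
# names can serve as the id of each series.
toy_list <- list(
  data.frame(PX_LAST = c(1, 2),
             row.names = c("2014-04-28", "2014-04-29")),
  data.frame(PX_LAST = c(10, 20),
             row.names = c("2014-04-29", "2014-04-30"))
)
names(toy_list) <- paste0("list_", seq_along(toy_list))

# Stack everything in long format; idcol records which list element
# each row came from (taken from the list names).
long <- rbindlist(
  lapply(toy_list, function(x) data.table(time = rownames(x),
                                          PX_LAST = x$PX_LAST)),
  idcol = "id"
)

# Cast to wide format: one row per time stamp, one column per list element.
wide <- dcast(long, time ~ id, value.var = "PX_LAST")
print(wide)
```

Time stamps that are missing from a series show up as NA in that series' column, which matches what the question asked for.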