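I have 6 data frames. All of them have unique column names and the same number of columns, and the data were collected over the same time period.
Each data frame has a timestamp and per-minute averages, but some data frames are missing data, so they have different numbers of rows.
I want to merge the data frames so that all 6 are shown side by side, but only for the times at which data exist in all 6 data frames (the data frame with the fewest rows is "H1_min").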
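Keeping only the times present in all six data frames amounts to intersecting their timestamp columns. The result can be smaller than the shortest data frame if that frame has timestamps the others lack. Here is a minimal base-R sketch. The vectors `t1`, `t2` and `t3` are hypothetical stand-ins for the real `h1min`, `h2min`, ... columns.

```r
# Hypothetical timestamp vectors standing in for h1min, h2min, h3min
t1 <- c("2015-09-06 00:00:00", "2015-09-06 00:02:00", "2015-09-06 00:03:00")
t2 <- c("2015-09-06 00:00:00", "2015-09-06 00:01:00", "2015-09-06 00:02:00",
        "2015-09-06 00:03:00")
t3 <- c("2015-09-06 00:00:00", "2015-09-06 00:03:00", "2015-09-06 00:04:00")

# Timestamps present in every vector: fold intersect() over the list
common <- Reduce(intersect, list(t1, t2, t3))
print(common)
```

In this sketch `t1` is the shortest vector, yet only two of its three timestamps survive, because `t3` lacks `00:02:00`.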
> head(H1_min)
h1min h1temp h1humid h1db h1hz
1 2015-09-06 00:00:00 21.5 73.10 39.252 117.1900
2 2015-09-06 00:02:00 21.5 72.50 39.434 125.0000
3 2015-09-06 00:03:00 21.5 72.65 39.338 127.9325
4 2015-09-06 00:04:00 21.5 73.00 39.206 148.4400
5 2015-09-06 00:06:00 21.5 73.00 39.253 144.5350
6 2015-09-06 00:07:00 21.5 72.30 39.293 156.2500
The other data frames have similar column names, with h1 replaced by h2 through h6.
dput(head(H2_min))
"2015-09-08 20:21:00", "2015-09-08 20:22:00", "2015-09-08 20:23:00",
"2015-09-08 20:24:00", "2015-09-08 20:25:00", "2015-09-08 20:26:00",
"2015-09-08 20:27:00", "2015-09-08 20:28:00", "2015-09-08 20:29:00",
"2015-09-08 20:30:00", "2015-09-08 20:31:00", "2015-09-08 20:32:00",
"2015-09-08 20:33:00", "2015-09-08 20:34:00", "2015-09-08 20:35:00"
), class = "factor"), h2temp = c(23.4, 23.4, 23.3, 23.2, 23.2,
23.1), h2humid = c(38.5, 38.3, 38.05, 38.1, 38.6, 38.6), h2db = c(38.834,
38.655, 38.679, 38.695, 38.806, 38.702), h2hz = c(191.41, 152.34,
162.11, 113.28, 121.09, 164.06)), .Names = c("h2min", "h2temp",
"h2humid", "h2db", "h2hz"), row.names = c(NA, 6L), class = "data.frame")
dput(head(H4_min))
"2015-09-08 17:10:00", "2015-09-08 17:11:00", "2015-09-08 17:12:00",
"2015-09-08 17:13:00"), class = "factor"), h4temp = c(27.2, 27.2,
27.2, 27.2, 27.2, 27.2), h4humid = c(33.5, 33.5, 33.5, 33.5,
33.5, 33.5), h4db = c(36.8225, 36.921, 36.8766666666667, 36.91,
36.8336666666667, 36.768), h4hz = c(134.765, 136.068333333333,
137.373333333333, 126.3, 139.323333333333, 128.906666666667)), .Names =
c("h4min", "h4temp", "h4humid", "h4db", "h4hz"), row.names = c(NA, 6L), class = "data.frame")
This attempt produces an error:
H_min<-merge(H1_min, H2_min, H3_min, H4_min, H5_min, H6_min, by.x = 'row.names', by.y ='h1_min')
Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column
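Base R's `merge()` joins exactly two data frames. In the call above, only `H1_min` and `H2_min` are treated as inputs. The remaining data frames are matched positionally to later arguments such as `by` and `all`, and `by.y = 'h1_min'` names no column of `H2_min` (its timestamp column is `h2min`). The following is a minimal sketch of one common workaround, not the asker's code. It folds `merge()` over a list with `Reduce()`. The sample data are hypothetical, and the common column name `time` is an assumption introduced here.

```r
# Three hypothetical minute-level data frames in the shape of the question
H1_min <- data.frame(
  h1min  = c("2015-09-06 00:00:00", "2015-09-06 00:02:00", "2015-09-06 00:03:00"),
  h1temp = c(21.5, 21.5, 21.5)
)
H2_min <- data.frame(
  h2min  = c("2015-09-06 00:00:00", "2015-09-06 00:01:00", "2015-09-06 00:03:00"),
  h2temp = c(23.4, 23.4, 23.3)
)
H3_min <- data.frame(
  h3min  = c("2015-09-06 00:00:00", "2015-09-06 00:03:00"),
  h3temp = c(25.0, 25.1)
)

dfs <- list(H1_min, H2_min, H3_min)

# Give every timestamp column the same (hypothetical) name so merge() can match on it
dfs <- lapply(dfs, function(d) { names(d)[1] <- "time"; d })

# merge() joins two data frames at a time; Reduce() folds it over the list.
# all = FALSE (the default) keeps only timestamps present in every data frame.
H_min <- Reduce(function(x, y) merge(x, y, by = "time", all = FALSE), dfs)
print(H_min)
```

With six data frames, the only change is to put all six into `dfs`.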
Answer 0 (score: 2)
Another approach is to convert the data.frames to xts objects and then use merge.xts(...), which merges them automatically on their timestamps, and finally convert the result back to a data.frame.
Most of the code in this answer only creates reproducible sample data; the actual work is done in the last 6 lines.
Answer 1 (score: 0)
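library(dplyr)
library(magrittr)
library(tidyr)

# Sample data: two data frames with the same layout
H1_min =
  tibble(
    h1min = c("2015-09-06 00:00:00", "2015-09-06 00:02:00"),
    h1temp = c(21.5, 21.5),
    h1humid = c(73.10, 72.50),
    h1db = c(39.252, 39.434),
    h1hz = c(117.1900, 125.000) )
# Second data frame, with a missing value in its second row
H2_min = H1_min %>% mutate(h1hz = c(117.1900, NA))

answer =
  list(H1_min, H2_min) %>%
  # Drop the h1/h2 prefixes so all data frames share column names
  lapply(. %>% setNames(c("min", "temp", "humid", "db", "hz"))) %>%
  # Stack them; "location" records each data frame's list position (1, 2, ...)
  bind_rows(.id = "location") %>%
  # Long format: one row per timestamp, location and variable
  gather(variable, value, -location, -min) %>%
  # Rebuild prefixed names such as "h1temp" and "h2hz"
  mutate(prefix = "h") %>%
  unite(new_variable, prefix, location, variable, sep = "") %>%
  # Back to wide format: one row per timestamp
  spread(new_variable, value) %>%
  # Keep only timestamps that have data in every data frame
  filter(complete.cases(.))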
Answer 2 (score: 0)
A simple way to solve this, based on @jlhoward's answer:
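library(xts)
# order.by must be a time-based vector (e.g. POSIXct), so convert the timestamp column
qxts1 <- xts(df1[, -1], order.by = as.POSIXct(as.character(df1[, 1])))
qxts2 <- xts(df2[, -1], order.by = as.POSIXct(as.character(df2[, 1])))
xts.lst = list(qxts1, qxts2)
# all = FALSE keeps only timestamps present in every object (inner join)
result <- do.call(merge.xts, c(xts.lst, all = FALSE))
# Keep the timestamps as a column rather than as row names
result <- data.frame(time = index(result), coredata(result))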
For xts or zoo, make sure your timestamp is a time-based class such as Date, POSIXct or chron; a factor or character column has to be converted first.
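The `dput()` output in the question shows the timestamp columns stored as factors, and `xts()` does not accept a factor as `order.by`. Here is a minimal base-R sketch of converting such a column. It assumes the timestamps use the `%Y-%m-%d %H:%M:%S` layout shown in the question and are in UTC; the time zone is an assumption, so adjust `tz` to wherever the data were recorded.

```r
# Hypothetical factor column, shaped like h2min in the question's dput() output
h2min <- factor(c("2015-09-08 20:21:00", "2015-09-08 20:22:00"))

# Convert the factor labels to text explicitly, then parse them as date-times
# (tz = "UTC" is an assumption about where the data were recorded)
h2time <- as.POSIXct(as.character(h2min), format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
print(class(h2time))
print(diff(h2time))
```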