Creating multiple output dataframes from multiple input dataframes in R in a generic way

Time: 2016-12-22 07:21:42

Tags: dplyr data.table

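I have n input dataframes, each with a TimeStamp column + k numeric columns.

I want to convert them into k output dataframes, each with a TimeStamp column + n numeric columns, such that numeric column i of output dataframe j contains the values from numeric column j of input dataframe i (column indices exclude the TimeStamp column, which is the first column), and the missing TimeStamps are filled in with NA.

The first column in each of these dataframes is always the TimeStamp column (this is where TimeStamps may overlap), and the number of rows differs between input dataframes (they may have different TimeStamps).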

For example, with n=2, each of the dataframes d1, d2 has the following structure (sample dataframes are shown below with k=4; k can be arbitrary but is the same for every dataframe), and each is stored in a separate csv file:

d1 <- data.frame(
  TimeStamp = c("2016-12-20 10:17:20", "2016-12-20 10:19:20", "2016-12-20 10:19:40",
                "2016-12-20 10:20:00", "2016-12-20 10:20:20", "2016-12-20 10:20:40"),
  b1 = c(-76L, 0L, 0L, -76L, -80L, -81L),
  b2 = c(0L, -74L, -79L, -73L, -79L, -77L),
  b3 = c(0L, 0L, -88L, -88L, -91L, 0L),
  b4 = c(0L, 0L, 0L, -78L, -80L, -78L),
  stringsAsFactors = TRUE
)
head(d1)
#            TimeStamp  b1  b2  b3  b4
#1 2016-12-20 10:17:20 -76   0   0   0
#2 2016-12-20 10:19:20   0 -74   0   0
#3 2016-12-20 10:19:40   0 -79 -88   0
#4 2016-12-20 10:20:00 -76 -73 -88 -78
#5 2016-12-20 10:20:20 -80 -79 -91 -80
#6 2016-12-20 10:20:40 -81 -77   0 -78

d2 <- data.frame(
  TimeStamp = c("2016-12-20 12:07:20", "2016-12-20 12:07:40", "2016-12-20 12:08:00",
                "2016-12-20 12:08:20", "2016-12-20 12:10:20", "2016-12-20 12:10:40"),
  b1 = c(-76L, 0L, 0L, 0L, -82L, -74L),
  b2 = c(-87L, -76L, 0L, 0L, 0L, -69L),
  b3 = c(0L, 0L, -84L, -84L, 0L, -85L),
  b4 = c(-75L, 0L, 0L, 0L, 0L, 0L),
  stringsAsFactors = TRUE
)
head(d2)
#            TimeStamp  b1  b2  b3  b4
#1 2016-12-20 12:07:20 -76 -87   0 -75
#2 2016-12-20 12:07:40   0 -76   0   0
#3 2016-12-20 12:08:00   0   0 -84   0
#4 2016-12-20 12:08:20   0   0 -84   0
#5 2016-12-20 12:10:20 -82   0   0   0
#6 2016-12-20 12:10:40 -74 -69 -85   0

Now I want k dataframes, each with n numeric columns plus the TimeStamp column (to be saved as separate csv files). For example, from the input dataframes d1, d2 above I want to get the output dataframes b1, b2, b3, b4 (two of them are shown below):

b1
#          TimeStamp  d1  d2
#2016-12-20 10:17:20 -76  NA
#2016-12-20 10:19:20   0  NA
#2016-12-20 10:19:40   0  NA
#2016-12-20 10:20:00 -76  NA
#2016-12-20 10:20:20 -80  NA
#2016-12-20 10:20:40 -81  NA
#2016-12-20 12:07:20  NA -76
#2016-12-20 12:07:40  NA   0
#2016-12-20 12:08:00  NA   0
#2016-12-20 12:08:20  NA   0
#2016-12-20 12:10:20  NA -82
#2016-12-20 12:10:40  NA -74

b2
#          TimeStamp  d1  d2
#2016-12-20 10:17:20   0  NA
#2016-12-20 10:19:20 -74  NA
#2016-12-20 10:19:40 -79  NA
#2016-12-20 10:20:00 -73  NA
#2016-12-20 10:20:20 -79  NA
#2016-12-20 10:20:40 -77  NA
#2016-12-20 12:07:20  NA -87
#2016-12-20 12:07:40  NA -76
#2016-12-20 12:08:00  NA   0
#2016-12-20 12:08:20  NA   0
#2016-12-20 12:10:20  NA   0
#2016-12-20 12:10:40  NA -69

The timestamps from the different dataframes in the given example are disjoint, but in general timestamps from different dataframes will overlap; in that case we don't need to fill in NA (since the numeric values will be present).

What is the simplest, most efficient and most generic way to do this (using base R / dplyr / tidyr / data.table, preferably without loops)? The constants n and k, as well as the dataframes themselves, can be arbitrarily large.

1 answer:

Answer 0: (score: 1)

Maybe you can try this:

# Read d1 from PATH1 (tab-separated, with a header row)
d1_df <- read.table("PATH1", header = TRUE, sep = "\t", stringsAsFactors = FALSE)
# Numeric column names of d1 (everything except TimeStamp)
d1_colname <- colnames(d1_df)[-1]
# Read d2 from PATH2
d2_df <- read.table("PATH2", header = TRUE, sep = "\t", stringsAsFactors = FALSE)
# Numeric column names of d2
d2_colname <- colnames(d2_df)[-1]
# Stack the timestamps of both data frames (d1 rows first, then d2 rows)
TimeStamp <- c(d1_df[, 1], d2_df[, 1])
# Pair up the column names: each column of this matrix is one (d1, d2) pair
merge_colname <- rbind(d1_colname, d2_colname)
# Build one output data frame from a pair of column names,
# padding with NA where the other data frame has no rows
merge_df <- function(vec_colname) {
  d1 <- c(d1_df[, vec_colname[1]], rep(NA, nrow(d2_df)))
  d2 <- c(rep(NA, nrow(d1_df)), d2_df[, vec_colname[2]])
  data.frame(TimeStamp, d1, d2)
}
# apply() returns a list with one data frame per column pair
res_list <- apply(merge_colname, 2, merge_df)
# Create the variables b1, b2, ... from the list elements
for (i in seq_along(res_list)) {
  assign(paste0("b", i), res_list[[i]])
}

Result:

> b1
             TimeStamp  d1  d2
1  2016-12-20 10:17:20 -76  NA
2  2016-12-20 10:19:20   0  NA
3  2016-12-20 10:19:40   0  NA
4  2016-12-20 10:20:00 -76  NA
5  2016-12-20 10:20:20 -80  NA
6  2016-12-20 10:20:40 -81  NA
7  2016-12-20 12:07:20  NA -76
8  2016-12-20 12:07:40  NA   0
9  2016-12-20 12:08:00  NA   0
10 2016-12-20 12:08:20  NA   0
11 2016-12-20 12:10:20  NA -82
12 2016-12-20 12:10:40  NA -74
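
The code above stacks two data frames and relies on their timestamps being disjoint. For n inputs whose timestamps may overlap, one possible generalization is a full outer join on TimeStamp for each numeric column. The following is only a minimal base R sketch under stated assumptions: the helper name pivot_by_column is hypothetical (it is not from the question or the answer), the inputs are passed as a named list, every input has the same numeric column names, and the toy values are made up so that one timestamp overlaps.

```r
# Hypothetical helper (not from the original answer): turn a named list of
# n input data frames into k output data frames, one per numeric column.
pivot_by_column <- function(dfs) {
  stopifnot(!is.null(names(dfs)))
  # Assumption: all inputs share the numeric column names of the first one
  value_cols <- setdiff(names(dfs[[1]]), "TimeStamp")
  out <- lapply(value_cols, function(col) {
    # Keep TimeStamp plus one value column, renamed after its input
    pieces <- Map(function(df, nm) {
      setNames(
        data.frame(as.character(df$TimeStamp), df[[col]],
                   stringsAsFactors = FALSE),
        c("TimeStamp", nm)
      )
    }, dfs, names(dfs))
    # Full outer join: timestamps missing from an input become NA
    Reduce(function(x, y) merge(x, y, by = "TimeStamp", all = TRUE), pieces)
  })
  setNames(out, value_cols)
}

# Toy inputs (made-up values) sharing the timestamp "2016-12-20 10:19:20"
d1 <- data.frame(TimeStamp = c("2016-12-20 10:17:20", "2016-12-20 10:19:20"),
                 b1 = c(-76L, 0L), b2 = c(0L, -74L))
d2 <- data.frame(TimeStamp = c("2016-12-20 10:19:20", "2016-12-20 12:07:20"),
                 b1 = c(-76L, 0L), b2 = c(-87L, -76L))

res <- pivot_by_column(list(d1 = d1, d2 = d2))
print(res$b1)
```

Each element of res could then be saved with write.csv(res[[nm]], paste0(nm, ".csv"), row.names = FALSE) for nm in names(res), where the file names are again only an assumption. Chaining merge() through Reduce() is simple but not necessarily the fastest option for large n; a keyed data.table join or a reshape with tidyr may scale better.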