Aggregate adjacent rows, ignoring certain columns

Date: 2017-03-07 06:57:52

Tags: r aggregate reshape

I have a df like the following:

> head(df)
  OrderId           Timestamp ErrorCode
1 3000000 1455594300434609920        NA
2 3000001 1455594300434614272        NA
3 3000000 1455594300440175104         0
4 3000001 1455594300440179712         0
5 3000002 1455594303468741120        NA
6 3000002 1455594303469326848         0

I need to collapse the rows so that the output looks like this:
> head(df)
  OrderId         Timestamp1  Timestamp2       ErrorCode Diff
 3000000 1455594300434609920  1455594300440175104      0
 3000001 1455594300434614272  1455594300440179712      0
 3000002 1455594303468741120  1455594303469326848      0

I used df2=aggregate(Timestamp~.,df,FUN=toString) but the output is:

   OrderId ErrorCode           Timestamp
10 3000001         0 1455594300440179712
11 3000002         0 1455594303469326848
12 3000003         0 1455594303713897984

When I remove the ErrorCode column and run the same command, I get the expected output:

> head(kf)
  OrderId           Timestamp
1 3000000 1455594300434609920
2 3000001 1455594300434614272
3 3000000 1455594300440175104
4 3000001 1455594300440179712
5 3000002 1455594303468741120
6 3000002 1455594303469326848
> kf2=aggregate(Timestamp~.,kf,FUN=toString)
> head(kf2)
   OrderId                                Timestamp
10 3000001 1455594300434614272, 1455594300440179712
11 3000002 1455594303468741120, 1455594303469326848
12 3000003 1455594303711330816, 1455594303713897984

How can I aggregate it this way without removing the ErrorCode column? There must be some small thing I'm missing.
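The difference between the two runs most likely comes from the formula. In Timestamp ~ ., the dot stands for every other column, so ErrorCode becomes a second grouping variable next to OrderId, and the formula method of aggregate() defaults to na.action = na.omit, which removes the rows whose ErrorCode is NA before grouping. Each remaining (OrderId, ErrorCode = 0) group then holds only one timestamp. A minimal sketch that keeps ErrorCode groups by OrderId alone and passes NAs through. It assumes, for illustration, that Timestamp is stored as character; the 19-digit values are above 2^53, beyond which doubles no longer represent every integer exactly.

```r
# Sample data from the question; storing Timestamp as character is an assumption
df <- data.frame(
  OrderId   = c(3000000L, 3000001L, 3000000L, 3000001L, 3000002L, 3000002L),
  Timestamp = c("1455594300434609920", "1455594300434614272",
                "1455594300440175104", "1455594300440179712",
                "1455594303468741120", "1455594303469326848"),
  ErrorCode = c(NA, NA, 0, 0, NA, 0),
  stringsAsFactors = FALSE
)

# Group by OrderId only, collapse both columns with toString(), and use
# na.action = na.pass so rows whose ErrorCode is NA are not dropped
df2 <- aggregate(cbind(Timestamp, ErrorCode) ~ OrderId, data = df,
                 FUN = toString, na.action = na.pass)
df2
```

For the sample rows this gives one row per OrderId, with both timestamps in Timestamp and "NA, 0" in ErrorCode. If Timestamp is a factor rather than character, convert it with as.character() first, because cbind() would otherwise use the factor's integer codes.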

1 Answer

Answer 0 (score: 0)

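I think what you actually want is simply to reshape the data to wide format, with separate columns for timestamps 1 and 2. One way is to first add a new column that records the time point of each measurement (1 or 2 within each OrderId), and then melt and cast the data with reshape2.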

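# Add an index to the data.frame: number the rows 1, 2, ... within each OrderId
df$time_ind <- NA_integer_
for (i in unique(df$OrderId)) {
  ii <- df$OrderId == i                # rows belonging to this order
  df$time_ind[ii] <- seq_len(sum(ii))  # 1, 2, ... in their current order
}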

library(reshape2)

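# Long format: one row per OrderId, time_ind and variable
df_long <- melt(df, id.vars = c("OrderId", "time_ind"),
                measure.vars = c("Timestamp", "ErrorCode"))

# Wide format: one column per variable and time point
dcast(df_long, OrderId ~ variable + time_ind)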

This gives:

  OrderId         Timestamp_1         Timestamp_2 ErrorCode_1 ErrorCode_2
1 3000000 1455594300434609920 1455594300440175104        <NA>           0
2 3000001 1455594300434614272 1455594300440179712        <NA>           0
3 3000002 1455594303468741120 1455594303469326848        <NA>           0
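If reshape2 is not available, the same reshape can be sketched in base R: ave() with seq_along() replaces the indexing loop, and stats::reshape() with direction = "wide" does the melt/cast step in one call. This sketch assumes, as in the sample, that each order's rows are already in time order and that Timestamp is stored as character.

```r
# Sample data from the question; storing Timestamp as character is an assumption
df <- data.frame(
  OrderId   = c(3000000L, 3000001L, 3000000L, 3000001L, 3000002L, 3000002L),
  Timestamp = c("1455594300434609920", "1455594300434614272",
                "1455594300440175104", "1455594300440179712",
                "1455594303468741120", "1455594303469326848"),
  ErrorCode = c(NA, NA, 0, 0, NA, 0),
  stringsAsFactors = FALSE
)

# Number the rows 1, 2, ... within each OrderId, in their current order
# (assumes the rows of each order are already sorted by time)
df$time_ind <- ave(seq_along(df$OrderId), df$OrderId, FUN = seq_along)

# Base R reshape(): one row per OrderId, one column per variable and time point
wide <- reshape(df, idvar = "OrderId", timevar = "time_ind",
                direction = "wide")
wide
```

Here the columns follow reshape()'s "variable.time" naming and come out grouped by time point (Timestamp.1, ErrorCode.1, Timestamp.2, ErrorCode.2) instead of by variable as with dcast().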