I have a df like this:
> head(df)
OrderId Timestamp ErrorCode
1 3000000 1455594300434609920 NA
2 3000001 1455594300434614272 NA
3 3000000 1455594300440175104 0
4 3000001 1455594300440179712 0
5 3000002 1455594303468741120 NA
6 3000002 1455594303469326848 0
I need to collapse the rows so that the output looks something like this:
> head(df)
OrderId Timestamp1 Timestamp2 ErrorCode Diff
3000000 1455594300434609920 1455594300440175104 0
3000001 1455594300434614272 1455594300440179712 0
3000002 1455594303468741120 1455594303469326848 0
I used df2=aggregate(Timestamp~.,df,FUN=toString)
but the output is:
OrderId ErrorCode Timestamp
10 3000001 0 1455594300440179712
11 3000002 0 1455594303469326848
12 3000003 0 1455594303713897984
When I remove the ErrorCode column and run the same command, I get the expected output:
> head(kf)
OrderId Timestamp
1 3000000 1455594300434609920
2 3000001 1455594300434614272
3 3000000 1455594300440175104
4 3000001 1455594300440179712
5 3000002 1455594303468741120
6 3000002 1455594303469326848
> kf2=aggregate(Timestamp~.,kf,FUN=toString)
> head(kf2)
OrderId Timestamp
10 3000001 1455594300434614272, 1455594300440179712
11 3000002 1455594303468741120, 1455594303469326848
12 3000003 1455594303711330816, 1455594303713897984
How can I aggregate it this way without removing the ErrorCode column? There must be something small I'm missing.
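One plausible explanation, not stated in the original post: with Timestamp ~ . the formula groups by both OrderId and ErrorCode, and the formula method of aggregate() uses na.action = na.omit by default, so every row whose ErrorCode is NA is dropped before grouping and each group is left with a single timestamp. The sketch below rebuilds the six sample rows to show both behaviours. It assumes Timestamp is stored as character strings; the real column type is not shown in the question, and these values exceed 2^53, so plain doubles could not hold them exactly.

```r
# Toy copy of the six sample rows from the question.
# Assumption: Timestamp is character, because values above 2^53 lose digits as doubles.
df <- data.frame(
  OrderId   = c(3000000L, 3000001L, 3000000L, 3000001L, 3000002L, 3000002L),
  Timestamp = c("1455594300434609920", "1455594300434614272",
                "1455594300440175104", "1455594300440179712",
                "1455594303468741120", "1455594303469326848"),
  ErrorCode = c(NA, NA, 0, 0, NA, 0),
  stringsAsFactors = FALSE
)

# Timestamp ~ . groups by OrderId AND ErrorCode; the default na.action = na.omit
# removes the NA-ErrorCode rows first, so only one timestamp per OrderId survives.
with_code <- aggregate(Timestamp ~ ., data = df, FUN = toString)
print(with_code)

# Grouping by OrderId alone keeps both timestamps, but ErrorCode is lost.
by_order <- aggregate(Timestamp ~ OrderId, data = df, FUN = toString)
print(by_order)
```

Passing na.action = na.pass alone would probably not be enough either, since aggregate() also omits rows with missing values in its grouping variables, which is one reason reshaping to wide format is a better fit here.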
Answer 0 (score: 0)
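I think what you actually want is to reshape the data into wide format, with separate columns for timestamps 1 and 2. One way to do this is to first add a new column that identifies the time point of each measurement within its OrderId, and then use reshape2 to melt and cast the data.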
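library(reshape2)

# Add an index to the data.frame: the position (1, 2, ...) of each row within its OrderId
df$time_ind <- NA_integer_
for (i in unique(df$OrderId)) {
  ii <- df$OrderId == i
  df$time_ind[ii] <- seq_len(sum(ii))
}

# Stack Timestamp and ErrorCode into one long "value" column, keyed by OrderId and time_ind
df_long <- melt(df, id.vars = c("OrderId", "time_ind"),
                measure.vars = c("Timestamp", "ErrorCode"))
# Cast back to wide format: one column per variable/time_ind combination
dcast(df_long, OrderId ~ variable + time_ind)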
which gives you:
OrderId Timestamp_1 Timestamp_2 ErrorCode_1 ErrorCode_2
1 3000000 1455594300434609920 1455594300440175104 <NA> 0
2 3000001 1455594300434614272 1455594300440179712 <NA> 0
3 3000002 1455594303468741120 1455594303469326848 <NA> 0
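If adding the reshape2 dependency is undesirable, a base R variant of the same idea is sketched below. This swaps in two different tools, not the answer's own code: ave() with FUN = seq_along builds the per-OrderId index in place of the for loop, and stats::reshape() with direction = "wide" replaces melt()/dcast(). It again assumes Timestamp is held as character strings, since the real column type is not shown in the question.

```r
# Toy copy of the six sample rows (assumption: Timestamp stored as character).
df <- data.frame(
  OrderId   = c(3000000L, 3000001L, 3000000L, 3000001L, 3000002L, 3000002L),
  Timestamp = c("1455594300434609920", "1455594300434614272",
                "1455594300440175104", "1455594300440179712",
                "1455594303468741120", "1455594303469326848"),
  ErrorCode = c(NA, NA, 0, 0, NA, 0),
  stringsAsFactors = FALSE
)

# Position of each row within its OrderId: 1 for the first occurrence, 2 for the second
df$time_ind <- ave(seq_along(df$OrderId), df$OrderId, FUN = seq_along)

# One row per OrderId; Timestamp and ErrorCode are spread into .1 / .2 columns
wide <- reshape(df, idvar = "OrderId", timevar = "time_ind", direction = "wide")
print(wide)
```

Unlike melt(), reshape() keeps Timestamp and ErrorCode in separate columns, so their types are never mixed in a single value column.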