I have this data frame:
DATE pc SERVER
1 2013-02-16 01:00:00 3.83 server1
2 2013-02-16 02:00:00 3.45 server1
3 2013-02-16 03:00:00 3.34 server1
4 2013-02-16 04:00:00 3.73 server1
5 2013-02-16 05:00:00 3.16 server1
6 2013-02-16 06:00:00 3.16 server1
7 2013-02-16 01:00:00 4.74 server2
8 2013-02-16 02:00:00 5.70 server2
9 2013-02-16 03:00:00 8.54 server2
10 2013-02-16 04:00:00 9.25 server2
11 2013-02-16 05:00:00 10.12 server2
12 2013-02-16 06:00:00 10.15 server2
There are 8 servers in the SERVER column. I need to group each server's values by DATE.
For example, this is what I need this df to look like:
DATE server1 server2
2013-02-16 01:00:00 3.83 4.74
2013-02-16 02:00:00 3.45 5.70
2013-02-16 03:00:00 3.34 8.54
2013-02-16 04:00:00 3.73 9.25
etc.
How can I reorganize my data frame like this?
Answer 0 (score: 3)
This is a very basic reshape problem. Assuming your data.frame is called "mydf":
> reshape(mydf, direction = "wide", idvar="DATE", timevar="SERVER")
DATE pc.server1 pc.server2
1 2013-02-16 01:00:00 3.83 4.74
2 2013-02-16 02:00:00 3.45 5.70
3 2013-02-16 03:00:00 3.34 8.54
4 2013-02-16 04:00:00 3.73 9.25
5 2013-02-16 05:00:00 3.16 10.12
6 2013-02-16 06:00:00 3.16 10.15
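reshape() prefixes each new column with the name of the value column ("pc.server1"). If you want exactly the column names from the question (server1, server2), one option is to strip that prefix afterwards. A minimal sketch, assuming the value column is called pc as in the question and that each DATE + SERVER pair occurs only once:

```r
# Rebuild the question's data (no duplicated DATE/SERVER pairs)
mydf <- data.frame(
  DATE = rep(sprintf("2013-02-16 %02d:00:00", 1:6), times = 2),
  pc = c(3.83, 3.45, 3.34, 3.73, 3.16, 3.16,
         4.74, 5.70, 8.54, 9.25, 10.12, 10.15),
  SERVER = rep(c("server1", "server2"), each = 6),
  stringsAsFactors = FALSE
)

# Long -> wide: one row per DATE, one column per SERVER
wide <- reshape(mydf, direction = "wide", idvar = "DATE", timevar = "SERVER")

# Drop the "pc." prefix that reshape() adds to the new columns
names(wide) <- sub("^pc\\.", "", names(wide))

# Optional: replace the row names carried over from the long data with 1..n
rownames(wide) <- NULL
print(wide)
```

The dcast() call below produces these column names directly, so this step is only needed with base R's reshape().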
Alternatively, using the "reshape2" package:
> library(reshape2)
> dcast(mydf, DATE ~ SERVER, value.var="pc")
DATE server1 server2
1 2013-02-16 01:00:00 3.83 4.74
2 2013-02-16 02:00:00 3.45 5.70
3 2013-02-16 03:00:00 3.34 8.54
4 2013-02-16 04:00:00 3.73 9.25
5 2013-02-16 05:00:00 3.16 10.12
6 2013-02-16 06:00:00 3.16 10.15
If you have duplicated combinations of "DATE" and "SERVER", you need to add a secondary "ID" variable to your data.
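To check whether your data actually contain such duplicated pairs before reshaping, base R's duplicated() can be applied to the key columns. A minimal sketch on a hypothetical three-row data frame in which the 01:00 / server1 pair appears twice:

```r
# Hypothetical toy data: the first and third rows share DATE and SERVER
mydf <- data.frame(
  DATE = c("2013-02-16 01:00:00", "2013-02-16 02:00:00", "2013-02-16 01:00:00"),
  pc = c(3.83, 3.45, 5.83),
  SERVER = c("server1", "server1", "server1"),
  stringsAsFactors = FALSE
)

# TRUE for every row whose DATE + SERVER pair occurs more than once
keys <- mydf[c("DATE", "SERVER")]
dup <- duplicated(keys) | duplicated(keys, fromLast = TRUE)
print(mydf[dup, ])
```

Combining the forward and fromLast passes flags every member of a duplicated pair, not only the second occurrence.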
Here is some sample data (please share your data in this format in the future):
mydf <- structure(list(DATE = c("2013-02-16 01:00:00", "2013-02-16 02:00:00",
"2013-02-16 03:00:00", "2013-02-16 04:00:00", "2013-02-16 05:00:00",
"2013-02-16 06:00:00", "2013-02-16 01:00:00", "2013-02-16 02:00:00",
"2013-02-16 03:00:00", "2013-02-16 04:00:00", "2013-02-16 05:00:00",
"2013-02-16 06:00:00", "2013-02-16 01:00:00"), pc = c(3.83, 3.45,
3.34, 3.73, 3.16, 3.16, 4.74, 5.7, 8.54, 9.25, 10.12, 10.15,
5.83), SERVER = c("server1", "server1", "server1", "server1",
"server1", "server1", "server2", "server2", "server2", "server2",
"server2", "server2", "server1")), .Names = c("DATE", "pc", "SERVER"
), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9",
"10", "11", "12", "13"), class = "data.frame")
mydf
# DATE pc SERVER
# 1 2013-02-16 01:00:00 3.83 server1
# 2 2013-02-16 02:00:00 3.45 server1
# 3 2013-02-16 03:00:00 3.34 server1
# 4 2013-02-16 04:00:00 3.73 server1
# 5 2013-02-16 05:00:00 3.16 server1
# 6 2013-02-16 06:00:00 3.16 server1
# 7 2013-02-16 01:00:00 4.74 server2
# 8 2013-02-16 02:00:00 5.70 server2
# 9 2013-02-16 03:00:00 8.54 server2
# 10 2013-02-16 04:00:00 9.25 server2
# 11 2013-02-16 05:00:00 10.12 server2
# 12 2013-02-16 06:00:00 10.15 server2
# 13 2013-02-16 01:00:00 5.83 server1
Note that because of the duplicated "DATE" + "SERVER" combination in rows 1 and 13, we can't use reshape without getting the warning you mentioned. Solution: add a secondary ID:
mydf$ID <- ave(as.character(mydf$DATE), mydf$DATE, mydf$SERVER, FUN = seq_along)
reshape(mydf, direction = "wide", idvar=c("DATE", "ID"), timevar="SERVER")
# DATE ID pc.server1 pc.server2
# 1 2013-02-16 01:00:00 1 3.83 4.74
# 2 2013-02-16 02:00:00 1 3.45 5.70
# 3 2013-02-16 03:00:00 1 3.34 8.54
# 4 2013-02-16 04:00:00 1 3.73 9.25
# 5 2013-02-16 05:00:00 1 3.16 10.12
# 6 2013-02-16 06:00:00 1 3.16 10.15
# 13 2013-02-16 01:00:00 2 5.83 NA
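The ave(..., FUN = seq_along) line works as a running counter: within each DATE + SERVER group it numbers the rows 1, 2, ..., so the first occurrence of a pair gets ID 1 and a repeat gets ID 2. Because the answer passes a character vector as the first argument, the IDs come back as strings; passing an integer vector gives integer IDs. A minimal illustration on hypothetical toy vectors:

```r
# Hypothetical toy grouping variables
date <- c("01:00", "02:00", "01:00", "01:00")
server <- c("s1", "s1", "s1", "s2")

# Running count of rows within each (date, server) group, as integers
id <- ave(seq_along(date), date, server, FUN = seq_along)
print(id)

# Same call on a character vector, as in the answer: IDs become strings
id_chr <- ave(as.character(date), date, server, FUN = seq_along)
print(id_chr)
```

Only the third element belongs to a group seen before ("01:00", "s1"), so it is the only one numbered 2.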
Answer 1 (score: 0)
With the reshape package, you can do it like this. Consider the data frame df:
library(reshape)
df = data.frame(DATE = c("2013-02-16", "2013-02-17", "2013-02-18", "2013-02-16", "2013-02-17", "2013-02-18"), SERVER = c("server1","server1","server1","server2","server2","server2"), pc = c(1,2,3,4,5,6))
# mean is the aggregation function: it combines pc values that share a DATE/SERVER pair
cast(df, DATE ~ SERVER, value = 'pc', mean)
You get:
DATE server1 server2
1 2013-02-16 1 4
2 2013-02-17 2 5
3 2013-02-18 3 6
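For comparison, the same mean-aggregating cross-tabulation can be done in base R without any add-on package using tapply(); it returns a matrix, which can then be turned back into a data frame. A minimal sketch on the df from this answer:

```r
df <- data.frame(
  DATE = c("2013-02-16", "2013-02-17", "2013-02-18",
           "2013-02-16", "2013-02-17", "2013-02-18"),
  SERVER = c("server1", "server1", "server1",
             "server2", "server2", "server2"),
  pc = c(1, 2, 3, 4, 5, 6)
)

# Rows = DATE, columns = SERVER, cells = mean(pc) within each pair
m <- tapply(df$pc, list(df$DATE, df$SERVER), mean)

# Convert the matrix back to a data frame with DATE as a regular column
wide <- data.frame(DATE = rownames(m), m, row.names = NULL)
print(wide)
```

As with cast(..., mean), duplicated DATE/SERVER pairs are averaged rather than causing a warning; pairs that never occur come out as NA.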