How to select unique values in a data frame

Asked: 2013-11-12 18:03:28

Tags: r dataframe data.table

I have hourly data. For each day, I want to get the maximum value and report the time at which it occurred.

Here is my data frame:

dput(head(monthly_cpu,24))

structure(list(name = c("Daily-Peaks", "Daily-Peaks", "Daily-Peaks", 
"Daily-Peaks", "Daily-Peaks", "Daily-Peaks", "Daily-Peaks", "Daily-Peaks", 
"Daily-Peaks", "Daily-Peaks", "Daily-Peaks", "Daily-Peaks", "Daily-Peaks", 
"Daily-Peaks", "Daily-Peaks", "Daily-Peaks", "Daily-Peaks", "Daily-Peaks", 
"Daily-Peaks", "Daily-Peaks", "Daily-Peaks", "Daily-Peaks", "Daily-Peaks", 
"Daily-Peaks"), date = structure(c(1315828800, 1315832400, 1315836000, 
1315839600, 1315843200, 1315846800, 1315850400, 1315854000, 1315857600, 
1315861200, 1315915200, 1315918800, 1315922400, 1315926000, 1315929600, 
1315933200, 1315936800, 1315940400, 1315944000, 1315947600, 1316001600, 
1316005200, 1316008800, 1316012400), class = c("POSIXct", "POSIXt"
), tzone = ""), cpu = c(5.6, 7.68, 8.64, 10.4, 11.36, 12, 12.16, 
12.8, 13.28, 13.92, 7.2, 7.84, 9.28, 10.72, 11.04, 11.04, 10.56, 
11.36, 10.72, 10.88, 1.76, 5.76, 9.6, 10.88), day = structure(c(15229, 
15229, 15229, 15229, 15229, 15229, 15229, 15229, 15229, 15229, 
15230, 15230, 15230, 15230, 15230, 15230, 15230, 15230, 15230, 
15230, 15231, 15231, 15231, 15231), class = "Date"), max = c(13.92, 
13.92, 13.92, 13.92, 13.92, 13.92, 13.92, 13.92, 13.92, 13.92, 
11.36, 11.36, 11.36, 11.36, 11.36, 11.36, 11.36, 11.36, 11.36, 
11.36, 12.48, 12.48, 12.48, 12.48)), .Names = c("name", "date", 
"cpu", "day", "max"), row.names = c(NA, 24L), class = "data.frame")

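I create another field called day and use the data.table package to get the maximum value for each day, as follows:

library(data.table)
# Derive the calendar day from the POSIXct timestamp
monthly_cpu$day <- as.Date(monthly_cpu$date)
monthly_cpu <- data.table(monthly_cpu)
# Add each day's maximum cpu as a new column named max
monthly_cpu <- monthly_cpu[, max := max(cpu), by = day]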

At this point I need to select the date (in POSIXct format) and the maximum value for each day.

I need the final monthly_cpu data frame to look like this:

Date   Max
2013-04-09 08:00:00 67.00
2013-04-10 13:00:00 50.00
2013-04-11 09:00:00 88.00
2013-04-12 12:00:00 100.00
2013-04-13 15:00:00 10.00

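Is there a way to select the date and the max value from the monthly_cpu data frame, and if so, how?

2 Answers:

Answer 0 (score: 2)

It sounds like, instead of the assignment in your last step, you want to do this: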

monthly_cpu[, max(cpu), by=day]
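
Note that this returns only day and the daily maximum (in a column named V1 by default), not the hour at which the maximum occurred. Below is a minimal sketch of one way to keep the timestamp as well. It assumes the column names from the question (date, cpu, day); the small toy table and its values are made up for illustration and are not the asker's data.

```r
library(data.table)

# Hypothetical toy data using the same column names as the question
toy <- data.table(
  date = as.POSIXct(c("2013-04-09 07:00:00", "2013-04-09 08:00:00",
                      "2013-04-10 13:00:00", "2013-04-10 14:00:00"),
                    tz = "UTC"),
  cpu  = c(40, 67, 50, 35)
)
toy[, day := as.Date(date)]

# For each day, keep the row where cpu is largest.
# which.max() returns the first such row if several rows tie.
peaks <- toy[, .SD[which.max(cpu)], by = day][, .(Date = date, Max = cpu)]
print(peaks)
```

Here peaks has one row per day, with the POSIXct timestamp in Date and the peak value in Max, which is the shape the question asks for.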

Answer 1 (score: 0)

I'm sure there is a better way to do this, but I think this works for me:

monthly_cpu<-subset(monthly_cpu, cpu == max)
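
One thing to watch for with this approach: cpu == max keeps every row that equals the daily maximum, so a day with tied peaks yields more than one row. A small base R sketch of that behaviour follows, again with made-up toy values; ave() stands in here for the max := max(cpu) step, and the !duplicated() filter is just one possible way to keep a single row per day.

```r
# Hypothetical toy data; the second day has two hours tied at 11.04
toy <- data.frame(
  date = as.POSIXct(c("2013-04-09 08:00:00", "2013-04-09 09:00:00",
                      "2013-04-10 12:00:00", "2013-04-10 13:00:00"),
                    tz = "UTC"),
  cpu = c(13.28, 13.92, 11.04, 11.04)
)
toy$day <- as.Date(toy$date)

# Per-day maximum repeated on every row, like the max column in the question
toy$max <- ave(toy$cpu, toy$day, FUN = max)

# Keeps all rows that hit the daily maximum, including both tied rows
all_peaks <- subset(toy, cpu == max)

# Keep only the first peak per day (rows are already in time order)
first_peaks <- all_peaks[!duplicated(all_peaks$day), ]
```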