I have a data frame containing an id and a POSIXct (date and time) column:
> myData
Tpt_ID Tpt_DateTime Value
1 1 2013-01-01 15:17:21 CST 10
2 2 2013-01-01 15:18:32 CST 5
3 3 2013-01-01 16:00:02 CST 1
4 1 2013-01-02 15:10:11 CST 15
5 2 2013-02-02 11:18:32 CST 6
6 3 2013-02-03 12:00:02 CST 2
7 1 2013-01-01 19:17:21 CST 21
8 2 2013-02-02 20:18:32 CST 8
9 3 2013-02-03 22:00:02 CST 3
I want to get the last value for each date and ID, for example:
Tpt_ID Tpt_DateTime Value
2 2013-01-01 15:18:32 CST 5
3 2013-01-01 16:00:02 CST 1
1 2013-01-02 15:10:11 CST 15
1 2013-01-01 19:17:21 CST 21
2 2013-02-02 20:18:32 CST 8
3 2013-02-03 22:00:02 CST 3
Sample data:
structure(list(Tpt_ID = c(1, 2, 3, 1, 2, 3, 1, 2, 3), Tpt_DateTime = structure(c(1357024641, 1357024712, 1357027202, 1357110611, 1359775112, 1359864002, 1357039041, 1359807512, 1359900002), class = c("POSIXct", "POSIXt"), tzone = ""), Value = c(10, 5, 1, 15, 6, 2, 21, 8, 3)), .Names = c("Tpt_ID", "Tpt_DateTime", "Value"), row.names = c(NA, 9L), class = "data.frame")
Answer 0: (score: 4)
Using data.table syntax...
# Load package
library( data.table )
# Turn 'data.frame' into 'data.table' (myData is the data frame from the question)
dt <- data.table( myData )
# Make dates from date/time
dt[ , Date:= as.Date( Tpt_DateTime ) ]
# Get last row of each group
dt[ , .SD[.N] , by = c("Tpt_ID" , "Date") ]
# Tpt_ID Date Tpt_DateTime Value
#1: 1 2013-01-01 2013-01-01 11:17:21 21
#2: 2 2013-01-01 2013-01-01 07:18:32 5
#3: 3 2013-01-01 2013-01-01 08:00:02 1
#4: 1 2013-01-02 2013-01-02 07:10:11 15
#5: 2 2013-02-02 2013-02-02 12:18:32 8
#6: 3 2013-02-03 2013-02-03 14:00:02 3
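First, we turn your date-time data into a date with Date := as.Date( Tpt_DateTime ).
Then we use .SD to get the Subset of X's Data for each group. .N contains the number of rows in each group, so .SD[.N] gives us the last row of each group. "Last" here means last in row order, so the rows need to be sorted by Tpt_DateTime within each group for this to be the latest value.
Finally, by=c("Tpt_ID" , "Date") defines those groups.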
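To make the roles of .N and .SD[.N] concrete, here is a minimal sketch on a made-up toy table (the names toy, id, when, value and day are hypothetical, not from the question). It assumes dates should be computed in UTC, which is why tz = "UTC" is passed explicitly, and it sorts by time first so that the last row of each group is also the latest one.

```r
library(data.table)

# Hypothetical toy data: rows are deliberately NOT in time order
toy <- data.table(
  id    = c(1, 1, 1, 2),
  when  = as.POSIXct(c("2013-01-01 19:00:00", "2013-01-01 15:00:00",
                       "2013-01-02 09:00:00", "2013-01-01 08:00:00"),
                     tz = "UTC"),
  value = c(21, 10, 15, 5)
)

# Assumption: the calendar day is taken in UTC
toy[, day := as.Date(when, tz = "UTC")]

# .N on its own is the number of rows in each (id, day) group
counts <- toy[, .N, by = .(id, day)]

# Sort by time so "last row of the group" means "latest row of the group"
setorder(toy, when)

# .SD[.N] picks row number .N (the last one) of each group's subset
last_rows <- toy[, .SD[.N], by = .(id, day)]
print(counts)
print(last_rows)
```

Without the setorder() call, the (id = 1, day = 2013-01-01) group would return the 15:00 row, because that row comes last in the original row order.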
Answer 1: (score: 0)
Another data.table solution, which works similarly to @SimonO101's answer -
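library(data.table)
# myData is the data frame from the question
dt <- data.table( myData )
dt[ , Date:= as.Date( Tpt_DateTime ) ]
# indexing the data.table
setkey(dt,Tpt_ID,Date)
# mult = "last" returns the last row of each unique group as defined by the key;
# by = key(dt) is spelled out because newer data.table versions deduplicate on all columns by default
dt[dt[unique(dt, by = key(dt)), mult="last", which=TRUE]]
Returns -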
> dt[dt[unique(dt),,mult="last", which=TRUE]]
Tpt_ID Tpt_DateTime Value Date
1: 1 2013-01-01 16:47:21 21 2013-01-01
2: 1 2013-01-02 12:40:11 15 2013-01-02
3: 2 2013-01-01 12:48:32 5 2013-01-01
4: 2 2013-02-02 17:48:32 8 2013-02-02
5: 3 2013-01-01 13:30:02 1 2013-01-01
6: 3 2013-02-03 19:30:02 3 2013-02-03
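As a rough illustration of what the keyed lookup does, here is a small sketch on a hypothetical toy table (toy, g and v are made-up names). With which = TRUE the inner call returns row numbers rather than rows, and mult = "last" keeps only the last matching row number per group. The sketch passes by = key(toy) to unique() explicitly, on the understanding that newer data.table versions otherwise deduplicate on all columns.

```r
library(data.table)

# Hypothetical toy data: two groups, "a" with 2 rows and "b" with 3 rows
toy <- data.table(g = c("a", "a", "b", "b", "b"), v = 1:5)
setkey(toy, g)

# One row per key value; these act as the lookup values for the join
groups <- unique(toy, by = key(toy))

# For each lookup value, the row number of its LAST match in toy
idx <- toy[groups, mult = "last", which = TRUE]

# Use those row numbers to pull out the last row of each group
last_rows <- toy[idx]
print(idx)
print(last_rows)
```

Because setkey() sorts the table by the key columns, adding the timestamp as a trailing key column would be one way to make "last" mean "latest" within each group.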
Answer 2: (score: 0)
Without data.table ...
x <- structure(list(Tpt_ID = c(1, 2, 3, 1, 2, 3, 1, 2, 3), Tpt_DateTime = structure(c(1357024641, 1357024712, 1357027202, 1357110611, 1359775112, 1359864002, 1357039041, 1359807512, 1359900002), class = c("POSIXct", "POSIXt"), tzone = ""), Value = c(10, 5, 1, 15, 6, 2, 21, 8, 3)), .Names = c("Tpt_ID", "Tpt_DateTime", "Value"), row.names = c(NA, 9L), class = "data.frame")
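# Build a combined ID/date key to group on
x$ID_Date <- paste(x$Tpt_ID, as.Date(x$Tpt_DateTime), sep="_")
# Return the row with the latest timestamp within one ID/date group
f <- function(id, mydf){
  tempdf <- mydf[mydf$ID_Date==id,]
  return(tempdf[which.max(tempdf$Tpt_DateTime),])
}
# rbind the one-row data frames so column classes (e.g. POSIXct) are preserved
res <- do.call(rbind, lapply(unique(x$ID_Date), f, mydf=x))
rownames(res) <- NULL
res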
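A shorter base R variant of the same idea, offered as a sketch rather than a drop-in replacement: sort by time, then keep the rows that are not followed by another row with the same ID and date. It rebuilds the question's sample data with an explicit UTC time zone, which is an assumption made so that the dates come out the same on every machine; the names day and keep are introduced here for illustration.

```r
# Sample values from the question, rebuilt with an explicit time zone (assumption: UTC)
x <- data.frame(
  Tpt_ID = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  Tpt_DateTime = as.POSIXct(
    c(1357024641, 1357024712, 1357027202, 1357110611, 1359775112,
      1359864002, 1357039041, 1359807512, 1359900002),
    origin = "1970-01-01", tz = "UTC"),
  Value = c(10, 5, 1, 15, 6, 2, 21, 8, 3)
)

# Sort by time so that the last row of each group is the latest one
x <- x[order(x$Tpt_DateTime), ]

# Calendar day of each timestamp, taken in UTC
day <- as.Date(x$Tpt_DateTime, tz = "UTC")

# fromLast = TRUE marks every row except the last one of each ID/day pair as a duplicate
keep <- !duplicated(data.frame(id = x$Tpt_ID, day = day), fromLast = TRUE)

res <- x[keep, ]
rownames(res) <- NULL
print(res)
```

This avoids the per-group function call, at the cost of reordering the rows by time.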