I have a data frame like this:
vehicleId visitDate taskName
123 1/1/2013 Change Battery
456 1/1/2013 Wiper Blades Changed
123 1/2/2013 Tire Pressure Check
123 1/3/2013 Tire Rotation
456 3/1/2013 Tire Pressure Check
What I would like to get is:
vehicleId visitDate timeBetweenVisits(hrs)
123 1/1/2013 24
123 1/2/2013 672
456 1/1/2013 48
Any ideas on how I can do this in R?
Answer 0 (score: 1)
Load and convert the data:
## data now comma-separated as you have fields containing whitespace
R> res <- read.csv(text="
vehicleId, visitDate, taskName
123, 1/1/2013, Change Battery
456, 1/1/2013, Wiper Blades Changed
123, 1/2/2013, Tire Pressure Check
123, 1/3/2013, Tire Rotation
456, 3/1/2013, Tire Pressure Check", stringsAsFactors=FALSE)
R> res$visitDate <- as.Date(res$visitDate, "%m/%d/%Y") ## now in Date format
Take a look at it:
R> res
vehicleId visitDate taskName
1 123 2013-01-01 Change Battery
2 456 2013-01-01 Wiper Blades Changed
3 123 2013-01-02 Tire Pressure Check
4 123 2013-01-03 Tire Rotation
5 456 2013-03-01 Tire Pressure Check
R>
Date arithmetic:
R> res[3,"visitDate"] - res[1,"visitDate"]
Time difference of 1 days
R> as.numeric(res[3,"visitDate"] - res[1,"visitDate"])
[1] 1
R> difftime(res[3,"visitDate"], res[1,"visitDate"], units="hours")
Time difference of 24 hours
R> as.numeric(difftime(res[3,"visitDate"], res[1,"visitDate"], units="hours"))
[1] 24
R>
Vectorized:
R> as.numeric(difftime(res[2:nrow(res),"visitDate"],
+ res[1:(nrow(res)-1),"visitDate"], units="hours"))
[1] 0 24 24 1368
R>
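You can of course also assign the result to a new column. You will probably also want to group by vehicle ID, since the vectorized differences above are taken between consecutive rows regardless of which vehicle they belong to.
Answer 1 (score: 1)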
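A minimal sketch of both suggestions together, assuming the same example data (rebuilt here so the snippet runs on its own). The column name hoursToNextVisit is made up for illustration, and base R's ave() is just one possible way to do the grouping:

```r
# Rebuild the example data so the snippet is self-contained
res <- read.csv(text = "vehicleId,visitDate,taskName
123,1/1/2013,Change Battery
456,1/1/2013,Wiper Blades Changed
123,1/2/2013,Tire Pressure Check
123,1/3/2013,Tire Rotation
456,3/1/2013,Tire Pressure Check", stringsAsFactors = FALSE)
res$visitDate <- as.Date(res$visitDate, "%m/%d/%Y")

# Sort so that each vehicle's visits are contiguous and in date order
res <- res[order(res$vehicleId, res$visitDate), ]

# Hours until the same vehicle's next visit; NA for each vehicle's last visit.
# as.numeric() on a Date gives days, so diff() is in days and * 24 gives hours.
# hoursToNextVisit is a hypothetical column name.
res$hoursToNextVisit <- ave(as.numeric(res$visitDate), res$vehicleId,
                            FUN = function(d) c(diff(d), NA) * 24)
```

Each vehicle's last visit has no following visit, so it gets NA here; drop those rows if only completed intervals are wanted.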
Using res from @Dirk's answer, here is a by() expression that does the job:
by(res, res$vehicleId, FUN=function(d)
{
data.frame(vehicleId=head(d$vehicleId, -1),
visitDate=head(d$visitDate, -1),
tbv=diff(d$visitDate))
}
)
## res$vehicleId: 123
## vehicleId visitDate tbv
## 1 123 2013-01-01 1 days
## 2 123 2013-01-02 1 days
## ----------------------------------------------------------------------------------------------
## res$vehicleId: 456
## vehicleId visitDate tbv
## 1 456 2013-01-01 59 days
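The by() result is a list with one data frame per vehicle, and the differences are in days. A possible follow-up, sketched under the assumption that a single data frame with the gap in hours is wanted (tbvHours and out are illustrative names, not part of the answer above):

```r
# Rebuild the example data so the snippet is self-contained
res <- read.csv(text = "vehicleId,visitDate,taskName
123,1/1/2013,Change Battery
456,1/1/2013,Wiper Blades Changed
123,1/2/2013,Tire Pressure Check
123,1/3/2013,Tire Rotation
456,3/1/2013,Tire Pressure Check", stringsAsFactors = FALSE)
res$visitDate <- as.Date(res$visitDate, "%m/%d/%Y")

pieces <- by(res, res$vehicleId, FUN = function(d) {
  d <- d[order(d$visitDate), ]  # make sure visits are in date order
  data.frame(vehicleId = head(d$vehicleId, -1),
             visitDate = head(d$visitDate, -1),
             # diff() on Dates is a difftime in days; convert it to hours
             tbvHours  = as.numeric(diff(d$visitDate), units = "hours"))
})

# Stack the per-vehicle pieces into one data frame
out <- do.call(rbind, pieces)
rownames(out) <- NULL
```

With the example data, out has three rows: two 24-hour gaps for vehicle 123 and one 1416-hour gap (59 days) for vehicle 456.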