Hello, I'm working with the following data frame (nrows = 62208):
> head(workfile)
V1 V5 V7 V8 V9
4309 2014-03-01 13:30:00 1582.899 D.1Elec-0001 D.1 Elec-0001
6801 2014-03-01 13:45:00 1582.900 D.1Elec-0001 D.1 Elec-0001
6805 2014-03-01 14:00:00 1582.919 D.1Elec-0001 D.1 Elec-0001
5710 2014-03-01 14:15:00 1582.939 D.1Elec-0001 D.1 Elec-0001
5714 2014-03-01 14:30:00 1582.944 D.1Elec-0001 D.1 Elec-0001
6814 2014-03-01 14:45:00 1582.945 D.1Elec-0001 D.1 Elec-0001
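I want to compute the difference between each element of column V5 and the element in the previous row of the same column (V5). Column V7 has 72 different levels (in my case, 72 different rooms), and the differences should be computed separately within each room.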
If I use this code:
pippo<-ddply(workfile, .(V7), transform, diff = c(tail(V5,-1)-head(V5,-1)), NA)
I get the following error message:
Error in data.frame(list(V1 = c(1393680600, 1393681500, 1393682400, 1393683300,:
arguments imply differing number of rows: 864, 863, 1
If I use this code instead:
pippo<-ddply(workfile, .(V7), transform, diff = c(tail(workfile$V5,-1)-head(workfile$V5,-1)), NA)
I get a different error message:
Error in data.frame(list(V1 = c(1393680600, 1393681500, 1393682400, 1393683300,: arguments imply differing number of rows: 864, 62207, 1
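The row counts in both errors can be reproduced on a small vector. The sketch below uses hypothetical V5 values for a single room: tail(x, -1) - head(x, -1) is what diff() computes, and it is always one element shorter than its input. In the first attempt, a room with 864 rows yields 863 differences, and the NA placed outside c() becomes a separate length-1 argument to transform(), hence "864, 863, 1". In the second attempt, workfile$V5 refers to the whole column rather than the current room, hence 62207.

```r
# Hypothetical V5 readings for one room (stand-in for one ddply group)
v5 <- c(1582.899, 1582.900, 1582.919, 1582.939)

# tail(x, -1) - head(x, -1) gives the same values as diff(x)
d1 <- tail(v5, -1) - head(v5, -1)
d2 <- diff(v5)
stopifnot(isTRUE(all.equal(d1, d2)))

# Both are one element shorter than the input (3 vs 4 here, 863 vs 864 in the question)
print(length(v5))
print(length(d2))

# Putting NA *inside* c() restores the group length, so the result fits as a column
padded <- c(NA, diff(v5))
print(length(padded))
```

The fix, in other words, is to move the NA inside c() so the padded vector has exactly one value per row of the group.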
I can't dput my data frame because it is very large.
Any suggestions?
Answer 0 (score: 2)
If all you want is the simple difference, this should work fine (you can use NA instead of 0 if you prefer):
library(plyr)
pippo <- ddply(workfile, .(V7), transform, diff = c(0, diff(V5)))
You should also check out dplyr, which should be faster for large data frames:
library(dplyr)
# %.% was the original chaining operator; current dplyr versions use %>%
pippo <- workfile %>% group_by(V7) %>% mutate(diff = c(NA, diff(V5)))
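If installing plyr or dplyr is not an option, base R's ave() can do the same per-room calculation. This is a swapped-in alternative, not something from the answers above. It is shown on a small hypothetical data frame that uses the question's column names and assumes the rows are already in time order within each room.

```r
# Hypothetical toy data: two rooms, readings already in time order within each room
workfile <- data.frame(
  V1 = as.POSIXct("2014-03-01 13:30:00", tz = "UTC") + 900 * c(0, 1, 2, 0, 1),
  V5 = c(1582.899, 1582.900, 1582.919, 10.0, 10.5),
  V7 = c("D.1Elec-0001", "D.1Elec-0001", "D.1Elec-0001",
         "D.1Elec-0002", "D.1Elec-0002")
)

# ave() applies FUN to V5 separately within each level of V7 and returns a
# vector aligned with the original rows, so no reordering or merging is needed
workfile$diff <- ave(workfile$V5, workfile$V7,
                     FUN = function(x) c(NA, diff(x)))

print(workfile)
```

The first row of each room gets NA, because there is no earlier reading in that room to subtract.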
Answer 1 (score: 0)
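This might be a simple solution, although it ignores the V7 grouping, so the first reading of each room is differenced against the last reading of the previous room: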
workfile$diff <- c(NA,diff(workfile$V5))
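The ungrouped one-liner can be made room-aware while remaining a single vectorized diff(). The sketch below sorts by room and timestamp, then sets the first row of each room to NA with duplicated(). It assumes V1 holds the timestamps (as the error output suggests) and uses hypothetical toy data. Sorting also matters for the grouped answers, because they take differences in whatever row order the data frame already has.

```r
# Hypothetical toy data with rows deliberately out of time order
workfile <- data.frame(
  V1 = as.POSIXct("2014-03-01 13:30:00", tz = "UTC") + 900 * c(1, 0, 0, 1),
  V5 = c(1582.900, 1582.899, 10.0, 10.5),
  V7 = c("D.1Elec-0001", "D.1Elec-0001", "D.1Elec-0002", "D.1Elec-0002")
)

# Sort by room, then by timestamp, so adjacent rows are consecutive readings
workfile <- workfile[order(workfile$V7, workfile$V1), ]

# One vectorized diff over the whole column...
d <- c(NA, diff(workfile$V5))
# ...then blank out the first row of every room, where the difference would
# otherwise be taken against the previous room's last reading
d[!duplicated(workfile$V7)] <- NA
workfile$diff <- d

print(workfile)
```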