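I have a table containing a large number of measurements from different meters. Each measurement is stored in a new row and holds the actual meter reading. I need the difference between consecutive measurements for each meter.
Simplified input: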
[2016-11-03,MeterA,45]
[2016-11-03,MeterB,45]
[2016-11-04,MeterA,47]
[2016-11-04,MeterB,54]
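At the moment I do this with several nested for loops, but that takes a very long time, and there is probably a more efficient way. The current code: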
data$diff <- 0
for (address in unique(data$Address)) {
  subaddr <- subset(data, data$Address == address)
  for (meter in unique(subaddr$Meter)) {
    submeter <- subset(subaddr, subaddr$Meter == meter)
    for (i in 1:nrow(submeter)) {
      if (i > 1) {
        prow = submeter[i-1,]
        row = submeter[i,]
        data[which(data$Address == address & data$Meter == meter & data$UCPTlogTime == row$UCPTlogTime),]$diff <- row$UCPTvalue - prow$UCPTvalue
      }
    }
  }
}
Desired output:
[2016-11-03,MeterA,0]
[2016-11-03,MeterB,0]
[2016-11-04,MeterA,2]
[2016-11-04,MeterB,9]
Answer 0: (score: 2)
Here is one approach using data.table:
library(data.table)
dt <- data.table(df)
dt[,delta := c(0, diff(value)), by = "group"][]
#           date group value delta
#  1: 2016-11-04     A    24     0
#  2: 2016-11-04     B    24     0
#  3: 2016-11-05     A    30     6
#  4: 2016-11-05     B    31     7
#  5: 2016-11-06     A    36     6
#  6: 2016-11-06     B    38     7
#  7: 2016-11-07     A    44     8
#  8: 2016-11-07     B    46     8
#  9: 2016-11-08     A    51     7
# 10: 2016-11-08     B    54     8
# 11: 2016-11-09     A    57     6
# 12: 2016-11-09     B    56     2
# 13: 2016-11-10     A    61     4
# 14: 2016-11-10     B    61     5
# 15: 2016-11-11     A    68     7
# 16: 2016-11-11     B    69     8
# 17: 2016-11-12     A    72     4
# 18: 2016-11-12     B    73     4
# 19: 2016-11-13     A    81     9
# 20: 2016-11-13     B    82     9
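Sample data (df must be created before running the code above):
df <- data.frame(
  date = rep(Sys.Date() + 1:10, each = 2),
  group = rep(c("A", "B"), 10),
  # random starting value per group (recycled) plus a running total of random increments
  value = rpois(2, 20) + cumsum(rpois(20, 3)),
  stringsAsFactors = FALSE
)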
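The diff() approach relies on the rows of each group already being in time order. As a minimal sketch of the same idea applied to the column names from the question, the following sorts first and then uses data.table's shift() to fetch the previous reading. The sample values and the "Addr1" address are made up for illustration, and the rows are deliberately shuffled.

```r
library(data.table)

# Hypothetical sample using the question's column names; rows are out of order on purpose
readings <- data.table(
  UCPTlogTime = as.Date(c("2016-11-04", "2016-11-03", "2016-11-04", "2016-11-03")),
  Address     = c("Addr1", "Addr1", "Addr1", "Addr1"),
  Meter       = c("MeterA", "MeterA", "MeterB", "MeterB"),
  UCPTvalue   = c(47, 45, 54, 45)
)

# Sort in place so consecutive rows within a meter are consecutive in time
setorder(readings, Address, Meter, UCPTlogTime)

# shift() returns the previous value within each group; filling with the
# group's first value makes the first difference of every meter 0
readings[, delta := UCPTvalue - shift(UCPTvalue, fill = UCPTvalue[1]),
         by = .(Address, Meter)]

print(readings)
```

Grouping by both Address and Meter mirrors the two outer loops in the question, so meters with the same name at different addresses are kept apart.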
Answer 1: (score: 2)
dplyr makes this easy with the lag function. Assuming the columns in the data frame are named UCPTlogTime, Address, Meter and UCPTvalue:
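library(dplyr)
data <- data %>%
  group_by(Address, Meter) %>%
  # lag()'s order_by argument picks the previous reading in time order within each group
  mutate(delta = UCPTvalue - lag(UCPTvalue, order_by = UCPTlogTime)) %>%
  # the first reading of each meter has no predecessor, so set its delta to 0
  mutate(delta = ifelse(is.na(delta), 0, delta)) %>%
  ungroup()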
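Answer 2: (score: 1)
This seems simpler, where diff is the value you want to compute. Note that it subtracts the value in the first row from every row, rather than the previous reading of the same meter, so it matches the desired output only for data like this sample.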
for (i in 1:nrow(t)){t$diff[i]<-t[i,3]-t[1,3]}
t
     v1     v2 v3 diff
1 Date1 MeterA 45    0
2 Date2 MeterB 45    0
3 Date3 MeterC 47    2
4 Date4 MeterD 54    9
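For a difference between consecutive readings of the same meter without extra packages, one possible base R sketch uses ave() to apply diff() within each meter. The data frame and its column names (date, meter, value) are hypothetical and mirror the simplified input from the question.

```r
# Hypothetical sample mirroring the question's simplified input
readings <- data.frame(
  date  = as.Date(c("2016-11-03", "2016-11-03", "2016-11-04", "2016-11-04")),
  meter = c("MeterA", "MeterB", "MeterA", "MeterB"),
  value = c(45, 45, 47, 54)
)

# Sort by meter and date so diff() compares readings that are adjacent in time
readings <- readings[order(readings$meter, readings$date), ]

# ave() applies the function within each meter and returns the results in row order;
# the leading 0 stands in for the first reading, which has no predecessor
readings$diff <- ave(readings$value, readings$meter,
                     FUN = function(v) c(0, diff(v)))

# Restore the date-then-meter ordering for display
readings <- readings[order(readings$date, readings$meter), ]
print(readings)
```

Because the work is done by vectorised calls rather than row-by-row indexing, this avoids the repeated subsetting of the nested loops in the question.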
Answer 3: (score: 1)
Here is another approach using dplyr. I did not see an Address variable in the sample, but you can add it to group_by():
library(dplyr)
df <- data.frame(read_date = c("2016-11-03",
                               "2016-11-03",
                               "2016-11-04",
                               "2016-11-04"),
                 Meter = c("MeterA",
                           "MeterB",
                           "MeterA",
                           "MeterB"),
                 UCPTvalue = c(45,
                               45,
                               47,
                               54))
out <- df %>%
  group_by(Meter) %>%
  mutate(diff = ifelse(row_number() == 1,
                       0,
                       UCPTvalue - lag(UCPTvalue, 1)))