I have a question. How can I perform the following operation on the sampdata object? The result I want is the sampres object.
Thanks.
Pseudocode:
for each site_no group in sampdata:
    if flow[1] == flow[2]:   # if yes
        flow[2] = next unique flow value, and gage[2] = the gage value from that same row
    else:                    # if no
        no change
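A minimal base-R sketch of the pseudocode for one site's rows. It assumes the rows are already in the order shown and that "next unique flow value" means the first later row whose flow differs from `flow[1]`; it also assumes there are no NA flows. `fix_second_row` is a hypothetical name, not part of any package:

```r
# Hypothetical helper: apply the pseudocode to the rows of ONE site_no group.
# If the first two flows are equal, row 2 takes the next unique flow value and
# the gage value from the first row that has that flow; otherwise nothing changes.
fix_second_row <- function(flow, gage) {
  newflow <- flow
  newgage <- gage
  if (length(flow) >= 2 && flow[1] == flow[2]) {
    j <- match(TRUE, flow != flow[1])  # first row whose flow differs from flow[1]
    if (!is.na(j)) {                   # leave the group unchanged if no such row exists
      newflow[2] <- flow[j]
      newgage[2] <- gage[j]
    }
  }
  list(newflow = newflow, newgage = newgage)
}

# Rows of site 6818000 from the sample data
res <- fix_second_row(c(14700, 14700, 14700, 14800), c(1, 1.01, 1.02, 1.03))
print(res$newflow)  # [1] 14700 14800 14700 14800
print(res$newgage)  # [1] 1.00 1.03 1.02 1.03
```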
Starting data:
library(data.table)
sampdata <- data.table(c(02446500,02446500,02446500,02467000,02467000,02467000,06818000,06818000,06818000,06818000,06893000,06893000,06893000,06893000,06934500,06934500,06934500,07010000,07010000,07010000,07289000,07289000,07289000),c(21,21,22,70,76,82,14700,14700,14700,14800,11000,11000,11000,11100,19400,19400,19500,32000,32000,32100,146000,146000,147000),c(4,4.01,4.02,73.05,73.06,73.07,1,1.01,1.02,1.03,1,1.01,1.02,1.03,-1.2,-1.19,-1.18,-9.02,-9.01,-9,-4.43,-4.42,-4.41))
setnames(sampdata,c("site_no", "flow", "gage"))
setkey(sampdata, site_no)
Desired final result:
sampres <- data.table(c(02446500,02446500,02446500,02467000,02467000,02467000,06818000,06818000,06818000,06818000,06893000,06893000,06893000,06893000,06934500,06934500,06934500,07010000,07010000,07010000,07289000,07289000,07289000),c(21,22,22,70,76,82,14700,14800,14700,14800,11000,11100,11000,11100,19400,19500,19500,32000,32100,32100,146000,147000,147000),c(4,4.02,4.02,73.05,73.06,73.07,1,1.03,1.02,1.03,1,1.03,1.02,1.03,-1.2,-1.18,-1.18,-9.02,-9,-9,-4.43,-4.41,-4.41))
setnames(sampres,c("site_no", "newflow", "newgage"))
setkey(sampres, site_no)
To clarify, here are the initial data and the result side by side, from cbind(sampdata, sampres):
site_no flow gage site_no flow gage
1: 2446500 21 4.00 2446500 21 4.00
2: 2446500 21 4.01 2446500 22 4.02
3: 2446500 22 4.02 2446500 22 4.02
4: 2467000 70 73.05 2467000 70 73.05
5: 2467000 76 73.06 2467000 76 73.06
6: 2467000 82 73.07 2467000 82 73.07
7: 6818000 14700 1.00 6818000 14700 1.00
8: 6818000 14700 1.01 6818000 14800 1.03
9: 6818000 14700 1.02 6818000 14700 1.02
10: 6818000 14800 1.03 6818000 14800 1.03
11: 6893000 11000 1.00 6893000 11000 1.00
12: 6893000 11000 1.01 6893000 11100 1.03
13: 6893000 11000 1.02 6893000 11000 1.02
14: 6893000 11100 1.03 6893000 11100 1.03
15: 6934500 19400 -1.20 6934500 19400 -1.20
16: 6934500 19400 -1.19 6934500 19500 -1.18
17: 6934500 19500 -1.18 6934500 19500 -1.18
18: 7010000 32000 -9.02 7010000 32000 -9.02
19: 7010000 32000 -9.01 7010000 32100 -9.00
20: 7010000 32100 -9.00 7010000 32100 -9.00
21: 7289000 146000 -4.43 7289000 146000 -4.43
22: 7289000 146000 -4.42 7289000 147000 -4.41
23: 7289000 147000 -4.41 7289000 147000 -4.41
site_no flow gage site_no flow gage
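Thanks for the edit. The rows that change are 2, 8, 12, 16, 19 and 22, i.e. the second row of each site whose first two flow values are identical. For each of them, the next unique flow value and its corresponding gage value need to become that row's newflow/newgage value.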
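This is a small subset of the data, and I made these changes by hand. I'm looking for an automated way to do this because I have several thousand sites to process. Thanks.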
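One way to automate the rule for every site at once is a single grouped `:=` in data.table. This is a sketch, not the asker's or answerer's code; it assumes rows within each site are already in the order shown, treats "next unique flow value" as the first later row with a different flow, and writes the new columns `newflow`/`newgage` (named after sampres) into sampdata in place:

```r
library(data.table)

# Sample data from the question (numeric site_no, so leading zeros are dropped)
sampdata <- data.table(
  site_no = rep(c(02446500, 02467000, 06818000, 06893000, 06934500, 07010000, 07289000),
                times = c(3, 3, 4, 4, 3, 3, 3)),
  flow = c(21, 21, 22, 70, 76, 82, 14700, 14700, 14700, 14800, 11000, 11000, 11000,
           11100, 19400, 19400, 19500, 32000, 32000, 32100, 146000, 146000, 147000),
  gage = c(4, 4.01, 4.02, 73.05, 73.06, 73.07, 1, 1.01, 1.02, 1.03, 1, 1.01, 1.02,
           1.03, -1.2, -1.19, -1.18, -9.02, -9.01, -9, -4.43, -4.42, -4.41)
)

# Within each site_no: if the first two flows are equal, row 2 takes the first
# different flow in that site and the gage from the same row; otherwise copy as-is.
sampdata[, c("newflow", "newgage") := {
  nf <- flow
  ng <- gage
  j <- match(TRUE, flow != flow[1L])  # first row with a different flow (NA if none)
  if (.N >= 2L && flow[2L] == flow[1L] && !is.na(j)) {
    nf[2L] <- flow[j]
    ng[2L] <- gage[j]
  }
  list(nf, ng)
}, by = site_no]

print(sampdata)
```

Because `by = site_no` assigns results back to the original rows, the row order of sampdata is unchanged, and the same call applies to a table with any number of sites. Comparing the new columns with `sampres$newflow` and `sampres$newgage` is a quick check.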
Answer (score: 1)
I wrote this function before I realized the change only happens within each site_no group, but it still works:
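# For every run of repeated values in x, replace the run's 2nd element with
# the value of the following run (NA if the run is the last one in the group).
swapfun <- function(x) {
  samp <- rle(x)  # run values (samp$values) and run lengths (samp$lengths)
  unlist(lapply(seq_along(samp$lengths), function(i) {
    t <- rep(samp$values[i], samp$lengths[i])
    if (samp$lengths[i] > 1) t[2] <- samp$values[i + 1]
    t
  }))  # lapply, not sapply, so unlist() always receives a list
}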
sampdata[ , newflow:=swapfun(flow), by=site_no]
> sampdata
site_no flow gage newflow
1: 2446500 21 4.00 21
2: 2446500 21 4.01 22
3: 2446500 22 4.02 22
4: 2467000 70 73.05 70
5: 2467000 76 73.06 76
6: 2467000 82 73.07 82
7: 6818000 14700 1.00 14700
8: 6818000 14700 1.01 14800
9: 6818000 14700 1.02 14700
10: 6818000 14800 1.03 14800
11: 6893000 11000 1.00 11000
12: 6893000 11000 1.01 11100
13: 6893000 11000 1.02 11000
14: 6893000 11100 1.03 11100
15: 6934500 19400 -1.20 19400
16: 6934500 19400 -1.19 19500
17: 6934500 19500 -1.18 19500
18: 7010000 32000 -9.02 32000
19: 7010000 32000 -9.01 32100
20: 7010000 32100 -9.00 32100
21: 7289000 146000 -4.43 146000
22: 7289000 146000 -4.42 147000
23: 7289000 147000 -4.41 147000
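The answer fills only `newflow`, while the question also asks for `newgage`. A sketch of one way to derive it once `newflow` exists (an addition, not part of the answer): keep the row's own gage where the flow did not change, and otherwise take the gage of the first row in the same site that has the new flow. It is shown on the rows of site 6818000 only, with `newflow` copied from the output above, so the block runs on its own:

```r
library(data.table)

# Rows of site 6818000, with newflow as printed above
dt <- data.table(
  site_no = 6818000,
  flow    = c(14700, 14700, 14700, 14800),
  gage    = c(1, 1.01, 1.02, 1.03),
  newflow = c(14700, 14800, 14700, 14800)
)

# Where newflow differs from flow, look up the gage of the first row in the same
# site with that flow; otherwise keep the row's own gage (an NA newflow gives NA).
dt[, newgage := ifelse(newflow == flow, gage, gage[match(newflow, flow)]),
   by = site_no]

print(dt$newgage)  # [1] 1.00 1.03 1.02 1.03
```

Matching on the changed rows only matters: row 3 still has flow 14700, and looking it up by flow alone would overwrite its gage 1.02 with the 1.00 from row 1.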