I have a data table like this:
CurrOdo  Lat          NextLat      PrevODO  NextOdo
2.62     30.01115868  30.01115868  NA       NA
5.19     30.01116407  30.01116407  NA       NA
7.61     30.01116919  30.01116919  NA       NA
18.82    NA           30.01119282  7.61     19.06
19.06    30.01119282  30.01119282  NA       NA
19.35    30.01119339  30.01119339  NA       NA
20.54    NA           30.01122998  19.35    81.5
20.81    NA           30.01122998  20.54    81.5
37.38    NA           30.01122998  20.81    81.5
81.5     30.01132238  30.01132238  NA       NA
library(data.table)
atable <- data.table(
  odo     = c(2.62, 5.19, 7.61, 18.82, 19.06, 19.35, 20.54, 20.81, 37.38, 81.5),
  Lat     = c(30.01115868, 30.01116407, 30.01116919, NA, 30.01119282, 30.01119339, NA, NA, NA, 30.01132238),
  NextLat = c(30.01115868, 30.01116407, 30.01116919, 30.01119282, 30.01119282, 30.01119339,
              30.01122998, 30.01122998, 30.01122998, 30.01122998),
  PrevLat = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
  PrevODO = c(NA, NA, NA, 7.61, NA, NA, 19.35, 20.54, 20.81, NA),
  NextOdo = c(NA, NA, NA, 19.06, NA, NA, 81.5, 81.5, 81.5, NA))
The Lat value is a rolling calculation based on the following formula:
Lat = (NextLat - PrevLat) * ((CurrODO - PrevODO) / (NextODO - PrevODO)) + PrevLat
Examples of how Lat is calculated:
Row CurrODO 18.82: (30.01119282- 30.01116919) * (( 18.82 - 7.61) / (19.06 - 7.61)) + 30.01116919
Row CurrODO 20.54: (30.01122998- 30.01119339) * (( 20.54 - 19.35) / (81.5 - 19.35)) + 30.01119339
Row CurrODO 20.81: (30.01122998- Lat calc result from 20.54 row) * ((20.81 - 20.54) / (81.5 - 20.54)) + Lat calc result from 20.54 row
Row CurrODO 37.38: (30.01122998- Lat calc result from 20.81 row) * (( 37.38 - 20.81) / (81.5 - 20.81)) + Lat calc result from 20.81 row
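To make the chaining explicit, here is a small sketch in base R that evaluates the four rows in order. The helper name interp_lat is only an illustration and not part of any package; rows 20.81 and 37.38 reuse the result of the row before them.

interp_lat <- function(next_lat, prev_lat, curr_odo, prev_odo, next_odo) {
  # linear interpolation between prev_lat and next_lat by odometer position
  (next_lat - prev_lat) * ((curr_odo - prev_odo) / (next_odo - prev_odo)) + prev_lat
}

lat_18_82 <- interp_lat(30.01119282, 30.01116919, 18.82, 7.61, 19.06)
lat_20_54 <- interp_lat(30.01122998, 30.01119339, 20.54, 19.35, 81.5)
# the next two rows depend on the value just computed for the previous row
lat_20_81 <- interp_lat(30.01122998, lat_20_54, 20.81, 20.54, 81.5)
lat_37_38 <- interp_lat(30.01122998, lat_20_81, 37.38, 20.81, 81.5)

print(c(lat_18_82, lat_20_54, lat_20_81, lat_37_38), digits = 12)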
The final result would be:
CurrOdo  Lat            NextLat      PrevODO  NextOdo
2.62     30.01115868    30.01115868  NA       NA
5.19     30.01116407    30.01116407  NA       NA
7.61     30.01116919    30.01116919  NA       NA
18.82    30.0111923247  30.01119282  7.61     19.06
19.06    30.01119282    30.01119282  NA       NA
19.35    30.01119339    30.01119339  NA       NA
20.54    30.0111940906  30.01122998  19.35    81.5
20.81    30.0111942496  30.01122998  20.54    81.5
37.38    30.0112040049  30.01122998  20.81    81.5
81.5     30.01132238    30.01132238  NA       NA
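I am currently running this in a loop in SQL Server, but it takes a very long time. I could also put it in a loop in R, but that would not perform well on large data sets. I have been stuck on this for days, so any help is appreciated!
Answer 0 (score: 5)
My answer involves a repeat loop, even though you asked for "no looping", but I don't see any other way (there probably is one, of course; this is R ;-)).
The loop should run quite fast: on my system it takes about a second to fill the NAs in 10 million rows (see the benchmark).
The output for Lat matches the desired output in the question.
Side note:
If your first Lat value is NA, you may run into problems. Since PrevLat is always NA in the first row, a first-row NA in Lat is never recalculated, and the loop never breaks.
You can (of course) build an escape route/break into the loop to prevent this. I left it out to keep the example short and readable.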
repeat{
#until there are no more NA in Lat
if( sum( is.na( atable$Lat ) ) == 0 ){
break
}
#(re)calculate PrevLat
atable[, PrevLat := shift( Lat, 1, type = "lag" ) ]
#calculate Lat when PrevLat is known, but Lat is not
atable[ is.na( Lat ) & !is.na( PrevLat ),
Lat := (NextLat-PrevLat)*((odo-PrevODO)/(NextOdo-PrevODO))+PrevLat ]
}
# odo Lat NextLat PrevLat PrevODO NextOdo
# 1: 2.62 30.0111586800 30.01115868 NA NA NA
# 2: 5.19 30.0111640700 30.01116407 30.0111586800 NA NA
# 3: 7.61 30.0111691900 30.01116919 30.0111640700 NA NA
# 4: 18.82 30.0111923247 30.01119282 30.0111691900 7.61 19.06
# 5: 19.06 30.0111928200 30.01119282 30.0111923247 NA NA
# 6: 19.35 30.0111933900 30.01119339 30.0111928200 NA NA
# 7: 20.54 30.0111940906 30.01122998 30.0111933900 19.35 81.50
# 8: 20.81 30.0111942496 30.01122998 30.0111940906 20.54 81.50
# 9: 37.38 30.0112040049 30.01122998 30.0111942496 20.81 81.50
# 10: 81.50 30.0113223800 30.01122998 NA NA NA
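The escape route mentioned in the side note could look roughly like the sketch below. It is the same repeat loop, wrapped in a hypothetical helper (fill_lat_guarded is a made-up name) that stops with a warning as soon as a pass fills nothing. The tiny demo table is invented for illustration and has an NA in its first row.

library(data.table)

fill_lat_guarded <- function(dt) {
  dt <- copy(dt)  # work on a copy so the caller's table is not changed by reference
  repeat {
    n_na_before <- sum(is.na(dt$Lat))
    if (n_na_before == 0) break  # nothing left to fill
    dt[, PrevLat := shift(Lat, 1, type = "lag")]
    dt[is.na(Lat) & !is.na(PrevLat),
       Lat := (NextLat - PrevLat) * ((odo - PrevODO) / (NextOdo - PrevODO)) + PrevLat]
    # guard: if this pass filled nothing, further passes will not either
    if (sum(is.na(dt$Lat)) == n_na_before) {
      warning("Some Lat values could not be filled; stopping.")
      break
    }
  }
  dt
}

# made-up demo data: the first row has no previous row to interpolate from
demo <- data.table(odo = c(1, 2, 3), Lat = c(NA, 10, NA), NextLat = c(10, 10, 20),
                   PrevLat = NA_real_, PrevODO = c(NA, NA, 2), NextOdo = c(NA, NA, 4))
res <- suppressWarnings(fill_lat_guarded(demo))
print(res$Lat)

The third row is filled as usual, while the first row stays NA and the loop ends instead of running forever.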
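Benchmark
On a data.table with 10 million rows (your atable repeated 1M times), on my system (a +/- 6-year-old i5 with 16 GB of memory), the loop takes about a second to calculate every value of Lat.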
dt <- atable[rep(atable[, .I], 1000000)]
system.time(
repeat{
#until there are no more NA in Lat
if( sum( is.na( dt$Lat ) ) == 0 ){
break
}
#(re)calculate PrevLat
dt[, PrevLat := shift( Lat, 1, type = "lag" ) ]
#calculate Lat when PrevLat is known
dt[ is.na( Lat ) & !is.na( PrevLat ),
Lat := (NextLat- PrevLat ) * ((odo - PrevODO) / (NextOdo - PrevODO)) + PrevLat ]
}
)
# user system elapsed
# 0.90 0.35 1.08
Session info
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)
other attached packages: [1] data.table_1.12.4
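Update: explanation of the code
What the code does:
1. Check whether the Lat column contains any NA values; if not, exit the loop.
2. Fill the PrevLat column with the Lat value of the previous row.
3. For all rows where Lat is NA and PrevLat has a value (i.e. is not NA), calculate the value of Lat.
Steps 1 to 3 are repeated until the check in step 1 finds that the sum of is.na(atable$Lat) equals 0. When this condition is met, there are no more NA values in the Lat column, so we can exit the repeat loop using break.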
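One consequence of these steps is that each pass fills only the first open row of every run of consecutive NAs, so the number of passes should equal the length of the longest NA run. The sketch below counts the passes on the question's sample data; the variable names passes and longest_na_run are only for illustration.

library(data.table)

dt <- data.table(
  odo     = c(2.62, 5.19, 7.61, 18.82, 19.06, 19.35, 20.54, 20.81, 37.38, 81.5),
  Lat     = c(30.01115868, 30.01116407, 30.01116919, NA, 30.01119282, 30.01119339, NA, NA, NA, 30.01132238),
  NextLat = c(30.01115868, 30.01116407, 30.01116919, 30.01119282, 30.01119282, 30.01119339,
              30.01122998, 30.01122998, 30.01122998, 30.01122998),
  PrevLat = NA_real_,
  PrevODO = c(NA, NA, NA, 7.61, NA, NA, 19.35, 20.54, 20.81, NA),
  NextOdo = c(NA, NA, NA, 19.06, NA, NA, 81.5, 81.5, 81.5, NA)
)

# longest run of consecutive NA values in Lat before any filling
runs <- rle(is.na(dt$Lat))
longest_na_run <- max(runs$lengths[runs$values])

passes <- 0L
repeat {
  if (sum(is.na(dt$Lat)) == 0) break
  dt[, PrevLat := shift(Lat, 1, type = "lag")]
  dt[is.na(Lat) & !is.na(PrevLat),
     Lat := (NextLat - PrevLat) * ((odo - PrevODO) / (NextOdo - PrevODO)) + PrevLat]
  passes <- passes + 1L  # count completed fill passes
}
print(c(passes = passes, longest_na_run = longest_na_run))

Because the benchmark table is this sample repeated, its longest NA run does not grow with the row count, which is consistent with the short run time. Data with long NA runs would need proportionally more passes.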
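Answer 1 (score: 2)
I'm happy to be corrected by R experts, but I haven't really seen a simple way to accumulate values like this without looping, as you are doing.
But I think that if you install Rcpp and any associated tools, you can do something like this: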
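src <-
"NumericVector fill_lat_na(NumericMatrix v){
  // columns (0-based, in the order of atable):
  // 0 = odo, 1 = Lat, 2 = NextLat, 4 = PrevODO, 5 = NextOdo
  NumericVector ret(v.nrow());
  for(int i=0; i < v.nrow(); ++i){
    ret[i] = v(i, 1);
    if(NumericVector::is_na(ret[i]))
    {
      // interpolate from the previous row's (possibly just computed) value
      ret[i] = (v(i, 2) - ret[i-1]) * ((v(i, 0) - v(i, 4)) / (v(i, 5) - v(i, 4))) + ret[i-1] ;
    }
  }
  return(ret);
}
"
Rcpp::cppFunction(src)
This gives you a function fill_lat_na(), which you can then call from R in the usual way, passing the data as a numeric matrix whose columns are in the same order as atable.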
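Note that there is no lower-bound check here, so this will fail if, for example, your first row has an NA in Lat. The function could probably also be improved to refer to named columns.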
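As an illustration of those two improvements, here is a plain-R sketch of the same single pass with a first-row check and named columns. It is not the Rcpp function above and will be slower on large tables; fill_lat_seq is a hypothetical name, and the column names are assumed to match those in atable.

fill_lat_seq <- function(df) {
  lat <- df$Lat
  for (i in seq_along(lat)) {
    if (is.na(lat[i])) {
      if (i == 1) next  # lower-bound check: no previous row, leave the NA in place
      w <- (df$odo[i] - df$PrevODO[i]) / (df$NextOdo[i] - df$PrevODO[i])
      # lat[i - 1] already holds the filled value if the previous row was NA
      lat[i] <- (df$NextLat[i] - lat[i - 1]) * w + lat[i - 1]
    }
  }
  lat
}

sample_df <- data.frame(
  odo     = c(2.62, 5.19, 7.61, 18.82, 19.06, 19.35, 20.54, 20.81, 37.38, 81.5),
  Lat     = c(30.01115868, 30.01116407, 30.01116919, NA, 30.01119282, 30.01119339, NA, NA, NA, 30.01132238),
  NextLat = c(30.01115868, 30.01116407, 30.01116919, 30.01119282, 30.01119282, 30.01119339,
              30.01122998, 30.01122998, 30.01122998, 30.01122998),
  PrevODO = c(NA, NA, NA, 7.61, NA, NA, 19.35, 20.54, 20.81, NA),
  NextOdo = c(NA, NA, NA, 19.06, NA, NA, 81.5, 81.5, 81.5, NA)
)
filled <- fill_lat_seq(sample_df)
print(filled, digits = 12)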
Answer 2 (score: 0)
Here is a data.table approach with a very explicit loop inside {}:
library(data.table)
atable<-data.table(odo = c(2.62,5.19,7.61,18.82,19.06,19.35,20.54,20.81, 37.38,81.5 ),
Lat = c(30.01115868,30.01116407,30.01116919,NA,30.01119282,30.01119339,NA,NA, NA, 30.01132238),
NextLat=c(30.01115868,30.01116407,30.01116919, 30.01119282, 30.01119282,30.01119339,
30.01122998,30.01122998,30.01122998,30.01122998 ),
PrevLat=c(NA,NA,NA, NA, NA,NA, NA,NA,NA,NA ),
PrevODO=c(NA,NA,NA, 7.61, NA,NA, 19.35,20.54,20.81,NA ),
NextOdo=c(NA,NA,NA, 19.06, NA,NA, 81.5,81.5,81.5,NA ))
options('digits' = 10)
atable[, c('na_rleid', 'LagLat') := .(rleid(is.na(PrevODO)), shift(NextLat))]
atable[!is.na(PrevODO),
Lat := {x = vector('numeric', .N)
const = ((odo - PrevODO) / (NextOdo - PrevODO))
x[1] = (NextLat[1] - LagLat[1]) * const[1] + LagLat[1]
for (i in seq_len(.N)[-1]){
x[i] = (NextLat[i] - x[i-1]) * const[i] + x[i-1]
}
x
},
by = na_rleid
]