When SOG is 0, Stops gets the value 1. The next time SOG drops to 0, Stops gets 2, and so on.
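For reference, a numbering like this could be derived from SOG along the following lines. This is only a minimal base R sketch: stop_ids is a hypothetical helper name, and it assumes a single vessel with rows already sorted by time.

```r
# Hypothetical helper: number each run of consecutive SOG == 0 rows (1, 2, ...)
# and leave rows where the ship is moving at 0.
stop_ids <- function(sog) {
  stopped <- sog == 0
  # A run starts where a stopped row follows a moving row (or is the first row).
  starts <- stopped & !c(FALSE, head(stopped, -1))
  # Count the runs seen so far; moving rows are reset to 0.
  ifelse(stopped, cumsum(starts), 0L)
}

sog <- c(0, 0, 4, 4, 0, 0, 3)
stops <- stop_ids(sog)
print(stops)
```

With several vessels in one table, the same helper would have to be applied per MMSI.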
Now I want to group all rows where Stops != 0 (the rows where SOG is 0). The trick is that I need the following fields:
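MinTime: the minimum of BS and TS, i.e. the date and time the ship arrived in port.
MaxTime: the maximum of BS and TS.
Duration: the difference between MinTime and MaxTime, in minutes.
AveLat and AveLong: the average latitude and longitude. This is probably the most challenging part. See the formula at the bottom of this message.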
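As a rough illustration of the three time fields, the base R sketch below combines BS and TS into timestamps for a single stop and derives the minimum, the maximum and the duration in minutes. The variable names and the UTC time zone are assumptions made for the example; the day.month.year reading of BS is also assumed.

```r
# Three rows from the first stop in the data.
bs <- c("6.4.2014", "6.4.2014", "6.4.2014")
ts <- c("15:56:07", "16:05:07", "16:29:06")

# Combine date and time, assuming BS is day.month.year and times are UTC.
times <- as.POSIXct(paste(bs, ts), format = "%d.%m.%Y %H:%M:%S", tz = "UTC")

min_time <- min(times)
max_time <- max(times)

# Ask for minutes explicitly; the default unit of a time difference is chosen
# automatically and would switch to hours for longer stops.
duration_min <- as.numeric(difftime(max_time, min_time, units = "mins"))

print(min_time)
print(max_time)
print(duration_min)
```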
This is what I have:
MMSI BS TS LAT LONG SOG SIZE_A Stops
247117300 6.4.2014 15:56:07 57.71432 11.96005 0 46 1
247117300 6.4.2014 16:05:07 57.71433 11.96005 0 46 1
247117300 6.4.2014 16:11:07 57.71432 11.96005 0 46 1
247117300 6.4.2014 16:20:06 57.71433 11.96005 0 46 1
247117300 6.4.2014 16:29:06 57.71433 11.96003 0 46 1
247117300 6.4.2014 16:29:27 57.71433 11.96003 4 46 0
247117300 6.4.2014 16:34:28 57.71433 11.96003 4 46 0
247117300 6.4.2014 16:37:29 57.71433 11.96003 4 46 0
247117300 6.4.2014 17:14:40 57.71433 11.96003 4 46 0
247117300 6.4.2014 17:18:30 57.71432 11.96003 4 46 0
247117300 6.4.2014 17:22:50 57.71433 11.96002 4 46 0
247117300 6.4.2014 17:27:01 57.71432 11.96002 4 46 0
247117300 6.4.2014 17:29:09 57.71435 11.96003 0 46 2
247117300 6.4.2014 17:33:50 57.71435 11.96003 0 46 2
247117300 6.4.2014 17:39:49 57.71437 11.96003 0 46 2
247117300 6.4.2014 17:42:51 57.71435 11.96003 0 46 2
247117300 6.4.2014 17:51:49 57.71433 11.96003 0 46 2
247117300 6.4.2014 17:52:37 57.71432 11.96002 0 46 2
247117300 6.4.2014 17:58:26 57.71212 11.95697 3 46 0
247117300 6.4.2014 18:00:26 57.71047 11.95567 4 46 0
This is the desired result (the AveLAT and AveLONG values are made up):
MMSI BS TS LAT LONG SOG SIZE_A Stops MinTime MaxTime Duration_min AveLAT AveLON
247117300 6.4.2014 15:56:07 57.71432 11.96005 0 46 1 6.4.2014 15:56:07 6.4.2014 16:29:06 34 57.71432 11.96005
247117300 6.4.2014 17:29:09 57.71435 11.96003 0 46 2 6.4.2014 17:29:09 6.4.2014 17:52:37 23 57.71433 11.96003
Formula for the average LAT and LONG:
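The formula itself is not reproduced in this copy of the question. Judging from the cal function in the answer, it is presumably the usual mean via Cartesian coordinates on the unit sphere, which can be written as follows. The symbols are notation chosen here, not taken from the question; n is the number of rows in one stop.

```latex
\[
\begin{aligned}
\varphi_i &= \mathrm{LAT}_i \cdot \frac{\pi}{180}, \qquad
\lambda_i = \mathrm{LONG}_i \cdot \frac{\pi}{180} \\
x_i &= \cos\varphi_i \cos\lambda_i, \qquad
y_i = \cos\varphi_i \sin\lambda_i, \qquad
z_i = \sin\varphi_i \\
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad
\bar{z} = \frac{1}{n}\sum_{i=1}^{n} z_i \\
\mathrm{AveLat} &= \frac{180}{\pi}\,\operatorname{atan2}\!\left(\bar{z},\ \sqrt{\bar{x}^2 + \bar{y}^2}\right) \\
\mathrm{AveLong} &= \frac{180}{\pi}\,\operatorname{atan2}\!\left(\bar{y},\ \bar{x}\right)
\end{aligned}
\]
```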
Answer 0 (score: 2)
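Here is one idea using dplyr and tidyr. I wrote a custom function based on the calculation described above. foo is your data. The last digit of the average latitude does not exactly match your expected result. This may be due to rounding.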
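To see why the averaging goes through Cartesian coordinates rather than averaging the degrees directly, here is a small base R sketch. mean_position is a hypothetical helper written for this illustration, and the two test positions are invented, not taken from the data.

```r
# Hypothetical helper: mean position via Cartesian coordinates on the unit
# sphere. Inputs and outputs are in degrees.
mean_position <- function(lat, lon) {
  phi <- lat * pi / 180
  lambda <- lon * pi / 180
  x <- mean(cos(phi) * cos(lambda))
  y <- mean(cos(phi) * sin(lambda))
  z <- mean(sin(phi))
  c(lat = atan2(z, sqrt(x^2 + y^2)) * 180 / pi,
    lon = atan2(y, x) * 180 / pi)
}

# Two invented positions on the equator, one degree either side of the
# antimeridian.
lat <- c(0, 0)
lon <- c(179, -179)

naive <- mean(lon)                    # arithmetic mean of the degrees
spherical <- mean_position(lat, lon)  # mean via Cartesian coordinates

print(naive)
print(spherical)
```

The arithmetic mean of the longitudes is 0, which is on the opposite side of the globe, while the Cartesian mean lands on the antimeridian (a longitude of 180 in absolute value). For positions as close together as the ones in the question, both approaches give practically the same numbers.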
library(dplyr)
library(tidyr)
test <- filter(foo, Stops != 0) %>%    # drop rows with Stops == 0
    unite(dates, BS, TS, sep = " ") %>%    # combine date and time into one column
    mutate(dates = as.POSIXct(strptime(dates, format = "%d.%m.%Y %H:%M:%S"))) %>%    # parse as date-time
    group_by(Stops) %>%    # for each stop
    filter(dates == min(dates) | dates == max(dates)) %>%    # keep rows with min and max dates
    summarise(minTime = min(dates),
              maxTime = max(dates),
              duration = max(dates) - min(dates),
              size_A = SIZE_A[1])
# Stops minTime maxTime duration size_A
#1 1 2014-04-06 15:56:07 2014-04-06 16:29:06 32.98333 mins 46
#2 2 2014-04-06 17:29:09 2014-04-06 17:52:37 23.46667 mins 46
### A custom function
cal <- function(x, y){
    # x: latitudes in degrees, y: longitudes in degrees
    latToRadians <- x * pi / 180
    longToRadians <- y * pi / 180

    # convert each position to Cartesian coordinates on the unit sphere
    x_cartesian <- cos(latToRadians) * cos(longToRadians)
    y_cartesian <- cos(latToRadians) * sin(longToRadians)
    z_cartesian <- sin(latToRadians)

    # average the Cartesian coordinates
    aveX <- sum(x_cartesian) / length(x_cartesian)
    aveY <- sum(y_cartesian) / length(y_cartesian)
    aveZ <- sum(z_cartesian) / length(z_cartesian)

    # convert the mean vector back to latitude and longitude in radians
    hyp <- sqrt(aveX * aveX + aveY * aveY)
    lat <- atan2(aveZ, hyp)
    long <- atan2(aveY, aveX)

    # radians back to degrees
    latMean <- lat * 180 / pi
    longMean <- long * 180 / pi

    return(as.data.frame(cbind(latMean, longMean)))
}
#### Get average long/lat using the function above
test2 <- foo %>%
    filter(Stops != 0) %>%
    group_by(Stops) %>%
    do(cal(.$LAT, .$LONG))
# Stops latMean longMean
#1 1 57.71433 11.96005
#2 2 57.71434 11.96003
### Combine test and test2
inner_join(test, test2)
#Joining by: "Stops"
#Source: local data frame [2 x 7]
# Stops minTime maxTime duration size_A latMean longMean
#1 1 2014-04-06 15:56:07 2014-04-06 16:29:06 32.98333 mins 46 57.71433 11.96005
#2 2 2014-04-06 17:29:09 2014-04-06 17:52:37 23.46667 mins 46 57.71434 11.96003
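As a possible variation, the same result can be produced in a single pipeline without do() and the join, with the duration reported as a plain number of minutes. This is only a sketch under some assumptions: it uses a handful of rows from the question as foo, assumes dplyr 1.0.0 or later (for the .groups argument), treats the timestamps as UTC, and the output column names are chosen to follow the desired result in the question.

```r
library(dplyr)

# A few rows from the question's data: two stops with one moving row between.
foo <- data.frame(
  MMSI  = 247117300,
  BS    = "6.4.2014",
  TS    = c("15:56:07", "16:29:06", "16:29:27", "17:29:09", "17:52:37"),
  LAT   = c(57.71432, 57.71433, 57.71433, 57.71435, 57.71432),
  LONG  = c(11.96005, 11.96003, 11.96003, 11.96003, 11.96002),
  SOG   = c(0, 0, 4, 0, 0),
  Stops = c(1, 1, 0, 2, 2)
)

result <- foo %>%
  filter(Stops != 0) %>%
  mutate(
    # Assumption: BS is day.month.year and the times are UTC.
    time = as.POSIXct(paste(BS, TS), format = "%d.%m.%Y %H:%M:%S", tz = "UTC"),
    lat_rad = LAT * pi / 180,
    lon_rad = LONG * pi / 180,
    # Cartesian coordinates on the unit sphere, one row per position.
    x = cos(lat_rad) * cos(lon_rad),
    y = cos(lat_rad) * sin(lon_rad),
    z = sin(lat_rad)
  ) %>%
  group_by(MMSI, Stops) %>%
  summarise(
    MinTime = min(time),
    MaxTime = max(time),
    Duration_min = as.numeric(difftime(max(time), min(time), units = "mins")),
    aveX = mean(x),
    aveY = mean(y),
    aveZ = mean(z),
    .groups = "drop"
  ) %>%
  mutate(
    # Convert the mean vector back to degrees.
    AveLAT = atan2(aveZ, sqrt(aveX^2 + aveY^2)) * 180 / pi,
    AveLON = atan2(aveY, aveX) * 180 / pi
  ) %>%
  select(-aveX, -aveY, -aveZ)

print(result)
```

Grouping by MMSI as well as Stops keeps the stops of different vessels apart if the table contains more than one ship.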