Aggregating a table with dates and geographic coordinates

Time: 2014-11-24 10:59:46

Tags: r coordinates aggregate

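Stops is set to 1 while SOG is 0. The next time SOG drops to 0, Stops is set to 2, and so on for each later stop. Rows where SOG is not 0 have Stops set to 0.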
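One way to produce this numbering, as a minimal base R sketch. It assumes the rows are already in time order for a single ship; the vector names `sog`, `stopped`, `starts` and `stops` are illustrative only and do not come from the original data frame.

```r
# SOG values in row order (taken from the sample data in this question)
sog <- c(0, 0, 0, 0, 0, 4, 4, 4, 4, 4, 4, 4, 0, 0, 0, 0, 0, 0, 3, 4)

# TRUE while the ship is standing still
stopped <- sog == 0

# TRUE only on the first row of each run of zeros
starts <- stopped & !c(FALSE, head(stopped, -1))

# Running count of stops on stopped rows, 0 on moving rows
stops <- ifelse(stopped, cumsum(starts), 0)
```

The running count increases by one each time a new run of zero-speed rows begins, so every stop gets its own identifier.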

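Now I want to group all the rows where Stops != 0 (that is, where SOG is 0). The tricky part is that I need the following fields for each group: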

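MinTime: the minimum of BS and TS combined, i.e. the date and time the ship arrived in port.

MaxTime: the maximum of BS and TS combined.

Duration: the difference between MinTime and MaxTime.

AveLat and AveLong: the average latitude and longitude. Probably the most challenging part. See the formula at the bottom of this message.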
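The time-based fields (MinTime, MaxTime, Duration) can be sketched in base R for the first stop of the sample data. The variable names (`stamps`, `times`, `min_time`, `max_time`, `duration_min`) and the UTC time zone are assumptions made for illustration only.

```r
# Combined BS and TS values for the rows with Stops == 1
stamps <- c("6.4.2014 15:56:07", "6.4.2014 16:05:07", "6.4.2014 16:11:07",
            "6.4.2014 16:20:06", "6.4.2014 16:29:06")

# Parse day.month.year strings; the time zone is an assumption
times <- as.POSIXct(stamps, format = "%d.%m.%Y %H:%M:%S", tz = "UTC")

min_time <- min(times)
max_time <- max(times)

# Ask for minutes explicitly so the unit does not depend on the size of the gap
duration_min <- as.numeric(difftime(max_time, min_time, units = "mins"))
```

For these five rows the duration comes out just under 33 minutes (32 minutes and 59 seconds).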

This is what I have:

      MMSI       BS       TS      LAT     LONG SOG SIZE_A Stops
 247117300 6.4.2014 15:56:07 57.71432 11.96005   0     46     1
 247117300 6.4.2014 16:05:07 57.71433 11.96005   0     46     1
 247117300 6.4.2014 16:11:07 57.71432 11.96005   0     46     1
 247117300 6.4.2014 16:20:06 57.71433 11.96005   0     46     1
 247117300 6.4.2014 16:29:06 57.71433 11.96003   0     46     1
 247117300 6.4.2014 16:29:27 57.71433 11.96003   4     46     0
 247117300 6.4.2014 16:34:28 57.71433 11.96003   4     46     0
 247117300 6.4.2014 16:37:29 57.71433 11.96003   4     46     0
 247117300 6.4.2014 17:14:40 57.71433 11.96003   4     46     0
 247117300 6.4.2014 17:18:30 57.71432 11.96003   4     46     0
 247117300 6.4.2014 17:22:50 57.71433 11.96002   4     46     0
 247117300 6.4.2014 17:27:01 57.71432 11.96002   4     46     0
 247117300 6.4.2014 17:29:09 57.71435 11.96003   0     46     2
 247117300 6.4.2014 17:33:50 57.71435 11.96003   0     46     2
 247117300 6.4.2014 17:39:49 57.71437 11.96003   0     46     2
 247117300 6.4.2014 17:42:51 57.71435 11.96003   0     46     2
 247117300 6.4.2014 17:51:49 57.71433 11.96003   0     46     2
 247117300 6.4.2014 17:52:37 57.71432 11.96002   0     46     2
 247117300 6.4.2014 17:58:26 57.71212 11.95697   3     46     0
 247117300 6.4.2014 18:00:26 57.71047 11.95567   4     46     0

This is the desired result (the AveLAT and AveLON values are made up):

     MMSI       BS       TS      LAT     LONG SOG SIZE_A Stops           MinTime           MaxTime Duration_min   AveLAT   AveLON
247117300 6.4.2014 15:56:07 57.71432 11.96005   0     46     1 6.4.2014 15:56:07 6.4.2014 16:29:06           34 57.71432 11.96005
247117300 6.4.2014 17:29:09 57.71435 11.96003   0     46     2 6.4.2014 17:29:09 6.4.2014 17:52:37           23 57.71433 11.96003

Formula for the average LAT and LONG:

  1. LatToRadians: value * PI / 180
  2. LongToRadians: value * PI / 180
  3. X_Cartesian: COS(LatToRadians) * COS(LongToRadians)
  4. Y_Cartesian: COS(LatToRadians) * SIN(LongToRadians)
  5. Z_Cartesian: SIN(LatToRadians)
  6. AveX: SUM(X_Cartesian) / number of X_Cartesian values
  7. AveY: SUM(Y_Cartesian) / number of Y_Cartesian values
  8. AveZ: SUM(Z_Cartesian) / number of Z_Cartesian values
  9. LAT: ATAN2(HYP, AveZ)
  10. LONG: ATAN2(AveX, AveY)
  11. HYP: SQRT(AveX * AveX + AveY * AveY)
  12. LATMean: LAT * 180 / PI
  13. LONGMean: LONG * 180 / PI
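Written as equations, the steps above appear to describe the mean of unit vectors on a sphere, converted back to angles. This restatement rests on two assumptions: $\varphi_i$ and $\lambda_i$ denote the latitude and longitude of row $i$ in radians (symbols that are not in the original), and the two-argument arctangent in the list is read in spreadsheet order (x first), so it is written here in the mathematical order $\operatorname{atan2}(y, x)$.

```latex
\begin{aligned}
x_i &= \cos\varphi_i \cos\lambda_i, \qquad
y_i = \cos\varphi_i \sin\lambda_i, \qquad
z_i = \sin\varphi_i \\
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad
\bar{z} = \frac{1}{n}\sum_{i=1}^{n} z_i \\
h &= \sqrt{\bar{x}^2 + \bar{y}^2} \\
\bar{\varphi} &= \operatorname{atan2}(\bar{z},\, h), \qquad
\bar{\lambda} = \operatorname{atan2}(\bar{y},\, \bar{x})
\end{aligned}
```

Multiplying $\bar{\varphi}$ and $\bar{\lambda}$ by $180/\pi$ gives the mean latitude and longitude in degrees.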

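1 answer:

Answer 0 (score: 2)

Here is one idea using dplyr and tidyr. I wrote a custom function based on the calculation steps above. foo is your data. The last digit of the average latitude does not exactly match your expected result. This is probably due to rounding.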

library(dplyr)
library(tidyr)

test <- filter(foo, Stops != 0) %>% # drop rows with Stops == 0
        unite(dates, BS, TS, sep = " ") %>% # combine BS and TS into one date-time string
        mutate(dates = as.POSIXct(strptime(dates, format = "%d.%m.%Y %H:%M:%S"))) %>% # parse it as a date-time
        group_by(Stops) %>% # for each stop
        filter(dates == min(dates) | dates == max(dates)) %>% # keep the rows with the min and max dates
        summarise(minTime = min(dates),
                  maxTime = max(dates),
                  duration = max(dates) - min(dates),
                  size_A = SIZE_A[1])

#  Stops             minTime             maxTime      duration size_A
#1     1 2014-04-06 15:56:07 2014-04-06 16:29:06 32.98333 mins     46
#2     2 2014-04-06 17:29:09 2014-04-06 17:52:37 23.46667 mins     46


### A custom function

cal <- function(x, y){
            # x: latitudes in degrees, y: longitudes in degrees
            latToRadians <- x * pi / 180
            longToRadians <- y * pi / 180

            # unit vector for each position
            x_cartesian <- cos(latToRadians) * cos(longToRadians)
            y_cartesian <- cos(latToRadians) * sin(longToRadians)
            z_cartesian <- sin(latToRadians)

            # average the vectors component by component
            aveX <- sum(x_cartesian) / length(x_cartesian)
            aveY <- sum(y_cartesian) / length(y_cartesian)
            aveZ <- sum(z_cartesian) / length(z_cartesian)

            # convert the mean vector back to angles; R's atan2 takes (y, x)
            hyp <- sqrt(aveX * aveX + aveY * aveY)
            lat <- atan2(aveZ, hyp)
            long <- atan2(aveY, aveX)

            # radians back to degrees
            latMean <- lat * 180 / pi
            longMean <- long * 180 / pi

            return(as.data.frame(cbind(latMean, longMean)))
        }

#### Get average long/lat using the function above

test2 <- foo %>%
         filter(Stops != 0) %>%
         group_by(Stops) %>%
         do(cal(.$LAT, .$LONG))

#  Stops  latMean longMean
#1     1 57.71433 11.96005
#2     2 57.71434 11.96003

### Combine test and test2
inner_join(test, test2) 

#Joining by: "Stops"
#Source: local data frame [2 x 7]

#  Stops             minTime             maxTime      duration size_A  latMean longMean
#1     1 2014-04-06 15:56:07 2014-04-06 16:29:06 32.98333 mins     46 57.71433 11.96005
#2     2 2014-04-06 17:29:09 2014-04-06 17:52:37 23.46667 mins     46 57.71434 11.96003
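
A quick sanity check on the averaging step, as a self-contained base R sketch. `sphere_mean` is a hypothetical, compact restatement of the same calculation (it is not the `cal` function from the answer itself), applied to the five Stops == 1 positions from the sample data. For points this close together, the result should be nearly identical to the plain arithmetic mean.

```r
# Hypothetical helper: mean position via averaged unit vectors
# (degrees in, degrees out)
sphere_mean <- function(lat, long) {
  lat_rad <- lat * pi / 180
  long_rad <- long * pi / 180

  ave_x <- mean(cos(lat_rad) * cos(long_rad))
  ave_y <- mean(cos(lat_rad) * sin(long_rad))
  ave_z <- mean(sin(lat_rad))

  hyp <- sqrt(ave_x^2 + ave_y^2)
  c(lat = atan2(ave_z, hyp) * 180 / pi,
    long = atan2(ave_y, ave_x) * 180 / pi)
}

# Positions of the rows with Stops == 1 in the sample data
lat  <- c(57.71432, 57.71433, 57.71432, 57.71433, 57.71433)
long <- c(11.96005, 11.96005, 11.96005, 11.96005, 11.96003)

res <- sphere_mean(lat, long)

# Difference from the arithmetic mean, in degrees
res - c(lat = mean(lat), long = mean(long))
```

The differences should be far smaller than the five decimal places shown in the output above, so for a ship sitting in port a simple `mean()` of LAT and LONG would likely be an acceptable shortcut. The vector-based version matters more when positions are spread over large distances or straddle the 180° meridian.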