How to calculate the distance between different IDs, by group?

Asked: 2016-03-15 23:53:41

Tags: r dplyr distance

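My (sample) data is structured as follows, where the X and Y coordinates of participants, recorded under different conditions, are collected over time: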

    Individ <- data.frame(Participant = c("Bill", "Bill", "Bill", "Bill", "Bill", "Harry", "Harry", "Harry", "Harry","Harry", "Paul", "Paul", "Paul", "Paul", "Paul"),
                          Time = c(0.01, 0.02, 0.03, 0.04, 0.05, 0.01, 0.02, 0.03, 0.04, 0.05, 0.01, 0.02, 0.03, 0.04, 0.05),
                          Condition = c("Expr", "Expr", "Expr", "Expr", "Expr", "Con", "Con", "Con", "Con", "Con", "Nor", "Nor", "Nor", "Nor", "Nor"),
                          X = c(26.07, 26.06, 26.05, 26.09, 26.04, 26.65, 26.64, 26.62, 26.63, 26.62, 27.99, 28.01, 28.01, 28.02, 28.02),
                          Y = c(-5.01, -5.12, -5.14, -5.18, -5.2065, -12.37, 12.36, -12.35, -12.34, 12.33, -5.52, -5.514, -5.51, -5.50, -5.4962))

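The X and Y coordinates are all measured from the same reference point. I can calculate the distance covered by each participant using the following: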

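    library(dplyr)  # plyr is not needed here; none of its functions are used

    DistanceOutput <- Individ %>%
      arrange(Participant, Time, Condition) %>%
      group_by(Participant, Condition) %>%
      # Previous position of the same participant, ordered by time
      mutate(lagX = lag(X, order_by = Time), lagY = lag(Y, order_by = Time)) %>%
      rowwise() %>%
      # Euclidean distance between the current and the previous position
      mutate(Distance = as.numeric(dist(matrix(c(X, Y, lagX, lagY), nrow = 2, byrow = TRUE)))) %>%
      ungroup() %>%
      select(-lagX, -lagY)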
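
Since the dataset is large, the row-by-row `dist()` call above may become slow. A minimal sketch of the same step-distance calculation in vectorized form follows, assuming dplyr is installed; the names `demo` and `step_dist` are made up for illustration, and the data is a subset of the sample data above.

```r
library(dplyr)

# Illustrative subset of the sample data (first three time points, two participants)
demo <- data.frame(
  Participant = c("Bill", "Bill", "Bill", "Harry", "Harry", "Harry"),
  Time        = c(0.01, 0.02, 0.03, 0.01, 0.02, 0.03),
  X           = c(26.07, 26.06, 26.05, 26.65, 26.64, 26.62),
  Y           = c(-5.01, -5.12, -5.14, -12.37, 12.36, -12.35)
)

step_dist <- demo %>%
  arrange(Participant, Time) %>%
  group_by(Participant) %>%
  # Euclidean distance to the previous position within each participant;
  # the first row of each participant has no predecessor, so it is NA
  mutate(Distance = sqrt((X - lag(X))^2 + (Y - lag(Y))^2)) %>%
  ungroup()

step_dist
```

Because `lag()` and the arithmetic operate on whole columns within each group, this avoids building a 2x2 matrix for every row.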

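However, how do I calculate the distance between each pair of participants, based on Time and Condition? For example, the distance between Bill and Harry, between Bill and Paul, and between Harry and Paul at each point in time?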

My dataset has 179,800 entries, so a fast solution would be preferred. Thanks!

1 Answer:

Answer 0: (score: 1)

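Here is one way to calculate the distance between every pair of participants at each time point. I doubt it is the most efficient way, but maybe someone else will come up with a more elegant solution.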

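You said you want to calculate the distances between participants for each Condition. In your sample data there is only one participant per condition, so the code below works by Time only. It can easily be extended to split by Condition in addition to Time.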

    library(reshape2)
    library(dplyr)
    library(tibble)  # rownames_to_column() replaces the deprecated add_rownames()

    # Calculate a distance matrix for each Time
    res = lapply(unique(Individ$Time), function(i) {

      # Coordinates of all participants at this time point, labelled by name
      mat = as.matrix(Individ[Individ$Time==i, c("X","Y")])
      rownames(mat) = Individ$Participant[Individ$Time==i]

      # Distance matrix
      d = as.matrix(dist(mat))

      # Keep only the lower triangle, so each pair appears once
      d[upper.tri(d, diag=TRUE)] = NA

      # Return data frame with distances, time and participants
      data.frame(Time=i, d) %>% rownames_to_column("P1")
    })

    # Combine all time points into a single long data frame of distances
    res = bind_rows(res) %>%
      melt(id.var=c("Time","P1"), variable.name="P2", value.name="Distance") %>%
      filter(!is.na(Distance)) %>%
      rowwise() %>%
      # Label each pair with the two names in alphabetical order
      mutate(Pair = paste(sort(c(as.character(P1), as.character(P2))), collapse="-")) %>%
      ungroup() %>%
      select(Pair, Time, Distance) %>%
      arrange(Pair, Time)

    res
             Pair  Time  Distance
    1  Bill-Harry  0.01  7.382818
    2  Bill-Harry  0.02 17.489620
    3  Bill-Harry  0.03  7.232496
    4  Bill-Harry  0.04  7.180334
    5  Bill-Harry  0.05 17.546089
    6   Bill-Paul  0.01  1.986580
    7   Bill-Paul  0.02  1.989406
    8   Bill-Paul  0.03  1.994618
    9   Bill-Paul  0.04  1.956349
    10  Bill-Paul  0.05  2.001081
    11 Harry-Paul  0.01  6.979835
    12 Harry-Paul  0.02 17.926427
    13 Harry-Paul  0.03  6.979807
    14 Harry-Paul  0.04  6.979807
    15 Harry-Paul  0.05 17.881091
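
For a larger dataset, one possible alternative is a self-join on the grouping columns followed by a vectorized distance calculation. This is a different technique from the per-time `dist()` matrices above and only a sketch: the helper `pairwise_distances()` and the data frame `demo` are hypothetical names introduced here, and the data is a two-time-point subset of the question's sample. It uses base R only.

```r
# Hypothetical helper: pairwise distances between participants that share
# the same values in the `by` columns (by default, the same Time).
pairwise_distances <- function(df, by = "Time") {
  pts <- df[, c(by, "Participant", "X", "Y")]
  pts$Participant <- as.character(pts$Participant)

  # Self-join: every participant paired with every participant per group
  pairs <- merge(pts, pts, by = by, suffixes = c("_1", "_2"))

  # Keep each unordered pair once and drop self-pairs
  pairs <- pairs[pairs$Participant_1 < pairs$Participant_2, ]

  pairs$Pair <- paste(pairs$Participant_1, pairs$Participant_2, sep = "-")
  pairs$Distance <- sqrt((pairs$X_1 - pairs$X_2)^2 + (pairs$Y_1 - pairs$Y_2)^2)

  # Sort by pair, then by the grouping columns
  ord <- do.call(order, unname(as.list(pairs[c("Pair", by)])))
  out <- pairs[ord, c("Pair", by, "Distance")]
  rownames(out) <- NULL
  out
}

# Illustrative subset of the sample data (two time points)
demo <- data.frame(
  Participant = c("Bill", "Harry", "Paul", "Bill", "Harry", "Paul"),
  Time        = c(0.01, 0.01, 0.01, 0.02, 0.02, 0.02),
  X           = c(26.07, 26.65, 27.99, 26.06, 26.64, 28.01),
  Y           = c(-5.01, -12.37, -5.52, -5.12, 12.36, -5.514)
)

out <- pairwise_distances(demo)
out
```

Passing `by = c("Time", "Condition")` would restrict the pairs to participants that share a condition, provided the data frame has a `Condition` column. With the question's sample data that would return no rows, because each condition contains a single participant. Note that the join creates one row per pair per group, so memory use grows with the square of the number of participants in each group.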