Question

As the title says: I want to compute the distances between adjacent time points and find the n shortest paths through all time points.

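I have posted an example below. In this example there are 2 clearly distinct regions (in 3D space) in which the points are located. Within each region we have multiple time points. I want to compute the distances along T1 --> T2 --> ... --> T8 while enforcing the time point ordering. I ended up thinking of this as a kind of tree: we initially branch from the first T1 point to the 2 (or more) T2 points, then from each T2 point to each T3 point, and so on. Once the tree is built, we can compute the distance of every path from start to end and return the top n paths with the smallest total distance. In short, the goal here is to connect each T1 node with its respective shortest path. There may well be a more efficient or better way to do this.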
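The tree idea described above can be sketched directly as a brute-force enumeration. This is only a minimal illustration under assumptions: the function name `n_shortest_paths`, the `layers` layout, and the toy coordinates are hypothetical and not the data from this question. Note that the number of paths is the product of the point counts per time point, so this only scales to small inputs.

```python
import math
from itertools import product

def n_shortest_paths(layers, n):
    """layers: one list per time point (in time order) of (name, (x, y, z)) tuples.
    Returns the n (total_distance, [names]) pairs with the smallest total distance."""
    scored = []
    # product() picks exactly one point per time point and keeps the time order
    for combo in product(*layers):
        total = sum(math.dist(a[1], b[1]) for a, b in zip(combo, combo[1:]))
        scored.append((total, [name for name, _ in combo]))
    scored.sort(key=lambda item: item[0])  # shortest total distance first
    return scored[:n]

# Hypothetical toy data: two well separated regions, three time points
layers = [
    [("T1.a", (0.0, 0.0, 0.0)), ("T1.b", (10.0, 10.0, 0.0))],
    [("T2.a", (0.0, 1.0, 0.0)), ("T2.b", (10.0, 11.0, 0.0))],
    [("T3.a", (0.0, 2.0, 0.0)), ("T3.b", (10.0, 12.0, 0.0))],
]
best = n_shortest_paths(layers, 2)
for total, path in best:
    print(round(total, 3), " -> ".join(path))
```

With this toy input the two shortest paths are the ones that stay inside a single region, which is the behaviour wanted for the real data.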

Example data:

> example
                   Timepoint Centre.int.X Centre.int.Y Centre.int.Z
FOV4.Beads.T1.C2          T1        5.102       28.529        0.789
FOV4.Beads.T1.C2.1        T1       37.904       50.845        0.837
FOV4.Beads.T2.C2          T2       37.905       50.843        1.022
FOV4.Beads.T2.C2.1        T2        5.083       28.491        0.972
FOV4.Beads.T4.C2          T4       37.925       50.851        0.858
FOV4.Beads.T4.C2.1        T4        5.074       28.479        0.785
FOV4.Beads.T5.C2          T5       37.908       50.847        0.977
FOV4.Beads.T5.C2.1        T5        5.102       28.475        0.942
FOV4.Beads.T6.C2          T6        5.114       28.515        0.643
FOV4.Beads.T6.C2.1        T6       37.927       50.869        0.653
FOV4.Beads.T7.C2          T7       37.930       50.875        0.614
FOV4.Beads.T7.C2.1        T7        5.132       28.525        0.579
FOV4.Beads.T8.C2          T8        4.933       28.674        0.800
FOV4.Beads.T8.C2.1        T8       37.918       50.816        0.800

This data.frame produces a 3D scatter plot like the one shown below:

The baseline code that generated the plot above is posted below:

require(scatterplot3d)
with(example, {
  s3d <- scatterplot3d(Centre.int.X, Centre.int.Y, Centre.int.Z,
                       pch=19,
                       cex.symbols=2,
                       col.axis="grey", col.grid="lightblue",
                       angle=45,
                       xlab="X",
                       ylab="Y",
                       zlab="Z")
})

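This is a relatively clean example, but some of my data are very messy, which is why I am trying to avoid clustering approaches (e.g. k-means, dbscan, etc.). Any help would be much appreciated!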

Edit: added the structure details.

structure(list(Timepoint = structure(c(1L, 1L, 2L, 2L, 4L, 4L, 
5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L), .Label = c("T1", "T2", "T3", 
"T4", "T5", "T6", "T7", "T8"), class = "factor"), Centre.int.X = c(5.102, 
37.904, 37.905, 5.083, 37.925, 5.074, 37.908, 5.102, 5.114, 37.927, 
37.93, 5.132, 4.933, 37.918), Centre.int.Y = c(28.529, 50.845, 
50.843, 28.491, 50.851, 28.479, 50.847, 28.475, 28.515, 50.869, 
50.875, 28.525, 28.674, 50.816), Centre.int.Z = c(0.789, 0.837, 
1.022, 0.972, 0.858, 0.785, 0.977, 0.942, 0.643, 0.653, 0.614, 
0.579, 0.8, 0.8)), .Names = c("Timepoint", "Centre.int.X", "Centre.int.Y", 
"Centre.int.Z"), class = "data.frame", row.names = c("FOV4.Beads.T1.C2", 
"FOV4.Beads.T1.C2.1", "FOV4.Beads.T2.C2", "FOV4.Beads.T2.C2.1", 
"FOV4.Beads.T4.C2", "FOV4.Beads.T4.C2.1", "FOV4.Beads.T5.C2", 
"FOV4.Beads.T5.C2.1", "FOV4.Beads.T6.C2", "FOV4.Beads.T6.C2.1", 
"FOV4.Beads.T7.C2", "FOV4.Beads.T7.C2.1", "FOV4.Beads.T8.C2", 
"FOV4.Beads.T8.C2.1"))

Answer 1

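Not very elegant, but it finds the shortest paths for this example by always stepping to the nearest point in the next time point.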

# Pairwise Euclidean distances between all points (columns 2:4 hold X, Y, Z)
distance.matrix <- as.matrix(dist(example[,2:4], upper = TRUE, diag = TRUE))

# Start one path from every T1 point
t1s <- grep("T1", rownames(distance.matrix))
paths <- lapply(t1s, function (t) {
    path <- rownames(distance.matrix)[t]
    distance <- NULL
    # Time points present in the example data (there is no T3)
    for (i in c(2,4:8))
    {
        next.nodes <- grep(paste0("T", i), rownames(distance.matrix))
        # Step to the nearest point in the next time point
        next.t <- names(which.min(distance.matrix[t,next.nodes]))
        path <- c(path, next.t)
        distance <- sum(distance, distance.matrix[t,next.t])
        t <- next.t
    }
    output <- list(path, distance)
    names(output) <- c("Path", "Total Distance")
    return(output)
})

Edit: trimmed some unneeded lines.
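One caveat worth illustrating: the loop above takes the nearest point at every step, and on messier data such a step-by-step choice is not guaranteed to give the smallest total distance. The sketch below is a hypothetical toy case (the names `greedy_path` and `exhaustive_best` and all coordinates are made up for illustration, not taken from the question) where the nearest first step leads to a longer overall path.

```python
import math
from itertools import product

def greedy_path(start, later_layers):
    """Always step to the nearest point of the next time point."""
    path, total, current = [start], 0.0, start
    for layer in later_layers:
        nxt = min(layer, key=lambda p: math.dist(current, p))
        total += math.dist(current, nxt)
        path.append(nxt)
        current = nxt
    return path, total

def exhaustive_best(start, later_layers):
    """Try every combination of later points and keep the shortest total."""
    best = None
    for combo in product(*later_layers):
        pts = (start,) + combo
        total = sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))
        if best is None or total < best[1]:
            best = (list(pts), total)
    return best

# Hypothetical toy case: the nearest T2 point lies in the wrong direction
start = (0.0, 0.0, 0.0)
later_layers = [
    [(1.0, 0.0, 0.0), (-1.5, 0.0, 0.0)],  # T2
    [(-3.0, 0.0, 0.0)],                   # T3
]
greedy = greedy_path(start, later_layers)
optimal = exhaustive_best(start, later_layers)
print("greedy total:", greedy[1])
print("optimal total:", optimal[1])
```

For data as clean as the example in the question, where the regions are far apart, both strategies pick the same paths.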

Answer 2

Here is an implementation in Python:

from io import StringIO
import numpy as np
import pandas as pd

# Read data
s = """Name                      Timepoint Centre.int.X Centre.int.Y Centre.int.Z
FOV4.Beads.T1.C2          T1        5.102       28.529        0.789
FOV4.Beads.T1.C2.1        T1       37.904       50.845        0.837
FOV4.Beads.T2.C2          T2       37.905       50.843        1.022
FOV4.Beads.T2.C2.1        T2        5.083       28.491        0.972
FOV4.Beads.T4.C2          T4       37.925       50.851        0.858
FOV4.Beads.T4.C2.1        T4        5.074       28.479        0.785
FOV4.Beads.T5.C2          T5       37.908       50.847        0.977
FOV4.Beads.T5.C2.1        T5        5.102       28.475        0.942
FOV4.Beads.T6.C2          T6        5.114       28.515        0.643
FOV4.Beads.T6.C2.1        T6       37.927       50.869        0.653
FOV4.Beads.T7.C2          T7       37.930       50.875        0.614
FOV4.Beads.T7.C2.1        T7        5.132       28.525        0.579
FOV4.Beads.T8.C2          T8        4.933       28.674        0.800
FOV4.Beads.T8.C2.1        T8       37.918       50.816        0.800"""

df = pd.read_table(StringIO(s), sep=" ", skipinitialspace=True, index_col=0, header=0)

# Get time point ids
ts = sorted(df.Timepoint.unique())
# Get the spatial points in each time point
points = [df[df.Timepoint == t].iloc[:, -3:].values.copy() for t in ts]
# Get the spatial point names in each time point
point_names = [list(df[df.Timepoint == t].index) for t in ts]

# Find the best next point starting from the end
best_nexts = []
accum_dists = [np.zeros(len(points[-1]))]
for t_prev, t_next in zip(reversed(points[:-1]), reversed(points[1:])):
    t_dists = np.linalg.norm(t_prev[:, np.newaxis, :] - t_next[np.newaxis, :, :], axis=-1)
    t_dists += accum_dists[-1][np.newaxis, :]
    t_best_nexts = np.argmin(t_dists, axis=1)
    t_accum_dists = t_dists[np.arange(len(t_dists)), t_best_nexts]
    best_nexts.append(t_best_nexts)
    accum_dists.append(t_accum_dists)
# Reverse back the best next points and accumulated distances
best_nexts = list(reversed(best_nexts))
accum_dists = list(reversed(accum_dists))

# Reconstruct the paths
paths = []
for i, p in enumerate(point_names[0]):
    cost = accum_dists[0][i]
    path = [p]
    idx = i
    for t_best_nexts, t_point_names in zip(best_nexts, point_names[1:]):
        next_idx = t_best_nexts[idx]
        path.append(t_point_names[next_idx])
        idx = next_idx
    paths.append((path, cost))

for i, (path, cost) in enumerate(paths):
    print("Path {} (total distance {}):".format(i, cost))
    print("\n".join("\t{}".format(p) for p in path))
    print()

Output:

Path 0 (total distance 1.23675871386137):
    FOV4.Beads.T1.C2
    FOV4.Beads.T2.C2.1
    FOV4.Beads.T4.C2.1
    FOV4.Beads.T5.C2.1
    FOV4.Beads.T6.C2
    FOV4.Beads.T7.C2.1
    FOV4.Beads.T8.C2

Path 1 (total distance 1.031072818390815):
    FOV4.Beads.T1.C2.1
    FOV4.Beads.T2.C2
    FOV4.Beads.T4.C2
    FOV4.Beads.T5.C2
    FOV4.Beads.T6.C2.1
    FOV4.Beads.T7.C2
    FOV4.Beads.T8.C2.1

Explanation:

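This is essentially the Viterbi algorithm. Starting from the end, assign an initial cost of zero to each final node. Then, for each pair of consecutive time points t_prev and t_next, compute the distance between every possible pair of points and add the previously accumulated cost of the point in t_next. Then, for each point in t_prev, pick the next point with the lowest cost, and move on to the preceding time point. At the end, best_nexts contains, for each point of each time point, the best point to go to in the next time point.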
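Written as a recurrence, the backward pass looks roughly like this. The symbols are my own notation, not from the code: $p^{t}_{i}$ is the i-th point of time point $t$, $T$ is the last time point, $C_t(i)$ corresponds to the entries of accum_dists, and $b_t(i)$ to the entries of best_nexts.

```latex
C_T(j) = 0
\qquad
C_t(i) = \min_{j} \left( \lVert p^{t}_{i} - p^{t+1}_{j} \rVert + C_{t+1}(j) \right)
\qquad
b_t(i) = \arg\min_{j} \left( \lVert p^{t}_{i} - p^{t+1}_{j} \rVert + C_{t+1}(j) \right)
```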

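Reconstruction is just a matter of following these indices in best_nexts. For each possible initial point, pick the best point in the next time point and continue.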
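The code above keeps only the single best continuation per point, while the question asks for the n shortest paths. One possible extension, sketched here under assumptions, is to keep the k cheapest partial paths per point instead of one. The function name `k_best_paths`, the `layers` layout, and the toy coordinates are hypothetical, and the sketch uses plain Python rather than the NumPy arrays used above.

```python
import heapq
import math

def k_best_paths(layers, k):
    """layers: one list of (x, y, z) tuples per time point, in time order.
    Returns, for each point of the first time point, up to k (cost, path)
    pairs sorted by cost; path holds one point index per time point."""
    # For the last time point, the only path from each point is the point itself
    best = [[(0.0, (j,))] for j in range(len(layers[-1]))]
    # Walk backwards through time, as in the single-best version
    for t in range(len(layers) - 2, -1, -1):
        new_best = []
        for i, p in enumerate(layers[t]):
            # Extend every kept path of every next point by the step p -> q
            candidates = [
                (math.dist(p, q) + cost, (i,) + path)
                for j, q in enumerate(layers[t + 1])
                for cost, path in best[j]
            ]
            new_best.append(heapq.nsmallest(k, candidates, key=lambda c: c[0]))
        best = new_best
    return best

# Hypothetical toy data: two well separated regions, three time points
layers = [
    [(0.0, 0.0, 0.0), (10.0, 10.0, 0.0)],
    [(0.0, 1.0, 0.0), (10.0, 11.0, 0.0)],
    [(0.0, 2.0, 0.0), (10.0, 12.0, 0.0)],
]
result = k_best_paths(layers, 2)
for start, found in enumerate(result):
    for cost, path in found:
        print(start, round(cost, 3), path)
```

Keeping k candidates per point is enough because any of the k best paths from a point must continue along one of the k best paths from its successor.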

Compute the distances between adjacent time points and find the "n" shortest paths through all time points
