I'm working with Euclidean distance on a pair of datasets. First of all, my data:
centers <- data.frame(x_ce = c(300,180,450,500),
y_ce = c(23,15,10,20),
center = c('a','b','c','d'))
points <- data.frame(point = c('p1','p2','p3','p4'),
x_p = c(160,600,400,245),
y_p = c(7,23,56,12))
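My goal is to find, for each point in points, the minimum distance to all the centers in centers, and to append the name of that closest center to the points dataset, with the whole process automated.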
So I started from the basics:
#Euclidean distance
sqrt(sum((x-y)^2))
I have in mind how it should work, but I can't work out how to automate it.
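As a quick sanity check of the formula above, here is a minimal sketch that wraps it in a function and applies it to one point and one center from the data. The function name euclid is only an illustrative choice, and plain numeric vectors are assumed instead of data frame rows.

```r
# Euclidean distance between two numeric vectors of equal length
# (euclid is a hypothetical helper name, not from any package)
euclid <- function(x, y) sqrt(sum((x - y)^2))

p1 <- c(160, 7)    # coordinates of point p1
b  <- c(180, 15)   # coordinates of center b

# (160 - 180)^2 + (7 - 15)^2 = 400 + 64 = 464
d_p1_b <- euclid(p1, b)
print(d_p1_b)
```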
So I managed to do it by hand, with all the steps that would need to be automated:
# 1.
x = (points[1,2:3]) # select the first of points
y1 = (centers[1,1:2]) # select the first center
y2 = (centers[2,1:2]) # select the second center
y3 = (centers[3,1:2]) # select the third center
y4 = (centers[4,1:2]) # select the fourth center
# 2.
# then the distances
distances <- data.frame(distance = c(
sqrt(sum((x-y1)^2)),
sqrt(sum((x-y2)^2)),
sqrt(sum((x-y3)^2)),
sqrt(sum((x-y4)^2))),
center = centers$center
)
# 3.
# then I choose the row with the smallest distance
d <- distances[which(distances$distance==min(distances$distance)),]
# 4.
# last, I put the label near the point
cbind(points[1,],d)
# 5.
# then I restart for the second point
The problem is that I can't automate it. Do you have any idea how to run this process automatically for each point in points?
Also, am I reinventing the wheel, i.e. is there a faster procedure (such as an existing function) that I don't know about?
Answer 0 (score: 2)
centers <- data.frame(x_ce = c(300,180,450,500),
y_ce = c(23,15,10,20),
center = c('a','b','c','d'))
points <- data.frame(point = c('p1','p2','p3','p4'),
x_p = c(160,600,400,245),
y_p = c(7,23,56,12))
library(tidyverse)
points %>%
  mutate(c = list(centers)) %>%                             # attach the full centers table to every point
  unnest(cols = c) %>%                                      # create all possible combinations of points and centers as a data frame
  rowwise() %>%                                             # for each combination
  mutate(d = sqrt(sum((c(x_p,y_p)-c(x_ce,y_ce))^2))) %>%    # calculate distance
  ungroup() %>%                                             # forget the grouping
  group_by(point, x_p, y_p) %>%                             # for each point
  summarise(closest_center = center[d == min(d)]) %>%       # keep the closest center
  ungroup()                                                 # forget the grouping
# # A tibble: 4 x 4
# point x_p y_p closest_center
# <fct> <dbl> <dbl> <fct>
# 1 p1 160 7 b
# 2 p2 600 23 d
# 3 p3 400 56 c
# 4 p4 245 12 a
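The same all-pairs idea can also be sketched in base R, without rowwise(). This is only an illustrative variant under the assumption that point names are unique; the names combos and closest are made up for the example. merge() with by = NULL builds the Cartesian product of the two data frames, and the distance is then computed on whole columns at once.

```r
centers <- data.frame(x_ce = c(300,180,450,500),
                      y_ce = c(23,15,10,20),
                      center = c('a','b','c','d'))
points <- data.frame(point = c('p1','p2','p3','p4'),
                     x_p = c(160,600,400,245),
                     y_p = c(7,23,56,12))

# every point paired with every center (4 x 4 = 16 rows)
combos <- merge(points, centers, by = NULL)

# vectorised Euclidean distance for all pairs
combos$d <- sqrt((combos$x_p - combos$x_ce)^2 + (combos$y_p - combos$y_ce)^2)

# for each point, keep the row with the smallest distance
closest <- do.call(rbind, lapply(split(combos, combos$point),
                                 function(g) g[which.min(g$d), ]))
closest <- closest[, c("point", "x_p", "y_p", "center", "d")]
print(closest)
```

Note that which.min() keeps a single center when two are exactly tied, whereas center[d == min(d)] would return both.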
Answer 1 (score: 1)
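Using the dplyr package, you can use group_by to loop over each point, mutate to build a list of the distances, set distance to the minimum of that list, and set center to the name of the center at that minimum distance. I've included two alternatives, to cover the cases of duplicate rows and of non-unique point names.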
library(dplyr)
centers <- data.frame(x_ce = c(300,180,450,500),
y_ce = c(23,15,10,20),
center = c('a','b','c','d'))
points <- data.frame(point = c('p1','p2','p3','p4', "p4"),
x_p = c(160,600,400,245, 245),
y_p = c(7,23,56,12, 12))
#
# If duplicate rows need to be removed
#
result1 <- points %>% group_by(point) %>% distinct() %>%
  mutate(lst = with(centers, list(sqrt( (x_p-x_ce)^2 + (y_p-y_ce)^2 ) ) ),
         distance = min(unlist(lst)),
         center = centers$center[which.min(unlist(lst))]) %>%
  select(-lst)
which gives the result
# A tibble: 4 x 5
# Groups: point [4]
point x_p y_p distance center
<fct> <dbl> <dbl> <dbl> <fct>
1 p1 160 7 21.5 b
2 p2 600 23 100. d
3 p3 400 56 67.9 c
4 p4 245 12 56.1 a
and
#
# Alternative if point names are not unique
#
points <- data.frame(point = c('p1','p2','p3','p4', "p4"),
x_p = c(160,600,400,245, 550),
y_p = c(7,23,56,12, 25))
result2 <- points %>% rowwise() %>%
  mutate(lst = with(centers, list(sqrt( (x_p-x_ce)^2 + (y_p-y_ce)^2 ) ) ),
         distance = min(unlist(lst)),
         center = centers$center[which.min(unlist(lst))]) %>%
  ungroup() %>% select(-lst)
with the result
# A tibble: 5 x 5
point x_p y_p distance center
<fct> <dbl> <dbl> <dbl> <fct>
1 p1 160 7 21.5 b
2 p2 600 23 100. d
3 p3 400 56 67.9 c
4 p4 245 12 56.1 a
5 p4 550 25 50.2 d
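For comparison, a dependency-free sketch of the same row-by-row result can be written with outer() in base R. This is an illustrative alternative, not part of the answer's dplyr code; the names dx, dy, dist_mat, nearest and result3 are assumptions made for the example. Because it works on row positions, it does not depend on point names being unique.

```r
centers <- data.frame(x_ce = c(300,180,450,500),
                      y_ce = c(23,15,10,20),
                      center = c('a','b','c','d'))
points <- data.frame(point = c('p1','p2','p3','p4', "p4"),
                     x_p = c(160,600,400,245, 550),
                     y_p = c(7,23,56,12, 25))

# coordinate differences: one row per point, one column per center
dx <- outer(points$x_p, centers$x_ce, "-")
dy <- outer(points$y_p, centers$y_ce, "-")
dist_mat <- sqrt(dx^2 + dy^2)

# column index of the closest center for each point
nearest <- apply(dist_mat, 1, which.min)

result3 <- points
result3$distance <- dist_mat[cbind(seq_len(nrow(points)), nearest)]
result3$center <- centers$center[nearest]
print(result3)
```

The full distance matrix has nrow(points) * nrow(centers) entries, so this is convenient for small inputs like these but may need a different approach for very large ones.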