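I am trying to assign points to groups based on Euclidean distance. For example, in the data below, three points represent three different groups (One, Two, Three; the non-green points in the figure). I would like to assign the remaining points (Scatter; the green points) to groups based on minimum Euclidean distance, i.e. change each Scatter label to whichever of One, Two, or Three is closest.
I am trying to do this outside of kmeans or other clustering functions, using only the minimum Euclidean distance, but suggestions are welcome and appreciated.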
library(ggplot2)

set.seed(123)
Data <- data.frame(
  x = c(c(3, 5, 8), runif(20, 1, 10)),
  y = c(c(3, 5, 8), runif(20, 1, 10)),
  Group = c(c("One", "Two", "Three"), rep("Scatter", 20))
)
# Rows 1-3 are the reference points; rows 4-23 are the points to assign
ggplot(Data, aes(x, y, color = Group)) +
  geom_point(size = 3) +
  theme_bw()
Answer 0 (score: 2)
How about something like this?
library(tidyverse)  # dplyr, tidyr, tibble and ggplot2 are all used below

bind_cols(
  Data,
  dist(Data %>% select(-Group)) %>%         # Pairwise distances from the x/y coordinates
    as.matrix() %>%                         # Convert to full matrix
    as.data.frame() %>%                     # Convert to data.frame
    select(1:3) %>%                         # We're only interested in dist to points 1, 2, 3
    rowid_to_column("pt") %>%               # Add a point id
    gather(k, v, -pt) %>%                   # Long format: one row per (point, reference) pair
    group_by(pt) %>%
    summarise(k = k[which.min(v)])) %>%     # Select label with min dist
  mutate(Group = factor(Group, levels = unique(Data$Group))) %>%
  ggplot(aes(x, y, colour = k, shape = Group)) +
  geom_point(size = 3)
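Explanation: We use dist to calculate all pairwise Euclidean distances between One, Two, Three, and all Scatter points. We then give every Scatter point a label k according to whether its minimum distance is to One (k = 1), Two (k = 2), or Three (k = 3). The three reference points receive their own label, since each is at distance 0 from itself.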
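The same nearest-reference assignment can also be sketched in base R, without building the full pairwise distance matrix. This is only an illustration, not the code from the answer: `assign_nearest`, `centres`, and `scatter` are hypothetical names, and the scatter points are assumed to be drawn the same way as in the question.

```r
# Hypothetical helper (not part of the answer): label each point with its nearest centre.
assign_nearest <- function(pts, centres) {
  # d[i, j] = Euclidean distance from point i to centre j
  d <- sqrt(outer(pts$x, centres$x, "-")^2 +
            outer(pts$y, centres$y, "-")^2)
  # which.min() returns the first minimum, so ties go to the earlier centre
  centres$Group[apply(d, 1, which.min)]
}

# Reference points taken from the question
centres <- data.frame(x = c(3, 5, 8), y = c(3, 5, 8),
                      Group = c("One", "Two", "Three"),
                      stringsAsFactors = FALSE)

# Assumption: 20 uniform scatter points, generated as in the question
set.seed(123)
scatter <- data.frame(x = runif(20, 1, 10), y = runif(20, 1, 10))
scatter$Assigned <- assign_nearest(scatter, centres)
table(scatter$Assigned)
```

Computing only the 20 x 3 block of distances avoids the unused Scatter-to-Scatter distances that dist produces, which may matter for larger data sets.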
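Note that the Scatter point at (9.6, 3.1) is indeed correctly "classified" as belonging to Two (k = 2); you can confirm this by adding coord_fixed() to the ggplot plotting chain, which gives both axes the same scale so that distances on screen match distances in the data.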
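The classification of that point can also be checked numerically. The sketch below assumes the rounded coordinates (9.6, 3.1) quoted above rather than the exact generated values, and the names `centres`, `p`, and `d` are illustrative only.

```r
# Reference points from the question, one per row
centres <- rbind(One = c(3, 3), Two = c(5, 5), Three = c(8, 8))

# Assumption: rounded coordinates of the Scatter point discussed in the note
p <- c(9.6, 3.1)

# Subtract p from every row, then take the row-wise Euclidean norm
d <- sqrt(rowSums(sweep(centres, 2, p)^2))

round(d, 2)          # distances to One, Two and Three
names(which.min(d))  # label of the closest reference point
```

With these rounded coordinates the distances come out at roughly 6.6 (One), 5.0 (Two), and 5.2 (Three), so Two is the closest reference point even though the unscaled plot may suggest otherwise.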