Question

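Here I think I can identify two groups in the data. Once I have identified clusters visually, what is the most efficient way to subset the data by cluster? In this data set there is a convenient break at horsepower = 49, but I know that not all data are this clean.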

Answer 1

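You can cluster the data with kmeans or hclust. Then extract the cluster IDs, visualize the result, and compare it with your own assumptions. I will use the mtcars data to demonstrate: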

# ggplot2 is needed for the plot below
library(ggplot2)

# For reproducibility
set.seed(42)

# Perform kmeans clustering on mpg (column 1) and hp (column 4), 3 groups
kclusters <- kmeans(mtcars[, c(1, 4)], 3)

# Bind together the original data and the cluster ID as a named column
plot_data <- cbind(mtcars, cluster = kclusters$cluster)

# Plot the results and check your own assumptions.
ggplot(plot_data, aes(x = hp, y = mpg)) +
  geom_point(aes(color = factor(cluster)))
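
Once the cluster IDs look reasonable on the plot, the subsetting itself could be done along these lines. This is only a minimal sketch using base R: it recomputes the clusters so that it runs on its own, and the names `clustered`, `by_cluster`, `cluster_1`, `hc`, `hc_id` and `hc_group_1` are my own choices for illustration.

```r
# Self-contained sketch: recompute the clusters, then subset by cluster ID.
set.seed(42)
kclusters <- kmeans(mtcars[, c("mpg", "hp")], centers = 3)

# Attach the cluster ID as a named column (the name `cluster` is an assumption)
clustered <- cbind(mtcars, cluster = kclusters$cluster)

# One data frame per cluster, returned as a list named by cluster ID
by_cluster <- split(clustered, clustered$cluster)

# A single cluster on its own
cluster_1 <- clustered[clustered$cluster == 1, ]

# Hierarchical alternative: build the tree, cut it into k groups,
# and subset with the resulting IDs in the same way
hc <- hclust(dist(mtcars[, c("mpg", "hp")]))
hc_id <- cutree(hc, k = 3)
hc_group_1 <- mtcars[hc_id == 1, ]
```

Note that mpg and hp are on very different scales, so hp will dominate the distances here; depending on your data you may want to pass the columns through scale() before clustering.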

