Question

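I am a beginner in R and data analysis. I have a dataset of about 2500 rows and 7 columns. I want to cluster it with 15 centers, but based only on the first two columns, while keeping the other columns in the resulting clustered dataset.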

I also need to display the clustered dataset sorted by the third column.

Can someone help me with the syntax I need? Say my csv file is named locdata.csv. The first two columns are "lat" and "lon", and the third column is "date".

Answer 1

This should help you achieve your goal.

First create the dataset (or import the csv file), then run kmeans on the two columns of interest:

set.seed(1)
df <- data.frame(matrix(rnorm(n=10000, mean=10, sd=20), ncol=8))
names(df)[1:3] <- c("lat", "lon", "date")
# Use df <- read.csv(..) instead to load from a file

library(dplyr) # library() stops with an error if dplyr is not installed
cluster.df <- select(df, lat, lon) # Select the columns to cluster on
km <- kmeans(cluster.df, 15)
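
If the data lives in a file, as in the question, the same steps apply after read.csv. Here is a minimal sketch, assuming the column names lat, lon and date from the question. A temporary file with made-up values stands in for locdata.csv so the snippet runs on its own; the v4 column, the object names, and nstart = 10 are illustrative additions, not part of the answer above.

```r
set.seed(1)

# Stand-in for locdata.csv: a temporary file with made-up values (assumption)
csv_path <- tempfile(fileext = ".csv")
fake <- data.frame(
  lat  = runif(200, 40, 41),
  lon  = runif(200, -75, -74),
  date = as.character(as.Date("2015-01-01") + sample(0:364, 200, replace = TRUE)),
  v4   = rnorm(200)  # hypothetical extra column that is kept but not clustered on
)
write.csv(fake, csv_path, row.names = FALSE)

# With the real data this would be: loc <- read.csv("locdata.csv")
loc <- read.csv(csv_path, stringsAsFactors = FALSE)
loc$date <- as.Date(loc$date)  # parse dates so they sort chronologically

# Cluster on lat/lon only; nstart = 10 tries several random starts
km_loc <- kmeans(loc[, c("lat", "lon")], centers = 15, nstart = 10)
loc$cluster <- km_loc$cluster  # all original columns are still present
```

Parsing date with as.Date matters only if the column is stored as text; the format in a real file may differ and would then need an explicit format argument.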

Next, you can use the fact that kmeans preserves the original row order to extract the clusters:

# Extract the clusters and add them to original data frame
df$cluster <- km$cluster

# Sort on whatever column you prefer
df %>%
  arrange(date, cluster)
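
The pipeline above prints the sorted data but does not store it. As a rough sketch of the same steps in base R, assuming the same toy data as in the answer, the sorted result can be kept in a new data frame and the clustering inspected (sorted_df is a hypothetical name):

```r
set.seed(1)

# Same shape of toy data as in the answer (random values, for illustration only)
df <- data.frame(matrix(rnorm(n = 10000, mean = 10, sd = 20), ncol = 8))
names(df)[1:3] <- c("lat", "lon", "date")

km <- kmeans(df[, c("lat", "lon")], centers = 15)
df$cluster <- km$cluster  # one label per row, in the original row order

# Base R equivalent of arrange(date, cluster), stored in a new data frame
sorted_df <- df[order(df$date, df$cluster), ]

head(sorted_df)    # first rows of the sorted result
table(df$cluster)  # number of rows in each cluster
km$centers         # lat/lon coordinates of the 15 cluster centres
```

Because kmeans starts from random centres, the cluster labels can change between runs unless set.seed is called first.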
