In the code below, if I want to re-run the code repeatedly, would I use a for loop like this?
for x in range(2, 11):
    kmeans = KMeans().setK(x).setSeed(1)
    model = kmeans.fit(dataset)
Here is the rest of my code below; I am honestly confused at this point.
from pyspark.ml.clustering import KMeans
from pyspark.ml.evaluation import ClusteringEvaluator
dataset = spark.read.format("libsvm").load("/FileStore/tables/colon_cancer-ecfbf.txt")
for x in range(2, 11):
    kmeans = KMeans().setK(x).setSeed(1)
    model = kmeans.fit(dataset)
predictions = model.transform(dataset)
evaluator = ClusteringEvaluator()
silhouette = evaluator.evaluate(predictions)
print("Silhouette with squared euclidean distance = " + str(silhouette))
centers = model.clusterCenters()
print("Cluster Centers: ")
for center in centers:
    print(center)
Answer 0 (score: 0)
Below is the indentation fix I made. It should fix your code and let it run for the x values 2, 3, 4, 5, 6, 7, 8, 9, 10.
from pyspark.ml.clustering import KMeans
from pyspark.ml.evaluation import ClusteringEvaluator
dataset = spark.read.format("libsvm").load("/FileStore/tables/colon_cancer-ecfbf.txt")
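for x in range(2, 11):
    # Fit a KMeans model with k = x clusters and a fixed seed.
    kmeans = KMeans().setK(x).setSeed(1)
    model = kmeans.fit(dataset)
    # Assign every row to a cluster.
    predictions = model.transform(dataset)
    # Score the clustering for this value of x.
    evaluator = ClusteringEvaluator()
    silhouette = evaluator.evaluate(predictions)
    print("Silhouette with squared euclidean distance = " + str(silhouette))
    # Print the cluster centers found for this value of x.
    centers = model.clusterCenters()
    print("Cluster Centers: ")
    for center in centers:
        print(center)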
With this indentation, the following lines:
kmeans = KMeans().setK(x).setSeed(1)
model = kmeans.fit(dataset)
predictions = model.transform(dataset)
evaluator = ClusteringEvaluator()
silhouette = evaluator.evaluate(predictions)
print("Silhouette with squared euclidean distance = " + str(silhouette))
centers = model.clusterCenters()
print("Cluster Centers: ")
for center in centers:
    print(center)
are all part of the loop body, so they run once for each x value in the list above.
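To see why the indentation matters, here is a minimal sketch in plain Python that needs no Spark. Note that `run_for_k` is a hypothetical stand-in I made up for the fit/evaluate steps; it is not part of pyspark and only records which values it was called with. The point is just that an indented statement belongs to the loop body, while a statement at the left margin runs once after the loop finishes.

```python
# Hypothetical stand-in for the fit/evaluate steps; it only logs the value of k.
def run_for_k(k, log):
    log.append(k)

inside_loop = []
for x in range(2, 11):
    # Indented: part of the loop body, so it runs once per value of x.
    run_for_k(x, inside_loop)

after_loop = []
for x in range(2, 11):
    pass  # the loop body does nothing here
# Not indented: runs a single time after the loop, with x left at its last value.
run_for_k(x, after_loop)

print(inside_loop)  # [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(after_loop)   # [10]
```

If only some of your lines are indented under the for statement, the remaining ones behave like the second case and you would only see results for the last value of x.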