Spark KMeans: unexpected character after line continuation

Time: 2018-04-17 03:13:18

Tags: python ubuntu apache-spark k-means

I am implementing K-Means in Spark, and when I try to run my script with spark-submit Kmeans.py I keep getting the error unexpected character after line continuation. I am using backslashes to continue lines, but I can't see what I'm doing wrong. My script is posted below.

import pyspark
from pyspark.context import SparkContext
from pyspark import SparkConf
from pyspark.sql import SparkSession, SQLContext, Row
from pyspark.sql.functions import *
from pyspark.ml.clustering import KMeans
import json
import os

# Create a SparkContext from a default configuration and quiet the logging
conf = SparkConf()
sc = SparkContext(conf = conf)
sc.setLogLevel("ERROR")

spark = SparkSession \
        .builder \
        .appName("Phone Book - Country Look up") \
        .config("spark.some.config.option", "some-value") \
        .getOrCreate()

# Load the input file in libsvm format (label followed by index:value features)
dataset = spark.read.format("libsvm") \
        .load("/home/jay/Assignment6/Input.txt")

dataset.show(200)
dataset.printSchema()

# Fit a k-means model with k = 2 clusters and a fixed seed for reproducibility
kmeans = KMeans().setK(2).setSeed(1)
model = kmeans.fit(dataset)

var1 = model.computeCost(dataset)
print("Within Set Sum of Squared Errors =" + str(var1))

centers = model.clusterCenters()
print("Cluster centers: ")
for center in centers:
        print(center)

2 Answers:

Answer 0 (score: 0):

I think you have a typo: .builder should be .builder()

Answer 1 (score: 0):

An "unexpected character after line continuation character" error almost always means there is a space after the \, which is of course hard to see. If the problem is reported on line 15, it may actually be on line 14. You can eliminate this kind of problem entirely by removing the continuation characters and wrapping the expression in parentheses:

spark = (SparkSession
        .builder
        .appName("Phone Book - Country Look up")
        .config("spark.some.config.option", "some-value")
        .getOrCreate())
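
To make the cause described above visible, here is a minimal, self-contained sketch (not part of the original script) that reproduces the message by compiling a string whose first line ends with a backslash followed by a space:

# Minimal reproduction of the error described above (illustrative only).
# The first line of bad_source ends with "\ " -- a backslash followed by
# a space -- which is exactly the invisible character that breaks the
# line continuation.
bad_source = "x = 1 + \\ \n    2\n"

try:
    compile(bad_source, "<test>", "exec")
except SyntaxError as err:
    # Typically prints: unexpected character after line continuation character
    print(err.msg)

The same invisible trailing space after any of the backslashes in the original script (for example after one of the .config(...) \ lines) would produce the error that spark-submit reports.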