Application error collection

I want to apply k-means clustering in PySpark, but I get a TypeError: float() argument must be a string or a number. Can anyone help me with this?

lines = lines.map(lambda line: line.split(" "))
new = lines.map(lambda x: (str(x[2]), str(x[3]), str(x[4]), str(x[5]), str(x[6])))
new.take(4)

Sample input (new):

[('-13', '7', '-0.573824415813', '0', '1'),
 ('-20', '13', '-0.728721307165', '0', '1'),
 ('-27', '14', '-1.18661648046', '0', '1'),
 ('-29', '10', '-0.757241996939', '0', '1')]    

k = 10 # cluster size for k-means
kmeans_iteration = 40000
estimator = KMeans(init='k-means++', n_clusters=k, n_init=10)
estimator.fit(new)
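
A possible cause, offered as a sketch rather than a confirmed diagnosis: the parameters `init='k-means++'`, `n_clusters` and `n_init` match scikit-learn's `KMeans`, whose `fit` expects an array-like of numbers, while `new` is a Spark RDD of string tuples. The snippet below uses the sample rows from the question in plain Python (no Spark needed) to show the string-to-float conversion. The helper name `to_features` is hypothetical, and the assumption that `KMeans` comes from scikit-learn is not stated in the question.

```python
# Sample rows copied from the question: every field is a string.
sample = [
    ('-13', '7', '-0.573824415813', '0', '1'),
    ('-20', '13', '-0.728721307165', '0', '1'),
    ('-27', '14', '-1.18661648046', '0', '1'),
    ('-29', '10', '-0.757241996939', '0', '1'),
]


def to_features(row):
    """Convert one row of numeric strings into a list of floats (hypothetical helper)."""
    return [float(value) for value in row]


features = [to_features(row) for row in sample]
print(features[0])

# Assumption: with the RDD from the question, the same conversion would be
#   numeric = lines.map(lambda x: [float(v) for v in x[2:7]])
# and, if the data fits in driver memory, estimator.fit(numeric.collect()).
```

If the data is too large to collect on the driver, a Spark-native estimator such as `pyspark.mllib.clustering.KMeans.train(numeric, k)` may be a better fit than scikit-learn, since it operates on the RDD directly.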

Float argument must be a string or a number in PySpark

0 answers: