TypeError: float() argument must be a string or a number in PySpark

Time: 2016-03-22 08:06:58

Tags: apache-spark typeerror pyspark k-means rdd

I want to apply k-means clustering in PySpark, but I get a type error: float() argument must be a string or a number. Can anyone help me with this quickly?

lines = lines.map(lambda line: line.split(" "))
new = lines.map(lambda x: (str(x[2]), str(x[3]), str(x[4]), str(x[5]), str(x[6])))
new.take(4)

Sample input (new):

[('-13', '7', '-0.573824415813', '0', '1'),
 ('-20', '13', '-0.728721307165', '0', '1'),
 ('-27', '14', '-1.18661648046', '0', '1'),
 ('-29', '10', '-0.757241996939', '0', '1')]    

k = 10  # number of clusters for k-means
kmeans_iteration = 40000
estimator = KMeans(init='k-means++', n_clusters=k, n_init=10)
estimator.fit(new)
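
A possible cause, assuming KMeans here is scikit-learn's sklearn.cluster.KMeans (the init, n_clusters, and n_init arguments match its signature): fit expects a local numeric array, but new is a Spark RDD of string tuples, so the conversion to float most likely fails on the RDD object itself. Below is a minimal sketch of one way to prepare the data. parse_line is a hypothetical helper, the sample record is made up, and the column range 2 to 6 is taken from the code above.

```python
def parse_line(line):
    """Split a space-separated record and return columns 2..6 as floats."""
    fields = line.split(" ")
    return [float(value) for value in fields[2:7]]


# Local check on a made-up record (the first two columns are placeholders).
sample = "id0 id1 -13 7 -0.573824415813 0 1"
print(parse_line(sample))

# With Spark available, the same helper could be mapped over the raw RDD
# (assumption: `lines` is the RDD of raw text lines, before the split):
#   points = lines.map(parse_line)
# Option A, scikit-learn on the driver (only if the data fits in memory):
#   estimator.fit(points.collect())
# Option B, Spark MLlib's distributed k-means:
#   from pyspark.mllib.clustering import KMeans as SparkKMeans
#   model = SparkKMeans.train(points, k, maxIterations=100)
```

Collecting to the driver only makes sense for small data sets. For larger ones, the MLlib route keeps the clustering distributed across the cluster.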

0 answers:

No answers yet