TypeError: unhashable type: 'numpy.ndarray' rdd

Asked: 2016-04-19 18:40:53

Tags: python sql numpy pyspark

I am running the following and getting the error TypeError: unhashable type: 'numpy.ndarray'. Can somebody spot the reason?

After the first iteration,

[array([-4.53909801,  5.42021141]),
 array([ 5.08111889,  4.93915172]),
 array([ 4.98807971, -5.00388445]),
 array([-4.92899716, -4.70698057])]

becomes new_centers, and then the error occurs. Before that, new_centers was a list of Row objects:

new_centers = [Row(x=-5.659833908081055, y=7.705344200134277), Row(x=3.17942214012146, y=-9.446121215820312), Row(x=9.128270149230957, y=4.5666022300720215), Row(x=-6.432034969329834, y=-4.432190895080566)]
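
For reference, the two forms of new_centers differ in hashability: Row is a tuple subclass and can be hashed, while a NumPy array cannot. A minimal check (assuming numpy and pyspark.sql.Row are importable):

import numpy as np
from pyspark.sql import Row

hash(Row(x=-5.659833908081055, y=7.705344200134277))  # fine: Row hashes like a tuple
hash(np.array([-4.53909801, 5.42021141]))             # TypeError: unhashable type: 'numpy.ndarray'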

while old_centers is None or not has_converged(old_centers, new_centers, epsilon) and iteration < max_iterations:
    # update centers
    old_centers = new_centers

    center_pt_1 = points.rdd.map(lambda point: (old_centers[nearest_center(old_centers, point)[0]], (point, 1)))

    center_sum_num = center_pt_1.reduceByKey(lambda a, b: ((a[0][0] + b[0][0], a[0][1] + b[0][1]), a[1] + b[1]))

    new_centers = center_sum_num.map(lambda tup: np.asarray(tup[1][0]) / tup[1][1]).collect()

    iteration += 1
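
This matters because reduceByKey hashes its keys to group them: the first pass of the loop works while old_centers holds Row objects, but the second pass uses the NumPy arrays produced above as keys. A minimal local reproduction of that difference (a sketch only, not part of the original job; it spins up its own local SparkSession):

import numpy as np
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.master("local[2]").getOrCreate()
sc = spark.sparkContext

# Row keys hash like tuples, so this pass succeeds
print(sc.parallelize([(Row(x=1.0, y=2.0), 1), (Row(x=1.0, y=2.0), 1)])
        .reduceByKey(lambda a, b: a + b).collect())

# ndarray keys cannot be hashed; the job fails once collect() runs, with
# TypeError: unhashable type: 'numpy.ndarray'
sc.parallelize([(np.array([1.0, 2.0]), 1), (np.array([1.0, 2.0]), 1)]) \
  .reduceByKey(lambda a, b: a + b).collect()

Note that the error only surfaces at collect() because Spark evaluates the pipeline lazily, which matches the traceback below.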


The traceback points at the collect() call:

---> 54         new_centers = center_sum_num.map(lambda tup: np.asarray(tup[1][0])/tup[1][1]).collect()

TypeError: unhashable type: 'numpy.ndarray'

0 Answers

There are no answers yet.