Question

I am trying to run this line on a CoordinateMatrix:

test = test.entries.map(lambda (i, j, v): (j, (i, v)))

The equivalent in Scala seems to work, but in pyspark it fails. This is the error I get when the line executes:

'MatrixEntry' object is not iterable

To confirm that I am working with a CoordinateMatrix:

>>> test = test_coord.entries
>>> test.first()
MatrixEntry(0, 0, 7.0)

Does anyone know what might be going on?

Answer 1

Assuming test is a CoordinateMatrix, then:

test.entries.map(lambda e: (e.j, (e.i, e.value)))

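Side note: tuple-parameter unpacking in a lambda, as in map(lambda (x, y, z): ...), is Python 2 only syntax; Python 3 removed it (PEP 3113). The syntax itself is not what fails here, though. Each element of entries is a MatrixEntry, which is not iterable, so it cannot be unpacked into (i, j, v). Read its i, j and value attributes instead.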
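For plain tuples (not MatrixEntry objects), a Python 3 lambda cannot destructure its argument in the parameter list. A minimal sketch of two common workarounds follows; the sample rows and the helper name rekey are illustrative assumptions, not part of the original post.

```python
# Plain tuples standing in for (i, j, value) records; no Spark required.
rows = [(1, 2, 3.0), (4, 5, 6.0)]

# Python 2 only (SyntaxError in Python 3):
#   map(lambda (i, j, v): (j, (i, v)), rows)

# Workaround 1: index into the tuple inside the lambda.
by_index = list(map(lambda t: (t[1], (t[0], t[2])), rows))

# Workaround 2: use a named function and unpack in its body.
def rekey(t):
    i, j, v = t
    return (j, (i, v))

by_unpack = list(map(rekey, rows))
```

Both forms produce the same (j, (i, v)) pairs; the named function tends to read better once the body grows beyond one expression.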

Example:

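from pyspark.mllib.linalg.distributed import CoordinateMatrix

# sc is an existing SparkContext, e.g. the one the pyspark shell provides
test = CoordinateMatrix(sc.parallelize([(1,2,3), (4,5,6)]))
test.entries.collect()
# [MatrixEntry(1, 2, 3.0), MatrixEntry(4, 5, 6.0)]
test.entries.map(lambda e: (e.j, (e.i, e.value))).collect()
# [(2L, (1L, 3.0)), (5L, (4L, 6.0))]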
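The failure mode can be reproduced without a Spark cluster. The sketch below uses FakeEntry, a hypothetical stand-in class (not part of pyspark) that, like MatrixEntry as described above, exposes i, j and value as attributes but defines no iteration protocol.

```python
class FakeEntry(object):
    """Hypothetical stand-in for MatrixEntry: attributes only, not iterable."""

    def __init__(self, i, j, value):
        self.i = int(i)
        self.j = int(j)
        self.value = float(value)


entries = [FakeEntry(1, 2, 3), FakeEntry(4, 5, 6)]


def unpack(e):
    # Sequence unpacking needs an iterable, so this raises TypeError.
    i, j, v = e
    return (j, (i, v))


try:
    unpack(entries[0])
    error = None
except TypeError as exc:
    error = str(exc)  # exact wording varies by Python version

# Attribute access works, mirroring the lambda in the answer.
rekeyed = [(e.j, (e.i, e.value)) for e in entries]
```

The unpacking attempt fails for the same reason the original lambda does, while reading the attributes gives the (j, (i, value)) pairs directly.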
