If my RDD looks like this:
(key (date, value), (date, value), (date, value))
how can I convert it to
(key (Numpy.Array(date), Numpy.Array(value)))
Answer 0 (score: 1)
You can use zip to reshape the (date, value) pairs:
>>> xs = (("x1", 1), ("x2", 2), ("x3", 3))
>>> list(zip(*xs))
[('x1', 'x2', 'x3'), (1, 2, 3)]
Adding a map or a comprehension takes care of the (Numpy.Array(date), Numpy.Array(value)) part, and the rest is fairly straightforward:
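import datetime

import numpy as np

# sc is an existing SparkContext (for example, the one the pyspark shell provides).
rdd = sc.parallelize([
    ("foo",
        (datetime.date(2010, 1, 1), 1.0),
        (datetime.date(2011, 2, 10), 2.0),
        (datetime.date(2012, 3, 10), 3.0)
    ),
    ("bar",
        (datetime.date(2000, 4, 1), 14.0),
        (datetime.date(2001, 5, 10), 15.0),
        (datetime.date(2002, 6, 10), 16.0)
    ),
])

# x[0] is the key and x[1:] are the (date, value) pairs.
# zip(*x[1:]) transposes the pairs into one tuple of dates and one of values.
rdd.map(lambda x: (x[0], tuple(np.array(col) for col in zip(*x[1:]))))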
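
To check the reshaping without a Spark cluster, the per-record logic of the lambda above can be pulled out into a plain function and run on a single record. This is only a sketch: the name to_arrays is hypothetical (it does not come from the original answer), and it assumes every record holds at least one (date, value) pair.

```python
import datetime

import numpy as np


def to_arrays(record):
    """Turn (key, (date, value), ...) into (key, (dates_array, values_array)).

    Hypothetical helper for illustration; assumes at least one pair per record.
    """
    key, pairs = record[0], record[1:]
    # zip(*pairs) transposes the pairs into one tuple of dates and one of values.
    dates, values = zip(*pairs)
    return key, (np.array(dates), np.array(values))


record = (
    "foo",
    (datetime.date(2010, 1, 1), 1.0),
    (datetime.date(2011, 2, 10), 2.0),
    (datetime.date(2012, 3, 10), 3.0),
)
key, (dates, values) = to_arrays(record)
```

On the RDD itself, the same function could be passed directly as rdd.map(to_arrays).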