Question

I have a Spark RDD of the form Row(id, Words), where Words contains a list of words. I want to convert this list into a single column. Input:

ID  Words
1   [w1,w2,w3]
2   [w3,w4]

I want to convert it to the following output format:

ID  Word
1   w1
1   w2
1   w3
2   w3
2   w4

Answer 1

If you want to work with the rdd, you need to use flatMap():

    rdd.flatMap(lambda x: [(x['ID'], w) for w in x["Words"]]).collect()
    #[(1, u'w1'), (1, u'w2'), (1, u'w3'), (2, u'w3'), (2, u'w4')]

However, if you are open to using DataFrames (recommended), you can use pyspark.sql.functions.explode:

    import pyspark.sql.functions as f
    df = rdd.toDF()
    df.select('ID', f.explode("Words").alias("Word")).show()
    #+---+----+
    #| ID|Word|
    #+---+----+
    #|  1|  w1|
    #|  1|  w2|
    #|  1|  w3|
    #|  2|  w3|
    #|  2|  w4|
    #+---+----+

Or, better yet, skip the rdd altogether and create a DataFrame directly from the data.

Convert Spark RDD column to rows in PySpark
