Question

I am using Spark 1.5.0.

I have a DataFrame like the one below, and I am trying to read one of its columns:

sentence 2. Sentence 2

I want to read out all the words (the vocabulary) from the sentences. How can I do this?

Edit 1: The problem persists.

I am now running this in Spark 2.0.0, and instead of the words I get this:

>>> words = tokenizer.transform(sentenceData)
>>> words
DataFrame[label: bigint, sentence: string, words: array<string>]
>>> words['words']
Column<words>

Resolution of Edit 1: Link

Answer 1

You can do:

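from pyspark.sql.functions import explode

# explode() emits one row per array element; flatMap unwraps each Row,
# and collect() brings the words back to the driver as a Python list
words.select(explode('words')).rdd.flatMap(lambda x: x).collect()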

Read the content of Column <column-name> in pyspark
