Word count: 'Column' object is not callable

Time: 2016-09-07 14:35:27

Tags: python apache-spark pyspark

from pyspark.sql.functions import col, explode, split

# removePunctuation is a helper defined elsewhere (not shown in the question)
shakespeareDF = sqlContext.read.text(fileName).select(removePunctuation(col('value')))

shakespeareDF.show(15, truncate=False)

The DataFrame looks like this:

[image: screenshot of the DataFrame output]

ss = split(shakespeareDF.sentence," ")
shakeWordsDFa =explode(ss)

shakeWordsDF_S=sqlContext.createDataFrame(shakeWordsDFa,'word')

Any idea what I am doing wrong? The error says `Column is not iterable`.

What should I do? I just want to turn shakeWordsDFa into a DataFrame and rename the column.

1 Answer:

Answer 0: (score: 3)

Just use select:

shakespeareDF = sc.parallelize([
    ("from fairest creatures we desire increase", ),
    ("that thereby beautys rose might never die", ),
]).toDF(["sentence"])

(shakespeareDF
    .select(explode(split("sentence", " ")).alias("word"))
    .show(4))

## +---------+
## |     word|
## +---------+
## |     from|
## |  fairest|
## |creatures|
## |       we|
## +---------+
## only showing top 4 rows
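For intuition about what `explode(split("sentence", " "))` computes, here is a minimal plain-Python sketch that runs without Spark. The helper name `explode_split` is hypothetical and only illustrates the row-flattening idea; note that Spark's `split` takes a regular-expression pattern, whereas `str.split` here uses a literal separator.

```python
# Same two sentences as in the answer's example DataFrame.
sentences = [
    "from fairest creatures we desire increase",
    "that thereby beautys rose might never die",
]


def explode_split(rows, sep=" "):
    """Hypothetical helper: split each string on sep, then flatten so that
    every word becomes its own output row (what explode does to an array)."""
    return [word for row in rows for word in row.split(sep)]


words = explode_split(sentences)
first_four = words[:4]  # analogous to .show(4) in the Spark version
```

In Spark the same result comes back as a DataFrame with one `word` column rather than a Python list, and the work is only carried out when an action such as `show` is called.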

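Spark SQL Columns are not data structures. They are not bound to any data and only have meaning when evaluated in the context of a specific DataFrame. In this sense, Columns behave more like functions. That is why passing the result of explode to createDataFrame fails: there are no rows to iterate over until the Column is used in something like select.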
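The "Columns are more like functions" idea can be sketched with a toy model in plain Python. Everything here (`ToyColumn`, `toy_col`, `toy_split`, `toy_select`) is hypothetical and is not how Spark is implemented; it only shows an expression object that holds no data and produces values once it is applied to rows.

```python
class ToyColumn:
    """Hypothetical stand-in for a Column: a named, unevaluated expression."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn  # function from a row (dict) to a value

    def alias(self, name):
        # Renaming returns a new expression; still no data involved.
        return ToyColumn(name, self.fn)


def toy_col(name):
    """Expression that reads one field from whatever row it is given."""
    return ToyColumn(name, lambda row: row[name])


def toy_split(column, sep):
    """Expression that splits the value produced by another expression."""
    return ToyColumn("split(%s)" % column.name,
                     lambda row: column.fn(row).split(sep))


def toy_select(rows, column):
    """Evaluate the expression against concrete rows (the 'DataFrame')."""
    return [{column.name: column.fn(row)} for row in rows]


rows = [{"sentence": "from fairest creatures"},
        {"sentence": "that thereby beautys"}]

# Building the expression touches no rows at all.
words_expr = toy_split(toy_col("sentence"), " ").alias("words")

# Values appear only when the expression is evaluated against rows.
result = toy_select(rows, words_expr)
```

In this toy model `words_expr` cannot be looped over, because it contains no rows; only `toy_select` pairs it with data. That mirrors why the expression has to go through `select` on a DataFrame rather than being handed to `createDataFrame`.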