from pyspark.sql.functions import col, explode, split
shakespeareDF = sqlContext.read.text(fileName).select(removePunctuation(col('value')))
shakespeareDF.show(15, truncate=False)
The DataFrame looks like this:
ss = split(shakespeareDF.sentence," ")
shakeWordsDFa =explode(ss)
shakeWordsDF_S=sqlContext.createDataFrame(shakeWordsDFa,'word')
Any idea what I'm doing wrong? The error says: Column is not iterable.
What should I do? I just want to turn shakeWordsDFa into a DataFrame and rename its column.
Answer 0 (score: 3)
Just use select:
shakespeareDF = sc.parallelize([
("from fairest creatures we desire increase", ),
("that thereby beautys rose might never die", ),
]).toDF(["sentence"])
(shakespeareDF
.select(explode(split("sentence", " ")).alias("word"))
.show(4))
## +---------+
## |     word|
## +---------+
## |     from|
## |  fairest|
## |creatures|
## |       we|
## +---------+
## only showing top 4 rows
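Spark SQL Columns are not data structures. They are not bound to any data and are meaningful only when evaluated in the context of a specific DataFrame. In that sense, Columns behave more like functions. This is why createDataFrame fails: explode(ss) returns a Column expression rather than a collection of rows, hence "Column is not iterable".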
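To make the "Columns are more like functions" point concrete, here is a toy model in plain Python. It is only an analogy under stated assumptions, not how PySpark is implemented: ToyColumn, ToyFrame, toy_col, toy_split and select_exploded are all hypothetical names invented for this sketch. It shows an expression being built without touching any data and producing values only when a frame evaluates it.

```python
# Toy analogy only: none of these names exist in PySpark.
# A "column" here is just a recipe (a function of a row) plus a name.

class ToyColumn:
    def __init__(self, fn, name):
        self.fn = fn      # callable: row (dict) -> value
        self.name = name  # output column name

    def alias(self, name):
        # Renaming returns a new expression; still no data involved.
        return ToyColumn(self.fn, name)


def toy_col(name):
    # Expression that reads one field from whatever row it is given later.
    return ToyColumn(lambda row: row[name], name)


def toy_split(column, sep):
    # Expression that splits the value produced by another expression.
    return ToyColumn(lambda row: column.fn(row).split(sep), column.name)


class ToyFrame:
    def __init__(self, rows):
        self.rows = rows  # list of dicts

    def select_exploded(self, column):
        # Only here is the expression evaluated, once per row,
        # emitting one output row per element of the resulting list.
        return ToyFrame([
            {column.name: item}
            for row in self.rows
            for item in column.fn(row)
        ])


frame = ToyFrame([
    {"sentence": "from fairest creatures we desire increase"},
    {"sentence": "that thereby beautys rose might never die"},
])

# Building the expression touches no data, so there is nothing to iterate.
words_expr = toy_split(toy_col("sentence"), " ").alias("word")

# Evaluating it in the context of a frame is what yields rows.
words = frame.select_exploded(words_expr)
print([r["word"] for r in words.rows[:4]])
# ['from', 'fairest', 'creatures', 'we']
```

In this analogy, passing words_expr to something that expects rows fails for the same reason the original createDataFrame call does: the expression holds no rows until a frame evaluates it through select.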