Converting a DStream to a DataFrame with PySpark

Time: 2017-10-19 16:10:04

Tags: pyspark spark-dataframe dstream

How do I convert a DStream to a DataFrame? This is my actual code:

localhost = "127.0.0.1"
addresses = [(localhost, 9999)]
schema = ['event', 'id', 'time','occurence']
flumeStream = FlumeUtils.createPollingStream(ssc, addresses)
counts = flumeStream.map(lambda line: str(line).split(",")) \
        .filter(lambda line: len(line)>1) \
        .map(lambda line: (line[29],line[30],line[67],1)) \
        .foreachRDD(lambda rdd: sqlContext.createDataFrame(rdd))

counts.show()

ssc.start()
ssc.awaitTerminationOrTimeout(62)
ssc.stop()

It gives me the following error:

AttributeError: 'NoneType' object has no attribute 'show'

1 answer:

Answer 0 (score: 0)

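A DStream cannot be turned into a DataFrame directly. It is a sequence of RDDs, so convert each RDD to a DataFrame inside the function you pass to foreachRDD, for example with sqlContext.createDataFrame(rdd, schema), and call show() there as well. foreachRDD itself returns None, which is why counts is None and counts.show() raises the AttributeError.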
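The approach can be sketched as follows. This is a minimal illustration, not the asker's exact pipeline: the names parse_event, show_batch, attach and SCHEMA are hypothetical, the length check is stricter than the original len(line)>1 filter (an assumption, so that index 67 is always valid), and SparkSession assumes Spark 2.0 or later.

```python
# Column names taken from the question (spelling kept as in the original).
SCHEMA = ["event", "id", "time", "occurence"]


def parse_event(line, indexes=(29, 30, 67)):
    """Split a comma-separated record and pick the event, id and time fields.

    Returns None for records that are too short, so callers can filter them out.
    The default indexes are the ones used in the question.
    """
    fields = str(line).split(",")
    if len(fields) <= max(indexes):
        return None
    event, event_id, timestamp = (fields[i] for i in indexes)
    return (event, event_id, timestamp, 1)


def show_batch(batch_time, rdd):
    """Convert one micro-batch RDD to a DataFrame and print it on the driver."""
    # Imported here so parse_event stays usable without a Spark installation.
    from pyspark.sql import SparkSession

    if rdd.isEmpty():
        # Column types are inferred from the data, which fails on an empty RDD.
        return
    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(rdd, SCHEMA)
    df.show()


def attach(stream):
    """Register the conversion on a DStream; call this before ssc.start()."""
    rows = stream.map(parse_event).filter(lambda row: row is not None)
    # foreachRDD returns None, so its result is deliberately not assigned.
    rows.foreachRDD(show_batch)
```

With the question's setup, this would be used as attach(flumeStream) before ssc.start(), and the counts.show() line would be dropped, since each batch is printed inside show_batch.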