How do I use the unbase64 function in a PySpark SQL query?

Time: 2016-01-18 06:53:05

Tags: python apache-spark pyspark

I can't figure out why the unbase64 function isn't working in my Spark SQL query.

Here is an example. I am trying to decode "VGhpcyBpcyBhIHRlc3Qh" by calling the unbase64 function in Spark SQL. Any ideas why the output is not being decoded? Thanks.

from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.functions import unbase64

sc = SparkContext("local", "Simple App")

sqlContext = SQLContext(sc)

log = [{"eventTime":"2015-12-14 15:27:00","id":"9ab0135f-b8a3-4312-9065-9f8874fd790c","fullLog":"VGhpcyBpcyBhIHRlc3Qh"}]

df = sqlContext.createDataFrame(log)

df.registerTempTable('data')

query = sqlContext.sql('SELECT unbase64(fullLog) as test FROM data')

query.write.save("output", format="json")

The output I get is {"test":"VGhpcyBpcyBhIHRlc3Qh"}, but the output I want is {"test":"This is a test!"}

1 answer:

Answer 0 (score: 0)

This seems to work for me...

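from pyspark import SparkContext
from pyspark.sql import SQLContext

# Outside the pyspark shell, sc and sqlContext must be created explicitly
sc = SparkContext("local", "Simple App")
sqlContext = SQLContext(sc)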

log = [("2015-12-14 15:27:00","9ab0135f-b8a3-4312-9065-9f8874fd790c","VGhpcyBpcyBhIHRlc3Qh")]

rdd_log = sc.parallelize(log)

df = sqlContext.createDataFrame(rdd_log, ["eventTime", "id", "fullLog"])

df.registerTempTable("data")

query = sqlContext.sql('SELECT unbase64(fullLog) as test FROM data')

# unbase64 returns a binary column, so cast it to a string to get readable text
query = query.select(query.test.cast("string").alias('test'))

print(query.collect())

>> [Row(test=u'This is a test!')]
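
As a rough illustration of why the cast matters, here is a plain-Python sketch with no Spark involved. It assumes the JSON writer base64-encodes binary values when it serializes them, which would explain why the output looked like it was never decoded. The variable names (raw, as_binary_json, as_string_json) are made up for this example only.

```python
import base64
import json

encoded = "VGhpcyBpcyBhIHRlc3Qh"

# Decoding base64 yields raw bytes, analogous to the binary column
# that unbase64 produces.
raw = base64.b64decode(encoded)

# JSON has no bytes type. Assumption: the writer base64-encodes binary
# values, which reproduces the original, seemingly undecoded string.
as_binary_json = json.dumps({"test": base64.b64encode(raw).decode("ascii")})

# Interpreting the bytes as UTF-8 text first (the analogue of
# cast("string")) gives the readable value.
as_string_json = json.dumps({"test": raw.decode("utf-8")})

print(as_binary_json)
print(as_string_json)
```

The first line printed carries the same base64 text that went in, and the second carries the decoded sentence, which mirrors the difference between the query with and without the cast to string.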