Saving a PySpark DataFrame to a Parquet file

Time: 2019-01-13 12:20:25

Tags: python pandas pyspark save parquet

I get an exception when I try to save a PySpark DataFrame.

Here is my code, with a toy example:

import io

import pandas as pd
from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession

# Start a local Spark session
sc = SparkContext('local')
spark = SparkSession(sc)

# Toy data as a JSON string (column -> {index -> value})
toy_json = '{"userId":{"0":1,"1":1,"10":1,"100":3,"1000":15,"10000":71,"10001":71,"10002":71,"10003":71,"10004":71},"movieId":{"0":31,"1":1029,"10":1371,"100":296,"1000":157,"10000":581,"10001":589,"10002":908,"10003":1171,"10004":1259},"rating":{"0":2.5,"1":3.0,"10":2.5,"100":4.5,"1000":2.0,"10000":4.0,"10001":3.0,"10002":5.0,"10003":5.0,"10004":4.0},"timestamp":{"0":1260748800000,"1":1260748800000,"10":1260748800000,"100":1298851200000,"1000":1052870400000,"10000":974592000000,"10001":974592000000,"10002":974592000000,"10003":974592000000,"10004":974592000000}}'
# Wrap the string in StringIO; newer pandas versions no longer accept a literal JSON string
toy_df = pd.read_json(io.StringIO(toy_json))

# Make the pandas dataframe a pyspark dataframe
toy = spark.createDataFrame(toy_df)

# Write the pyspark dataframe to disk
toy.write.save('toy', format='parquet', mode='append')

Error:

Py4JJavaError: An error occurred while calling o152.save.
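
The message above is only the first line of the error, and the underlying Java exception is not shown in the post. As a rough diagnostic sketch (not a fix, and not something from the original question), the helper below walks the JVM cause chain of a py4j error to surface the root cause. The function names `describe_jvm_error` and `save_parquet_with_diagnostics` are hypothetical. The code assumes the exception exposes py4j's `java_exception` attribute and falls back to `str(exc)` when it does not.

```python
def describe_jvm_error(exc, max_depth=20):
    """Return a one-line description of the deepest JVM-side cause of `exc`.

    Falls back to str(exc) when the exception carries no `java_exception`
    attribute (that is, when it is not a py4j Py4JJavaError).
    """
    java_exc = getattr(exc, "java_exception", None)
    if java_exc is None:
        return str(exc)
    # Follow Throwable.getCause() towards the root; cap the depth to guard
    # against cyclic cause chains.
    for _ in range(max_depth):
        cause = java_exc.getCause()
        if cause is None:
            break
        java_exc = cause
    return str(java_exc.toString())


def save_parquet_with_diagnostics(df, path, mode="append"):
    """Hypothetical wrapper around the write call from the question."""
    try:
        df.write.save(path, format="parquet", mode=mode)
    except Exception as exc:
        # Re-raise with the root JVM cause in the message; the original
        # exception stays attached as __cause__.
        raise RuntimeError(
            f"Parquet write to {path!r} failed: {describe_jvm_error(exc)}"
        ) from exc
```

With the session from the question, this could be called as `save_parquet_with_diagnostics(toy, 'toy')`. The root cause it reports (for example a file system or Java environment problem) is usually more informative than the top-level `Py4JJavaError` line.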

0 answers:

No answers