pyspark: Writing a DataFrame to Parquet

Time: 2017-02-17 15:37:24

Tags: python pyspark pyspark-sql

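I get the error below when running a pyspark script that loads a Parquet table. When I tested the same code in the pyspark shell, I had no problems.

Interactive mode works fine: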

 df_writer = pyspark.sql.DataFrameWriter(df)
 df_writer.saveAsTable('test', format='parquet', mode='overwrite',path='xyz/test_table.parquet')

Running it as a script throws an error:

/opt/mapr/spark/spark-2.0.1//bin/spark-submit --jars /opt/mapr/spark/spark-2.0.1/-2.0.1/jars/commons-csv-1.2.jar /home/mapr/scripts/pyspark_load.py
17/02/17 14:57:06 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Traceback (most recent call last):
  File "/home/mapr/scripts/2_pyspark_load.py", line 23, in <module>
    df_writer = pyspark.sql.DataFrameWriter(df)
NameError: name 'pyspark' is not defined

2 answers:

Answer 0: (score: 2)

You can also save the DataFrame more simply:

df.write.parquet("xyz/test_table.parquet", mode='overwrite')
# 'df' is your PySpark dataframe
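
Note that df.write.parquet only writes the files; it does not register a table the way saveAsTable in the question does. If you still want the table, here is a minimal sketch that goes through df.write (which is already a DataFrameWriter), so the script never needs to refer to the name pyspark at all. The helper name save_as_parquet_table is made up for illustration; the table name and path are the ones from the question.

```python
def save_as_parquet_table(df, table_name, path, mode="overwrite"):
    """Write df as Parquet files at `path` and register them as `table_name`."""
    # df.write returns a DataFrameWriter bound to df, so there is no need
    # to construct pyspark.sql.DataFrameWriter(df) by hand.
    df.write.saveAsTable(table_name, format="parquet", mode=mode, path=path)
```

Usage would be something like save_as_parquet_table(df, 'test', 'xyz/test_table.parquet').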

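Answer 1: (score: 0)

The difference between running my code interactively and running it with spark-submit was that the script has to import pyspark itself. For example,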

import pyspark

df_writer = pyspark.sql.DataFrameWriter(df)
# Rest of Code
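
For completeness, here is a rough sketch of what a standalone script for spark-submit might look like. As far as I can tell, the pyspark shell runs a startup script that imports pyspark and creates the spark session for you, while spark-submit runs your file as plain Python, so every name has to be imported or created in the script. The file name pyspark_load_sketch.py, the helper write_parquet, the command-line argument, and the spark.range(5) placeholder data are all assumptions for illustration, not the asker's actual code.

```python
import sys


def write_parquet(df, path, mode="overwrite"):
    """Write a PySpark DataFrame to Parquet files at `path`."""
    df.write.parquet(path, mode=mode)


def main(output_path):
    # spark-submit does not pre-import anything, so the script imports
    # what it uses. The import sits inside main() only so that
    # write_parquet() can be imported on a machine without Spark.
    from pyspark.sql import SparkSession

    # The shell creates `spark` for you; a script has to build it.
    spark = SparkSession.builder.appName("pyspark_load_sketch").getOrCreate()
    try:
        df = spark.range(5)  # placeholder data, not the asker's DataFrame
        write_parquet(df, output_path)
    finally:
        spark.stop()


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: spark-submit pyspark_load_sketch.py <output_path>")
```

You would run it as spark-submit pyspark_load_sketch.py xyz/test_table.parquet.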