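Error when saving a Spark DataFrame that is an attribute of a class

Asked: 2019-01-10 05:54:13

Tags: python apache-spark pyspark py4j

I have a class that holds a Spark DataFrame as an attribute. I wrote a method for the class that saves the DataFrame as a csv file.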

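import pickle
import pyspark
import inspect
from pyspark.sql import SparkSession

# Assumption: reuse the active session (the pyspark shell already provides `spark`).
spark = SparkSession.builder.getOrCreate()


def write_file(save_object, filepath):
    # Spark writes a directory of part files at this path, not a single csv file.
    save_object.write.csv(filepath + "/csv_name.csv")


class BasicTypes:
    def __init__(self):
        self.name = "bts"
        self.spark_df = spark.createDataFrame([[1, 2], [3, 4]], ['a', 'b'])

    def save_df(self, filepath):
        # Collect every non-routine member, then drop dunder names.
        attributes = inspect.getmembers(self, lambda a: not inspect.isroutine(a))
        attributes = [a[0] for a in attributes if not (a[0].startswith('__') and a[0].endswith('__'))]
        for each_attribute in attributes:
            # Write out only the attributes that are Spark DataFrames.
            if isinstance(getattr(self, each_attribute), pyspark.sql.dataframe.DataFrame):
                x = getattr(self, each_attribute)
                x.show(4)
                write_file(x, filepath + '/' + self.name + '/')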

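In the save_df method, I collect all attribute names in the attributes list and then iterate over them to check whether each one is a Spark DataFrame. If it is, write_file is called, a wrapper function that saves the Spark DataFrame as a csv file. If the run fails, it prints failed for ...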
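The attribute scan described above can be sketched without Spark. This version is an alternative, not the asker's code: it uses `vars(obj)`, which reads only the instance's own `__dict__`, whereas `inspect.getmembers` calls `getattr` for every name returned by `dir(obj)`. `FrameStandIn`, `Holder`, and `attributes_of_type` are hypothetical names introduced for illustration; `FrameStandIn` stands in for `pyspark.sql.DataFrame`.

```python
class FrameStandIn:
    """Hypothetical placeholder for pyspark.sql.DataFrame so the sketch runs without Spark."""


class Holder:
    def __init__(self):
        self.name = "bts"
        self.frame = FrameStandIn()
        self.count = 2


def attributes_of_type(obj, wanted_type):
    """Return {attribute name: value} for instance attributes of the given type."""
    return {
        name: value
        for name, value in vars(obj).items()
        if isinstance(value, wanted_type)
    }


found = attributes_of_type(Holder(), FrameStandIn)
print(sorted(found))  # ['frame']
```

With real Spark objects, the same helper would be called with `pyspark.sql.DataFrame` as `wanted_type`.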

I tried to save the DataFrame with the save_df method:

bt = BasicTypes()
file_str = "path/to/location"
bt.save_df(file_str)

The problem is that the code sometimes runs without any issue, but sometimes fails with the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 33, in save
  File "/Users/timetravellingcocoon/Downloads/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/Users/timetravellingcocoon/Downloads/spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/Users/timetravellingcocoon/Downloads/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 332, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling o270.__getstate__. Trace:
py4j.Py4JException: Method __getstate__([]) does not exist
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
    at py4j.Gateway.invoke(Gateway.java:274)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)

What could be causing this?
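One possible reading, offered as an assumption and not a confirmed diagnosis: `__getstate__` is the hook that Python's `pickle` and `copy` modules look up on an object, and the traceback points at a `save` function on line 33 that is not part of the code shown. If that function pickles the instance, pickling would reach the DataFrame's JVM handle, and py4j would forward the `__getstate__` call to the JVM, where no such method exists. The sketch below shows a class leaving such an attribute out of its pickled state. `JvmProxyStandIn` and `BasicTypesSketch` are hypothetical names; the stand-in replaces the DataFrame so the example runs without Spark.

```python
import pickle


class JvmProxyStandIn:
    """Hypothetical stand-in for a JVM-backed object: it refuses to be pickled."""

    def __reduce_ex__(self, protocol):
        raise TypeError("JVM-backed object cannot be pickled")


class BasicTypesSketch:
    def __init__(self):
        self.name = "bts"
        self.spark_df = JvmProxyStandIn()

    def __getstate__(self):
        # Copy the instance state and drop the attribute that cannot be pickled.
        state = self.__dict__.copy()
        state.pop("spark_df", None)
        return state

    def __setstate__(self, state):
        # Restore the plain attributes; the frame must be rebuilt separately.
        self.__dict__.update(state)
        self.spark_df = None


restored = pickle.loads(pickle.dumps(BasicTypesSketch()))
print(restored.name, restored.spark_df)
```

Under this approach the DataFrame itself would still be written with `write.csv`, and only the remaining plain attributes would go through `pickle`.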

0 answers:

No answers