I have a class that holds a Spark DataFrame as an attribute. I wrote a method on the class that saves the DataFrame to a CSV file.
import pickle
import pyspark
import inspect
from pyspark.sql import SparkSession

# Assumes an active SparkSession; created here so the snippet is self-contained.
spark = SparkSession.builder.getOrCreate()

def write_file(save_object, filepath):
    # Note: Spark writes a directory of part files at this path, not a single CSV file.
    save_object.write.csv(filepath + "/csv_name.csv")

class BasicTypes:
    def __init__(self):
        self.name = "bts"
        self.spark_df = spark.createDataFrame([[1, 2], [3, 4]], ['a', 'b'])

    def save_df(self, filepath):
        # Collect all non-callable attribute names, dropping dunders.
        attributes = inspect.getmembers(self, lambda a: not inspect.isroutine(a))
        attributes = [a[0] for a in attributes
                      if not (a[0].startswith('__') and a[0].endswith('__'))]
        for each_attribute in attributes:
            if isinstance(getattr(self, each_attribute), pyspark.sql.dataframe.DataFrame):
                x = getattr(self, each_attribute)
                x.show(4)
                try:
                    write_file(x, filepath + '/' + self.name + '/')
                except Exception:
                    print("failed for", each_attribute)
In the save_df method, I collect all of the instance's attribute names into the attributes list and then iterate over them, checking whether each one is a Spark DataFrame. If it is, I call write_file, a wrapper function that saves the Spark DataFrame as a CSV file. If a write fails, failed for ... is printed.
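For context, this is roughly what the filtering step yields (obj here is just a throwaway instance for illustration):

obj = BasicTypes()
# Keep non-callable members, then drop dunder names.
members = inspect.getmembers(obj, lambda a: not inspect.isroutine(a))
names = [m[0] for m in members if not (m[0].startswith('__') and m[0].endswith('__'))]
print(names)  # expected: ['name', 'spark_df']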
I then tried to save the DataFrame using the save_df method:
bt = BasicTypes()
file_str = "path/to/location"
bt.save_df(file_str)
The problem is that sometimes the code runs fine, but other times it fails with the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 33, in save
File "/Users/timetravellingcocoon/Downloads/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/Users/timetravellingcocoon/Downloads/spark/python/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/Users/timetravellingcocoon/Downloads/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 332, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling o270.__getstate__. Trace:
py4j.Py4JException: Method __getstate__([]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
at py4j.Gateway.invoke(Gateway.java:274)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
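In case it is relevant: the closest I have found to a minimal reproduction (a sketch, assuming the failure comes from an attempted pickle of the py4j-backed DataFrame, which may or may not be what happens inside save_df) is pickling the DataFrame attribute directly:

import pickle

bt = BasicTypes()
# A pyspark DataFrame wraps a py4j JavaObject (its _jdf). Pickle probes the
# object for __getstate__, py4j forwards that lookup to the JVM as a method
# call, and the JVM replies that no such method exists.
pickle.dumps(bt.spark_df)  # raises Py4JError: Method __getstate__([]) does not exist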
What could be causing this?