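I have a class in which some attributes can be serialized and some cannot. I know the types of the serializable and non-serializable objects in advance.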
import pickle

import numpy as np
import pandas as pd
import pyspark
from pyspark.ml.classification import RandomForestClassifier, RandomForestClassificationModel

# Types that pickle can handle directly
serializable_types = [int, float, str, list, dict, tuple, np.ndarray, pd.DataFrame]
# PySpark types that cannot be pickled and need their own save mechanism
pyspark_ml_types = [pyspark.sql.dataframe.DataFrame, pyspark.ml.util.MLWritable]
class BasicTypes:
    def __init__(self):
        self.name = "basic_types_2"
        self.pd_df = pd.DataFrame()
        self.np_arr = np.array([1, 2, 3])
        self._dictionary = {}
        self._list = [1, 2, 3]
        self._set = (1, 2, 3)  # a tuple, despite the attribute name
        self._str = "basicsave"
        self.model = RandomForestClassifier()
        self.fit_model = RandomForestClassificationModel()

    def save(self, filepath):
        # Pickles the whole object, including the PySpark attributes
        with open(filepath + '/' + self.name + '.pickle', 'wb') as pickle_file:
            pickle.dump(self, pickle_file)
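Spark DataFrames cannot be pickled, so they have to be stored as CSV or Parquet files instead.
So, is there a way to save only the serializable attributes of the class and ignore the non-serializable ones?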
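
One possible direction, shown here only as a minimal sketch: define `__getstate__` so that pickle sees just the attributes whose types are not on a skip list. The names `SelectivePickle`, `UNPICKLABLE_TYPES` and `Example` are hypothetical. To keep the sketch runnable without Spark, a `threading.Lock` stands in for the PySpark objects; in the class above, the skip tuple would presumably be built from `pyspark_ml_types` instead.

```python
import pickle
import threading

# Stand-in for types that cannot be pickled (an assumption for this sketch).
# In the question's setting this tuple would hold the PySpark types instead.
UNPICKLABLE_TYPES = (type(threading.Lock()),)


class SelectivePickle:
    """Hypothetical mixin that drops attributes of known unpicklable types."""

    def __getstate__(self):
        # pickle calls this to decide what to store for the instance.
        return {
            key: value
            for key, value in self.__dict__.items()
            if not isinstance(value, UNPICKLABLE_TYPES)
        }

    def __setstate__(self, state):
        # Restore the stored attributes; skipped ones are simply absent.
        self.__dict__.update(state)


class Example(SelectivePickle):
    def __init__(self):
        self.name = "basic_types_2"
        self._list = [1, 2, 3]
        self.lock = threading.Lock()  # plays the role of a Spark object


# Call __getstate__ directly to inspect what pickle would keep,
# then round-trip that state to confirm it really is picklable.
state = Example().__getstate__()
restored_state = pickle.loads(pickle.dumps(state))
print(sorted(restored_state))
```

With such a `__getstate__` in place, the existing `pickle.dump(self, pickle_file)` call in `save` would store only the kept attributes. The skipped Spark objects would still need to be written separately (for example as Parquet) and reattached after loading.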