pyspark-python3 _pickle.PicklingError: Could not serialize object: TypeError: can't pickle _thread.RLock objects

Time: 2020-01-10 17:31:50

Tags: python-3.x apache-spark pyspark

  • Python version: 3.6.5

  • Spark: 2.3.0

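I am testing a UDF whose input is of type <class 'pyspark.sql.column.Column'>. When printed, the columns show up as Column<b'...'>, with the name rendered as a bytes literal:

    from pyspark.sql.functions import col, struct, udf
    from pyspark.sql.types import StringType

    # udf_funct and df are defined elsewhere in the job
    udf_call = udf(udf_funct, StringType())
    col_columns = [col(c) for c in df.columns]
    print(col_columns)  # list of Column<b'...'> entries, which is not the case under Python 2.7
    udf_call(struct(*col_columns))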

Error

TypeError: can't pickle _thread.RLock objects

During handling of the above exception, another exception occurred:

  File "////SPARK2-2.3.0.-1../lib/spark2/python/pyspark/cloudpickle.py", line 918, in dumps
    cp.dump(obj)
  File "////SPARK2-2.3.0./lib/spark2/python/pyspark/cloudpickle.py", line 249, in dump
    raise pickle.PicklingError(msg)
_pickle.PicklingError: Could not serialize object: TypeError: can't pickle _thread.RLock objects
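For context, the inner TypeError can be reproduced without Spark. The sketch below is only an illustration, not the asker's code: `captured_state` is a hypothetical stand-in for whatever a UDF ends up referencing (a client, a session, a context object) that holds a `threading.RLock`. Spark has to pickle the UDF together with everything it references, so one such object is enough to make serialization fail.

```python
import pickle
import threading

# Hypothetical stand-in for state a UDF might capture; the lock is the problem.
captured_state = {"name": "lookup", "lock": threading.RLock()}


def try_pickle(obj):
    """Return None if obj pickles cleanly, otherwise the error message."""
    try:
        pickle.dumps(obj)
        return None
    except TypeError as exc:
        return str(exc)


error = try_pickle(captured_state)
print(error)  # mentions _thread.RLock; exact wording depends on the Python version

# The same data without the lock serializes fine.
print(try_pickle({"name": "lookup"}))
```

If the traceback appears, a reasonable first check is whether the UDF (or a module-level object it touches) refers to something that owns a lock.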

1 answer:

Answer 0: (score: 0)

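  1. My project has a package with nested sub-packages: pkg, then subpckg1, then subpkg2, which holds the .py files.
  2. From Main.py I call a UDF, which in turn calls a function in a subpkg2 (.py) file.
  3. Because of the deeper nesting, and because the UDF interacts with many other functions, the Spark job somehow could not find the subpkg2 files.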

Solution: build an egg file of the pkg and ship it with --py-files.
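A rough sketch of the packaging step, with some assumptions: it builds a .zip rather than an egg (--py-files accepts .zip, .egg and .py files), and the module name `helpers.py` and the function `shout` are made up for illustration. It recreates the nested layout in a temporary directory, bundles it, and checks locally that the nested module can be imported from the archive.

```python
import os
import sys
import tempfile
import zipfile

# Hypothetical layout mirroring the answer: pkg/subpckg1/subpkg2/helpers.py
workdir = tempfile.mkdtemp()
levels = ["pkg", os.path.join("pkg", "subpckg1"), os.path.join("pkg", "subpckg1", "subpkg2")]
for level in levels:
    os.makedirs(os.path.join(workdir, level), exist_ok=True)
    # Each level gets an __init__.py so it is a regular, importable package.
    open(os.path.join(workdir, level, "__init__.py"), "w").close()

with open(os.path.join(workdir, levels[-1], "helpers.py"), "w") as f:
    f.write("def shout(value):\n    return str(value).upper()\n")

# Bundle the package; the top-level 'pkg' directory must sit at the archive root.
archive = os.path.join(workdir, "pkg.zip")
with zipfile.ZipFile(archive, "w") as zf:
    for root, _dirs, files in os.walk(os.path.join(workdir, "pkg")):
        for name in files:
            full_path = os.path.join(root, name)
            zf.write(full_path, os.path.relpath(full_path, workdir))

# Local sanity check: import the nested module straight from the archive.
sys.path.insert(0, archive)
from pkg.subpckg1.subpkg2.helpers import shout

print(shout("ok"))  # OK
```

The archive would then be passed along the lines of `spark-submit --py-files pkg.zip Main.py`, or registered from the driver with `SparkContext.addPyFile`, so the executors can import pkg.subpckg1.subpkg2 when the UDF runs.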