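Python version: 3.6.5
Spark: 2.3.0
I am testing a udf that takes input of type <class 'pyspark.sql.column.Column'>. When printed, each column shows up as Column<b' with bytecodes ...
from pyspark.sql.functions import col, struct, udf
from pyspark.sql.types import StringType
udf_call = udf(udf_funct, StringType())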
col_columns = [col(c) for c in df.columns]
print(col_columns)
# prints a list of Column<b'...'> entries (bytes reprs), which does not happen under Python 2.7
udf_call(struct(*col_columns))
Error
TypeError: can't pickle _thread.RLock objects

During handling of the above exception, another exception occurred:

  File "////SPARK2-2.3.0.-1../lib/spark2/python/pyspark/cloudpickle.py", line 918, in dumps
    cp.dump(obj)
  File "////SPARK2-2.3.0./lib/spark2/python/pyspark/cloudpickle.py", line 249, in dump
    raise pickle.PicklingError(msg)
_pickle.PicklingError: Could not serialize object: TypeError: can't pickle _thread.RLock objects
Answer 0 (score: 0)
Solution: build an egg file of the package and ship it with --py-files.
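The question does not show udf_funct, so the exact cause is unknown. One common way to hit this error is for the function passed to udf to reference an object that holds a lock, because Spark has to serialize the function and everything it references before sending it to the executors. The sketch below reproduces that failure with the standard pickle module only (no Spark needed). Helper and can_pickle are hypothetical names used for illustration; they do not come from the question.

```python
import pickle
import threading


class Helper:
    """Hypothetical helper that holds a lock, as loggers and many clients do."""

    def __init__(self):
        self._lock = threading.RLock()

    def clean(self, value):
        # Stand-in for whatever work the real udf function does.
        with self._lock:
            return str(value).strip()


def can_pickle(obj):
    """Return True if obj can be serialized with pickle, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


print(can_pickle("plain string"))   # plain data serializes fine
print(can_pickle(Helper()))         # the instance carries an RLock
print(can_pickle(Helper().clean))   # a bound method drags its instance along
```

If the real code looks like this, shipping the package with --py-files lets the executors import it by name, so the function no longer has to carry the lock-holding object inside its serialized form.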