Question

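I created a Python file (python_file.py) containing a function, and then used that function as a UDF in a pyspark-shell session in the same scope. The code is as follows: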

from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

import python_file as outer
pyspark_func = udf(outer.my_funct, StringType())
df1 = df.select(pyspark_func(col('col1')))

Result:

AttributeError: 'UserDefinedFunction' object has no attribute '_get_object_id'

Can anyone explain my mistake? Is there another way to do this?

Answer 1

Please try the code below. When querying a DataFrame, we have to follow the format that Spark SQL expects.

Actual data

+--------+---+----+
|    date| id|name|
+--------+---+----+
| 05FEB12|101|John|
| 19APR13|102|Mike|
|19APR17s|103|Anni|
+--------+---+----+

UDF creation and registration

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def userDefinedMethod(sample):
    return sample+"is my Name"

userDefinedMethod = udf(userDefinedMethod, StringType())

Result output:

dataDf.select(dataDf["id"],dataDf["name"],userDefinedMethod(dataDf["name"]).alias("Modified name")).show()
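
The same pattern can be applied to a function imported from a separate module, as in the question. The following is only a minimal sketch under some assumptions: the body of my_funct is made up for illustration (the question does not show it), and add_modified_name and the "modified_name" alias are hypothetical names. The idea is to keep the logic in a plain Python function, wrap it with udf once, and call the resulting UDF on a Column inside select.

```python
# Hypothetical stand-in for the contents of python_file.py: a plain Python
# function with no Spark dependency. The name my_funct comes from the
# question; its body is assumed here for illustration only.
def my_funct(value):
    # Spark passes None for SQL NULL values, so guard before concatenating.
    if value is None:
        return None
    return value + " is my Name"


def add_modified_name(df, column="name"):
    """Return df's column plus a column produced by my_funct as a UDF.

    Hypothetical helper; `df` is assumed to be a PySpark DataFrame that
    has a string column named by `column`.
    """
    # PySpark imports are kept local so that my_funct above can be imported
    # and unit-tested on a machine without PySpark installed.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    # Wrap the plain function once to obtain a UDF.
    my_udf = udf(my_funct, StringType())
    # Calling the UDF on a Column yields a Column expression for select().
    return df.select(col(column), my_udf(col(column)).alias("modified_name"))
```

In a pyspark-shell session this might be used as add_modified_name(dataDf).show(). Keeping my_funct free of Spark objects also means it can be checked with ordinary Python calls before it is wrapped.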

Pyspark - AttributeError: 'UserDefinedFunction' object has no attribute '_get_object_id'
