Apache-Flink 1.11: unable to use Python UDFs in SQL Function DDL

Date: 2020-07-09 19:39:11

Tags: python apache-flink user-defined-functions pyflink flink-table-api

According to this Confluence page:

https://cwiki.apache.org/confluence/display/FLINK/FLIP-106%3A+Support+Python+UDF+in+SQL+Function+DDL

Python UDFs are available in Flink 1.11 for use in SQL function DDL.

I looked at the Flink docs here:

https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/sqlClient.html

and then tried launching sql-client.sh in the terminal with the following parameters:

$ sql-client.sh embedded --pyExecutable /Users/jonathanfigueroa/opt/anaconda3/bin/python --pyFiles /Users/jonathanfigueroa/Desktop/pyflink/inference/test1.py

When I try:

> Create Temporary System Function func1 as 'test1.func1' Language PYTHON;
[INFO] Function has been created. 

> Select func1(str) From (VALUES ("Name1", "Name2", "Name3"));
[ERROR] Could not execute SQL statement. Reason:
java.lang.IllegalStateException: Instantiating python function 'test1.func1' failed.

I tried every combination of -pyarch,--pyArchives, -pyexec,--pyExecutable and -pyfs,--pyFiles, and I always get the same result.

By the way, my Python files look like this:

.zip, .py

Is there something I'm missing?

Regards,

Jonathan

1 answer:

Answer 0 (score: 0)

The Python UDF must be wrapped with the udf decorator from pyflink.table.udf, like this:

from pyflink.table.types import DataTypes
from pyflink.table.udf import udf

@udf(input_types=[DataTypes.INT()], result_type=DataTypes.INT())
def add_one(a):
    return a + 1

You also need to load the flink-python jar when starting the SQL client, like this:

$ cd $FLINK_HOME/bin
$ ./start-cluster.sh
$ ./sql-client.sh embedded -pyfs xxx.py -j ../opt/flink-python_2.11-1.11.0.jar
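If you start the SQL client from a script, the command above can be assembled programmatically. The following is a minimal sketch: build_sql_client_command is a hypothetical helper, the default jar name is the one from the command above and must match your Flink and Scala versions, and -pyexec is the interpreter option mentioned in the question. Multiple Python files are joined with commas, which is how -pyfs accepts more than one file.

```python
import os
from typing import List, Optional, Sequence


def build_sql_client_command(
    flink_home: str,
    py_files: Sequence[str],
    python_exec: Optional[str] = None,
    flink_python_jar: str = "flink-python_2.11-1.11.0.jar",  # assumption: adjust to your version
) -> List[str]:
    """Hypothetical helper: argv for the embedded SQL client with Python UDF support."""
    cmd = [
        os.path.join(flink_home, "bin", "sql-client.sh"),
        "embedded",
        # Several Python files are passed to -pyfs as one comma-separated value.
        "-pyfs", ",".join(py_files),
        # Ship the flink-python jar from $FLINK_HOME/opt so Python UDFs can be instantiated.
        "-j", os.path.join(flink_home, "opt", flink_python_jar),
    ]
    if python_exec is not None:
        # Optional: choose the Python interpreter used to run the UDFs.
        cmd += ["-pyexec", python_exec]
    return cmd
```

For example, subprocess.run(build_sql_client_command(os.environ["FLINK_HOME"], ["/path/to/test1.py"])) would start the client. Passing a list instead of a shell string avoids quoting problems with paths that contain spaces.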

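In addition, you need to add taskmanager.memory.task.off-heap.size: 79mb to $FLINK_HOME/conf/flink-conf.yaml, or to any other file that can be used to set configuration (for example, the SQL client environment file). Otherwise, you will get this error when executing the Python UDF: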

[ERROR] Could not execute SQL statement. Reason:
org.apache.flink.table.api.TableException: The configured Task Off-Heap Memory 0 bytes is less than the least required Python worker Memory 79 mb. The Task Off-Heap Memory can be configured using the configuration key 'taskmanager.memory.task.off-heap.size'.
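To apply the configuration fix without editing flink-conf.yaml by hand, something like the following sketch could be used. parse_memory_size and ensure_off_heap are hypothetical helpers, not part of Flink. The unit suffixes mirror the ones Flink accepts for memory sizes, only simple "key: value" lines are handled (no inline comments), and 79mb is simply the figure from the error message above; it may differ if the Python worker memory options are changed.

```python
import re
from pathlib import Path

KEY = "taskmanager.memory.task.off-heap.size"

# Binary multipliers for the memory-size suffixes Flink understands (no suffix = bytes).
_UNITS = {
    "": 1, "b": 1, "bytes": 1,
    "k": 1 << 10, "kb": 1 << 10, "kibibytes": 1 << 10,
    "m": 1 << 20, "mb": 1 << 20, "mebibytes": 1 << 20,
    "g": 1 << 30, "gb": 1 << 30, "gibibytes": 1 << 30,
    "t": 1 << 40, "tb": 1 << 40, "tebibytes": 1 << 40,
}


def parse_memory_size(text: str) -> int:
    """Hypothetical helper: convert a size such as '79mb' or '0' to bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-zA-Z]*)\s*", text)
    if match is None or match.group(2).lower() not in _UNITS:
        raise ValueError(f"unrecognized memory size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2).lower()]


def ensure_off_heap(conf_path, minimum: str = "79mb") -> bool:
    """Hypothetical helper: set KEY to `minimum` unless it is already large enough.

    Returns True if the file was modified, False if it was left unchanged.
    """
    path = Path(conf_path)
    lines = path.read_text().splitlines() if path.exists() else []
    needed = parse_memory_size(minimum)
    for i, line in enumerate(lines):
        key, sep, value = line.partition(":")
        # Commented-out lines ("# taskmanager...") do not match the stripped key.
        if sep and key.strip() == KEY:
            if parse_memory_size(value) >= needed:
                return False
            lines[i] = f"{KEY}: {minimum}"  # raise a too-small existing value
            break
    else:
        lines.append(f"{KEY}: {minimum}")  # key not present yet: append it
    path.write_text("\n".join(lines) + "\n")
    return True
```

Run it before ./start-cluster.sh, because a standalone TaskManager reads flink-conf.yaml when it starts.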

Best,