Apache Flink 1.11 Streaming Sink to S3

Date: 2021-01-05 13:12:07

Tags: apache-flink minio flink-sql pyflink flink-table-api

I am using the Flink FileSystem SQL connector to read events from Kafka and write them to S3 (backed by MinIO). Here is my code:

from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.state_backend import FsStateBackend
from pyflink.table import StreamTableEnvironment, TableConfig

exec_env = StreamExecutionEnvironment.get_execution_environment()
exec_env.set_parallelism(1)
# start a checkpoint every 10 s (the interval is given in milliseconds)
exec_env.enable_checkpointing(10000)
exec_env.set_state_backend(FsStateBackend("s3://test-bucket/checkpoints/"))
t_config = TableConfig()
t_env = StreamTableEnvironment.create(exec_env, t_config)

INPUT_TABLE = "source"
INPUT_TOPIC = "Rides"
LOCAL_KAFKA = 'kafka:9092'
OUTPUT_TABLE = "sink"

ddl_source = f"""
       CREATE TABLE {INPUT_TABLE} (
           `rideId` BIGINT,
           `isStart` BOOLEAN,
           `eventTime` STRING,
           `lon` FLOAT,
           `lat` FLOAT,
           `psgCnt` INTEGER,
           `taxiId` BIGINT
       ) WITH (
           'connector' = 'kafka',
           'topic' = '{INPUT_TOPIC}',
           'properties.bootstrap.servers' = '{LOCAL_KAFKA}',
           'format' = 'json'
       )
   """

ddl_sink = f"""
       CREATE TABLE {OUTPUT_TABLE} (
           `rideId` BIGINT,
           `isStart` BOOLEAN,
           `eventTime` STRING,
           `lon` FLOAT,
           `lat` FLOAT,
           `psgCnt` INTEGER,
           `taxiId` BIGINT
       ) WITH (
           'connector' = 'filesystem',
           'path' = 's3://test-bucket/kafka_output',
           'format' = 'parquet'
       )
   """

# sql_update is deprecated in Flink 1.11; execute_sql is its replacement for DDL
t_env.execute_sql(ddl_source)
t_env.execute_sql(ddl_sink)

t_env.execute_sql(f"""
    INSERT INTO {OUTPUT_TABLE}
    SELECT * 
    FROM {INPUT_TABLE}
""")

I am using Flink 1.11.3 with flink-s3-fs-hadoop 1.11.3. I copied flink-s3-fs-hadoop-1.11.3.jar into the plugins folder:

mkdir -p /opt/flink/plugins/s3-fs-hadoop
cp /opt/flink/lib/flink-s3-fs-hadoop-1.11.3.jar /opt/flink/plugins/s3-fs-hadoop/

I also added the following configuration to flink-conf.yaml:

    state.backend: filesystem
    state.checkpoints.dir: s3://test-bucket/checkpoints/
    s3.endpoint: http://127.0.0.1:9000
    s3.path.style.access: true
    s3.access-key: minio
    s3.secret-key: minio123
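One detail worth double-checking (an assumption about the deployment, since the question does not say where Flink and MinIO run): `s3.endpoint` is resolved from inside each Flink process, so `http://127.0.0.1:9000` only works if MinIO listens on the same host (or in the same container). If Flink and MinIO run as separate containers, a fragment along these lines, using a hypothetical Docker network alias `minio` for the MinIO service, would be needed instead:

```yaml
# Hypothetical: assumes MinIO is reachable from the Flink containers
# under the network alias "minio" rather than on localhost.
s3.endpoint: http://minio:9000
s3.path.style.access: true
s3.access-key: minio
s3.secret-key: minio123
```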

MinIO is running fine, and I have created the "test-bucket" bucket in it. When I run this job, the submission never completes and the Flink Dashboard sits in some kind of waiting state. After 15-20 minutes, I get the following exception:

pyflink.util.exceptions.TableException: Failed to execute sql

What seems to be the problem here?

0 Answers:

No answers yet.