Creating an Avro-based KSQL stream from Avro Debezium data produces a strange schema

Asked: 2019-03-07 19:50:47

Tags: apache-kafka ksql debezium

I am using an Avro value converter, which generates a schema like the one below (this is only a subset, because the full schema is very large):

{
  "type": "record",
  "name": "Envelope",
  "namespace": "mssql.dbo.InvTR_T",
  "fields": [
    {
      "name": "before",
      "type": [
        "null",
        {
          "type": "record",
          "name": "Value",
          "fields": [
            {
              "name": "InvTR_ID",
              "type": "int"
            },
            {
              "name": "Type_CH",
              "type": "string"
            },
            {
              "name": "CalcType_CH",
              "type": "string"
            },
            {
              "name": "ER_CST_ID",
              "type": "int"
            },
            {
              "name": "ER_REQ_ID",
              "type": "int"
            },
            {
              "name": "Vendor_ID",
              "type": "int"
            },
            {
              "name": "VendInv_VC",
              "type": "string"
            },
            {
              "name": "Status_CH",
              "type": "string"
            },
            {
              "name": "Stage_TI",
              "type": {
                "type": "int",
                "connect.type": "int16"
              }
            },
            {
              "name": "CheckOut_ID",
              "type": [
                "null",
                "int"
              ],
              "default": null
            },
            {
              "name": "ReCalcCk_LG",
              "type": "boolean"
            },
            {
              "name": "ReCalcAll_LG",
              "type": "boolean"
            },
            {
              "name": "PatMatch_LG",
              "type": "boolean"
            },
            {
              "name": "DocPatOvRd_LG",
              "type": "boolean"
            },
            {
              "name": "Locked_LG",
              "type": [
                "null",
                "boolean"
              ],
              "default": null
            },
            {
              "name": "SegErrFlag_LG",
              "type": "boolean"
            },
            {
              "name": "Hold_LG",
              "type": "boolean"
            },
            {
              "name": "Reason_ID",
              "type": [
                "null",
                {
                  "type": "int",
                  "connect.type": "int16"
                }
              ],
              "default": null
            },
            {
              "name": "HoldCom_VC",
              "type": [
                "null",
                "string"
              ],
              "default": null
            },
            {
              "name": "AllSegFin_LG",
              "type": "boolean"
            },
            {
              "name": "InvAmt_MN",
              "type": {
                "type": "bytes",
                "scale": 4,
                "precision": 19,
                "connect.version": 1,
                "connect.parameters": {
                  "scale": "4",
                  "connect.decimal.precision": "19"
                },
                "connect.name": "org.apache.kafka.connect.data.Decimal",
                "logicalType": "decimal"
              }

When I run the following command to create a stream:

CREATE STREAM stream_invtr_t_json   WITH (KAFKA_TOPIC='InvTR_T', VALUE_FORMAT='AVRO');

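and then describe that stream, the schema comes out in a very strange format. I want to use KSQL to filter out specific information and route the events appropriately, but I cannot get the path Kafka topic => KSQL stream => Kafka topic => sink to work. If I create a new topic from that stream and try to ingest it into a sink, I get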

Expected Envelope for transformation, passing it unchanged

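followed by an error about a missing primary key. I also tried removing the unwrap transformation, just to see how the data would come out, and got errors with that as well. The DESCRIBE output for the stream (truncated) begins: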

BEFORE  | STRUCT<INVTR_ID INTEGER, TYPE_CH VARCHAR(STRING), CALCTYPE_CH VARCHAR(STRING), ER_CST_ID INTEGER, ER_REQ_ID INTEGER, VENDOR_ID INTEGER, VENDINV_VC VARCHAR(STRING), STATUS_CH VARCHAR(STRING), STAGE_TI INTEGER, CHECKOUT_ID INTEGER, RECALCCK_LG BOOLEAN, RECALCALL_LG BOOLEAN, PATMATCH_LG BOOLEAN, DOCPATOVRD_LG BOOLEAN, LOCKED_LG BOOLEAN, SEGERRFLAG_LG BOOLEAN, HOLD_LG BOOLEAN, REASON_ID INTEGER, HOLDCOM_VC VARCHAR(STRING), ALLSEGFIN_LG BOOLEAN, INVDATE_DT BIGINT, SHIPDATE_DT BIGINT, PDTERMS_CH VARCHAR(STRING), PMTDUE_DT BIGINT, PMTTERMS_VC VARCHAR(STRING), BILLTERMS_CH VARCHAR(STRING), JOINT_LG BOOLEAN, COMMENT_VC VARCHAR(STRING), SOURCE_CH VARCHAR(STRING), ADDBY_ID VARCHAR(STRING), ADDED_DT BIGINT, CHGBY_ID VARCHAR(STRING), CHGED_DT BIGINT, APPROVED_LG BOOLEAN, MULTIPO_VC VARCHAR(STRING), PRVAUDITED_INVTR_ID INTEGER, PRVVENDOR_ID INTEGER, TRANSITDAYS_SI INTEGER, SHIP_NUM_VC VARCHAR(STRING), PRVTRANSITDAYS_SI INTEGER, PRVJOINT_LG BOOLEAN, CLONEDFROM_INVTR_ID INTEGER, LASTCALC_DT BIGINT, TMSFMANUAL_LG BOOLEAN, FRTRATERSOURCE_CH VARCHAR(STRING), ACTPICKUP_DT BIGINT, ROUTVEND_SI INTEGER, CALCVRSN_TI INTEGER, VENDORRANK_SI INTEGER, SEQ_SI INTEGER, PRVAUDITED_DT BIGINT, FRTRATERBATCHTYPE_CH VARCHAR(STRING), CURRENCY_TYPE_CD VARCHAR(STRING), EXCHANGE_DT BIGINT, EXCHANGE_RATE_LOCKED_LG BOOLEAN, EXCHANGE_DT_LOCKED_LG BOOLEAN, CUSTAPPROVED_LG BOOLEAN, FRTRATERMATCH_INVTR_ID INTEGER, CRC_INVOICE_LG BOOLEAN, RG_ROUTVEND_SI INTEGER, RG_PRVVE

1 answer:

Answer 0 (score: 0)

It looks like the comment about UnwrapFromEnvelope resolved part of the problem. What remains is just the decimal part.
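On the envelope side, an alternative to unwrapping in the connector is to flatten the record in KSQL itself, so that the derived topic carries plain rows rather than a nested structure. The statement below is only a sketch: it assumes the stream also exposes an AFTER struct with the same columns as BEFORE (as in the standard Debezium envelope; the DESCRIBE output above is truncated before that point), and the stream name invtr_t_flat, the topic name InvTR_T_flat and the three selected columns are hypothetical choices for illustration.

```sql
-- Sketch only: AFTER is assumed from the standard Debezium envelope.
-- invtr_t_flat and InvTR_T_flat are hypothetical names.
CREATE STREAM invtr_t_flat
  WITH (KAFKA_TOPIC='InvTR_T_flat', VALUE_FORMAT='AVRO') AS
  SELECT AFTER->INVTR_ID  AS INVTR_ID,
         AFTER->STATUS_CH AS STATUS_CH,
         AFTER->VENDOR_ID AS VENDOR_ID
  FROM stream_invtr_t_json
  -- Skip events that carry no new row state (for example deletes).
  WHERE AFTER->INVTR_ID IS NOT NULL;
```

A sink reading the derived topic would then see flat columns, so an unwrap transformation should not be needed on that side.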

As for the decimal part, see the connector documentation: https://debezium.io/documentation/reference/1.1/connectors/postgresql.html

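As Jiri said, I can see there is a decimal.handling.mode setting. With its default value, precise, it looks like it will output Avro decimals in a format that ksqlDB can recognize, unless the source NUMERIC or DECIMAL type is declared without any scale. In that case you get a STRUCT that includes a BYTES field. The documentation says: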

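There is an exception to this rule. When the NUMERIC or DECIMAL types are used without any scale constraints, it means that the values coming from the database have a different (variable) scale for each value. In this case the type io.debezium.data.VariableScaleDecimal is used, and it contains both the value and the scale of the transferred value.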

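So, to get the data into ksqlDB, you would need to do one of the following:

  1. Wait until we support the BYTES data type (which is not currently on our roadmap).
  2. Change the schema of the source table to define the scale of the column.
  3. Change decimal.handling.mode to some other setting. You can probably use string and then CAST the value to a decimal in ksql.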
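Option 3 might look roughly like the following. This is a sketch under several assumptions: the Debezium connector has been reconfigured with decimal.handling.mode=string, the source stream has been re-created afterwards so that it picks up the new schema in which InvAmt_MN is a VARCHAR, the envelope exposes an AFTER struct, and the ksqlDB version in use supports the DECIMAL type. The precision and scale (19, 4) are taken from the InvAmt_MN field in the question's schema; the stream name invtr_amounts and topic name InvTR_T_amounts are hypothetical.

```sql
-- Assumes decimal.handling.mode=string on the connector, so that
-- AFTER->INVAMT_MN arrives as a VARCHAR rather than Avro bytes.
-- invtr_amounts and InvTR_T_amounts are hypothetical names.
CREATE STREAM invtr_amounts
  WITH (KAFKA_TOPIC='InvTR_T_amounts', VALUE_FORMAT='AVRO') AS
  SELECT AFTER->INVTR_ID AS INVTR_ID,
         -- Precision 19 and scale 4 match the source column's declared type.
         CAST(AFTER->INVAMT_MN AS DECIMAL(19, 4)) AS INVAMT_MN
  FROM stream_invtr_t_json
  WHERE AFTER->INVTR_ID IS NOT NULL;
```

Doing the cast once in a derived stream keeps the string representation out of downstream queries, at the cost of one extra topic.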