Kafka SQL (KSQL) stream not working for JSON data with nested fields

Time: 2018-04-25 18:57:20

Tags: apache-kafka ksql

I am trying to create a stream in KSQL on top of a Kafka topic. The data has already been loaded into the topic, which holds JSON records like the following.

{
  "venue": {
    "venue_name": "HATCH",
    "lon": -71.18291,
    "lat": 42.36667,
    "venue_id": 22491322
  },
  "visibility": "public",
  "response": "yes",
  "guests": 0,
  "member": {
    "member_id": 237655942,
    "member_name": "Nts"
  },
  "rsvp_id": 1724941595,
  "mtime": 1524620970613,
  "event": {
    "event_name": "Intro to Soldering",
    "event_id": "250106100",
    "time": 1526853600000,
    "event_url": "https:\/\/www.meetup.com\/Makers-of-HATCH-Makerspace\/events\/250106100\/"
  },
  "group": {
    "group_topics": [
      {
        "urlkey": "quilting",
        "topic_name": "Quilting"
      },
      {
        "urlkey": "robotics",
        "topic_name": "Robotics"
      },
      {
        "urlkey": "sewing",
        "topic_name": "Sewing"
      },
      {
        "urlkey": "edtech",
        "topic_name": "Education & Technology"
      },
      {
        "urlkey": "craftswap",
        "topic_name": "Crafts"
      },
      {
        "urlkey": "diy",
        "topic_name": "DIY (Do It Yourself)"
      },
      {
        "urlkey": "hacking",
        "topic_name": "Hacking"
      },
      {
        "urlkey": "3d-modeling",
        "topic_name": "3D Modeling"
      },
      {
        "urlkey": "tools",
        "topic_name": "Tools"
      },
      {
        "urlkey": "arduino",
        "topic_name": "Arduino"
      },
      {
        "urlkey": "makers",
        "topic_name": "Makers"
      },
      {
        "urlkey": "makerspaces",
        "topic_name": "Makerspaces"
      },
      {
        "urlkey": "3d-printing",
        "topic_name": "3D Printing"
      },
      {
        "urlkey": "laser-cutting",
        "topic_name": "Laser Cutting"
      },
      {
        "urlkey": "scrapbook-die-cutting-machines",
        "topic_name": "Scrapbook die cutting machines."
      }
    ],
    "group_city": "Watertown",
    "group_country": "us",
    "group_id": 18457932,
    "group_name": "Makers of HATCH Makerspace",
    "group_lon": -71.18,
    "group_urlname": "Makers-of-HATCH-Makerspace",
    "group_state": "MA",
    "group_lat": 42.37
  }
}

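I created the stream in KSQL with the statement below, and I see null in the group_info field (the last field in the stream). Note: KSQL does not let me create a field named "group" because it is a keyword, so I named the field group_info.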

CREATE STREAM meetup_rsvp_raw 
(  Venue varchar, 
   Visibility varchar, 
   Response varchar, 
   Guests integer, 
   Member varchar, 
   rsvp_id bigint, 
   mtime bigint, 
   event varchar, 
   group_info varchar 
) WITH (KAFKA_TOPIC='meetup-rsvp', VALUE_FORMAT='JSON');

I am not sure what I am doing wrong, but any suggestions are welcome.

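1 answer:

Answer 0 (score: 1)

You're right, 'GROUP' is a keyword in KSQL. Renaming the field in your CREATE STREAM statement won't work, because KSQL has no way of knowing that your group_info column refers to the group field.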
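
To make the reasoning concrete, here is a minimal Python sketch of name-based column mapping. It is an illustration only, not KSQL's actual implementation: the helper project_columns is hypothetical, and it assumes that columns are matched to top-level JSON fields by name, ignoring case, and that a column with no matching field comes back as null. The sample record is a trimmed version of the one in the question.

```python
import json


def project_columns(record_json, columns):
    """Map each declared column to the top-level JSON field of the same name.

    Assumption (not taken from the KSQL source): matching is by name and
    case-insensitive, and a column without a matching field yields None.
    """
    record = json.loads(record_json)
    by_upper_name = {key.upper(): value for key, value in record.items()}
    row = {}
    for column in columns:
        value = by_upper_name.get(column.upper())
        # A nested object declared as varchar is kept as JSON text.
        if isinstance(value, (dict, list)):
            value = json.dumps(value)
        row[column] = value
    return row


sample = (
    '{"response": "yes", "guests": 0, '
    '"group": {"group_city": "Watertown", "group_country": "us"}}'
)

# A renamed column finds no field called "group_info" in the record.
renamed = project_columns(sample, ["Response", "Guests", "group_info"])
# A column that keeps the original name finds the nested "group" object.
quoted = project_columns(sample, ["Response", "Guests", "GROUP"])

print(renamed["group_info"])
print(quoted["GROUP"])
```

In this sketch the renamed column has nothing to match against, which mirrors the null seen in the question, while the column that keeps the name GROUP receives the nested object as text.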

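You can import the topic by putting quotes around the column name (at the moment the identifier inside the quotes must be uppercase, although this is a bug), e.g.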

CREATE STREAM meetup_rsvp_raw 
(  venue varchar, 
   visibility varchar, 
   response varchar, 
   guests integer, 
   member varchar, 
   rsvp_id bigint, 
   mtime bigint, 
   event varchar, 
   "GROUP" varchar 
) WITH (KAFKA_TOPIC='meetup-rsvp', VALUE_FORMAT='JSON');

Note that you will also need to use quotes when selecting this field:

SELECT `GROUP` from meetup_rsvp_raw limit 5;

I have created a GitHub issue to track the lack of documentation in this area.

Let us know how you get on with this.

Thanks,

Andy