NiFi could not parse incoming data in ConvertRecord

Time: 2019-09-13 17:32:59

Tags: json apache-nifi avro

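I'm trying to convert JSON to CSV with the ConvertRecord processor, but the only error I get is "Could not parse incoming data". That message isn't very descriptive, so I have no idea how to diagnose the problem.

I know my Avro schema is valid because A) NiFi doesn't throw any schema-related errors when I add it to the schema registry, and B) I tested the schema here and it reported no problems.

I also know my JSON is valid because I can load it in Python with json.loads() without any problems.

I'm just not sure where I went wrong or how to fix it.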

JSON

{
  "DOC": {
    "DOCID": "1234",
    "Subjects": {
      "Subject_xref": ["2233"]
    },
    "TXT": {
      "COUNTRY": ["United States"],
      "ESTATE": ["Mount Vernon"],
      "PERSON": ["George Washington"]
    },
    "RAW_TXT": "George Washington lived in his family home, Mount Vernon, located in the United States.",
    "RELINFO": [
      {"ID" : "REL-1234-100",
      "RELTYPE" : "PER-PROP",
      "PERID" : "PER-1234-009",
      "PROPID" : "PROP-1234-001",
      "SENTID" : "1234-SENT-001",
      "PROP_NORM" : "Mount Vernon",
      "PROP_MENTION" : "Mount Vernon",
      "PER_NORM" : "George Washington",
      "PER_MENTION" : "George Washington"}
    ],
    "ENTINFO": [
      {"ID": "PER-1234-009", "TYPE": "PERSON", "NORM": "George Washington", "REFID": "PER-1234-009", "MENTION": "George Washington"},
      {"ID": "CTRY-1234-003", "TYPE": "COUNTRY", "NORM": "United States", "REFID": "CTRY-1234-003", "MENTION": "United States."},
      {"ID": "PROP-1234-001", "TYPE": "ESTATE", "NORM": "Mount Vernon", "REFID": "PROP-1234-001", "MENTION": "Mount Vernon"}
    ]
  }
}

1 answer:

Answer 0 (score: 1)

Your schema doesn't match your JSON. You have defined SubjectIdentificationID as ["long", "null"], but in the JSON, Subject_xref is an array.

Avro

{
  "type": "record",
  "namespace": "name.space",
  "name": "nlp_output",
  "fields": [
    {"name": "DOC", "type": {
      "name": "DOCDocument", "type": "record", "namespace": "doc.name.space", "fields": [
        {"name": "DOCID", "type": ["long","null"], "default": null},
        {"name": "Subjects", "type": {
          "name": "Subjects", "type": "record", "namespace": "subjects.name.space", "fields": [
            {"name": "SubjectIdentificationID", "aliases": ["Subject_xref"], "type": {"type": "array", "items": ["long", "null"]}, "default": null}
          ]
        }},
        {"name": "TXT", "type": {
          "name": "TXT", "type": "record", "namespace": "text.name.space", "fields": [
            {"name": "COUNTRY", "type": {"type": "array", "items": ["string", "null"]}, "default": null, "doc": ""},
            {"name": "ESTATE", "type": {"type": "array", "items": ["string", "null"]}, "default": null, "doc": ""},
            {"name": "PERSON", "type": {"type": "array", "items": ["string", "null"]}, "default": null, "doc": ""}
          ]
        }},
        {"name": "RAW_TXT", "type": ["string","null"], "default": null},
        {"name": "RELINFO", "type": {
          "name": "RelatedEntities", "type": "record", "namespace": "relent.name.space", "fields": [
            {"name": "ID", "type": ["string", "null"], "default": null},
            {"name": "RELTYPE", "type": ["string", "null"], "default": null},
            {"name": "PERID", "type": ["string", "null"], "default": null},
            {"name": "PROPID", "type": ["string", "null"], "default": null},
            {"name": "SENTID", "type": ["string", "null"], "default": null},
            {"name": "PROP_NORM", "type": ["string", "null"], "default": null},
            {"name": "PROP_MENTION", "type": ["string", "null"], "default": null},
            {"name": "PER_NORM", "type": ["string", "null"], "default": null},
            {"name": "PER_MENTION", "type": ["string", "null"], "default": null}
          ]
        }},
        {"name": "ENTINFO", "doc": "Sentences stripped of tags for ease of reading", "type": {
          "name": "Entities", "type": "record", "namespace": "entities.name.space", "fields": [
            {"name": "ID", "type": ["string", "null"], "default": null},
            {"name": "TYPE", "type": ["string", "null"], "default": null},
            {"name": "NORM", "type": ["string", "null"], "default": null},
            {"name": "REFID", "type": ["string", "null"], "default": null},
            {"name": "MENTION", "type": ["string", "null"], "default": null}
          ]
        }}
      ]
    }}
  ]
}
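One way to locate mismatches like this before the data reaches NiFi is to walk the parsed JSON alongside the Avro schema and report every path where the two shapes disagree. The sketch below is a hypothetical helper: find_mismatches is not part of NiFi or any Avro library. It supports only primitives, records, arrays, maps and unions. It treats field aliases as alternative JSON keys, which is an assumption about how the reader matches names. It is also stricter than NiFi's JSON readers, which may coerce some values (for example, numeric strings). Treat what it reports as candidates to check, not as NiFi's own verdict.

```python
import json
from typing import Any, List

# Hypothetical helper (not a NiFi or Avro API): strict per-type checks for
# Avro primitives applied to values produced by json.loads().
PRIMITIVE_CHECKS = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "long": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "float": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "double": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
}


def find_mismatches(value: Any, schema: Any, path: str = "$") -> List[str]:
    """Return readable mismatches between a JSON value and an Avro schema.

    Handles primitives, records, arrays, maps and unions only; named-type
    references, enums, fixed and logical types are reported as unsupported.
    """
    # A union is a JSON list of branch schemas; the value must fit one branch.
    if isinstance(schema, list):
        for branch in schema:
            if not find_mismatches(value, branch, path):
                return []
        return [f"{path}: {value!r} matches no branch of union {schema}"]

    # A bare string such as "long" names a primitive type.
    if isinstance(schema, str):
        check = PRIMITIVE_CHECKS.get(schema)
        if check is None:
            return [f"{path}: unsupported type {schema!r}"]
        if check(value):
            return []
        return [f"{path}: expected {schema}, got {type(value).__name__}"]

    kind = schema["type"]
    if kind == "record":
        if not isinstance(value, dict):
            return [f"{path}: expected record {schema.get('name')}, got {type(value).__name__}"]
        problems = []
        for field in schema["fields"]:
            # Assumption: an alias may appear in the JSON instead of the field name.
            keys = [field["name"], *field.get("aliases", [])]
            present = next((k for k in keys if k in value), None)
            if present is not None:
                problems += find_mismatches(value[present], field["type"], f"{path}.{present}")
            elif "default" not in field:
                problems.append(f"{path}: missing field {field['name']!r}")
        return problems
    if kind == "array":
        if not isinstance(value, list):
            return [f"{path}: expected array, got {type(value).__name__}"]
        problems = []
        for i, item in enumerate(value):
            problems += find_mismatches(item, schema["items"], f"{path}[{i}]")
        return problems
    if kind == "map":
        if not isinstance(value, dict):
            return [f"{path}: expected map, got {type(value).__name__}"]
        problems = []
        for key, item in value.items():
            problems += find_mismatches(item, schema["values"], f"{path}.{key}")
        return problems
    # Forms like {"type": "long"} wrap a primitive; anything else is unsupported.
    return find_mismatches(value, kind, path)


if __name__ == "__main__":
    # Trimmed-down version of the question's data and schema.
    doc = json.loads('{"DOC": {"DOCID": "1234", "Subjects": {"Subject_xref": ["2233"]}}}')
    schema = {"type": "record", "name": "nlp_output", "fields": [
        {"name": "DOC", "type": {"type": "record", "name": "DOCDocument", "fields": [
            {"name": "DOCID", "type": ["long", "null"], "default": None},
            {"name": "Subjects", "type": {"type": "record", "name": "Subjects", "fields": [
                {"name": "SubjectIdentificationID", "aliases": ["Subject_xref"],
                 "type": {"type": "array", "items": ["long", "null"]}, "default": None}]}}]}}]}
    for problem in find_mismatches(doc, schema):
        print(problem)
```

Run against the full JSON and schema above, a strict check like this would also flag three more spots. RELINFO and ENTINFO are JSON arrays, but the schema declares each of them as a single record. DOCID and the Subject_xref items are strings, but the schema declares them as long.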