Jolt: reference the first element of an array as the target name

Time: 2017-09-30 10:52:36

Tags: json jolt apache-nifi

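I have been looking at this for a few weeks (in the background) and am having a hard time working out how to use the NiFi JoltTransformJson processor to convert near-CSV JSON data into a tagged set. By that I mean using the data in the first row of the input array as the JSON object names in the output.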

As an example, I have this input data:

[
  [
    "Company",
    "Retail Cost",
    "Percentage"
  ],
  [
    "ABC",
    "5,368.11",
    "17.09%"
  ],
  [
    "DEF",
    "101.47",
    "0.32%"
  ],
  [
    "GHI",
    "83.79",
    "0.27%"
  ]
]

The output I want is:

[
  {
    "Company": "ABC",
    "Retail Cost": "5,368.11",
    "Percentage": "17.09%"
  },
  {
    "Company": "DEF",
    "Retail Cost": "101.47",
    "Percentage": "0.32%"
  },
  {
    "Company": "GHI",
    "Retail Cost": "83.79",
    "Percentage": "0.27%"
  }
]

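I think this comes down to two problems: accessing the contents of the first array, and then making sure the output data does not include the first array.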
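To pin down the intended mapping independently of Jolt, here is a minimal sketch in plain Python. It is only an illustration of the desired result, not a Jolt spec; the function name `rows_to_records` is hypothetical and the sample data is the input shown above.

```python
import json

# Same input as above: the first row holds the field names.
rows = [
    ["Company", "Retail Cost", "Percentage"],
    ["ABC", "5,368.11", "17.09%"],
    ["DEF", "101.47", "0.32%"],
    ["GHI", "83.79", "0.27%"],
]


def rows_to_records(rows):
    """Use the first row as keys for every following row (hypothetical helper)."""
    # Problem 1: access the contents of the first array.
    # Problem 2: keep the first array out of the output.
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


records = rows_to_records(rows)
print(json.dumps(records, indent=2))
```

The two problems map onto the two halves of the unpacking: `header` is looked up for every value, and only `data` is emitted.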

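I was hoping to post a Jolt spec showing that I had gotten reasonably close on my own, but the closest I have come gives me the right output shape without the right content. It looks like this: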

[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "*": "[&1].&0"
      }
    }
  }
]

But it produces output like this:

[ {
  "0" : "Company",
  "1" : "Retail Cost",
  "2" : "Percentage"
}, {
  "0" : "ABC",
  "1" : "5,368.11",
  "2" : "17.09%"
}, {
  "0" : "DEF",
  "1" : "101.47",
  "2" : "0.32%"
}, {
  "0" : "GHI",
  "1" : "83.79",
  "2" : "0.27%"
} ]

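Clearly the object names are wrong (they are the array indices rather than the header values), and the output has one element too many because the header row is included.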

1 answer:

Answer 0: (score: 2)

It can be done, but wow, it is hard to read and looks like a terrible regex.

Spec

[
  {
    // this does most of the work, but produces an output
    //  array with a null in the zeroth position.
    "operation": "shift",
    "spec": {
      // match the first item in the outer array and do 
      //  nothing with it, because it is just "header" data
      //   e.g. "Company", "Retail Cost", "Percentage".
      // we need to reference it, but not pass it through
      "0": null,
      // 
      // loop over all the rest of the items in the outer array
      "*": {
        // this is rather confusing
        // "*" means match the array indices of the inner array
        // and we will write the value at that index ("ABC" etc.)
        // to "[&1].@(2,[0].[&])"
        // "[&1]" means make the output be an array, and at index
        //   &1, which is the index of the outer array we are
        //   currently in.
        // Then "look up the key" (Company, Retail Cost) using
        //  @(2,[0].[&])
        // which means: go back up the tree to the root, then
        //  come back down into the first item of the outer array
        //  and index it by the array index of the current
        //  inner array that we are at.
        "*": "[&1].@(2,[0].[&])"
      }
    }
  },
  {
    // We know the first item in the array will be null / junk,
    //  because the first item in the input array was "header" info.
    // So we match the first item and drop it, and then accumulate
    //  everything else into a new array
    "operation": "shift",
    "spec": {
      "0": null,
      "*": "[]"
    }
  }
]
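
For readers who find the spec hard to follow, the two steps can be mimicked in plain Python. This is a rough sketch of what the comments above describe, not how Jolt works internally; the function names `shift_pass_one` and `shift_pass_two` are hypothetical.

```python
def shift_pass_one(rows):
    """Mimic the first shift: key each value by the header at the same index.

    Index 0 of the result stays empty (None), mirroring the null that the
    spec comments say is left in the zeroth position.
    """
    header = rows[0]
    out = [None] * len(rows)
    for i, row in enumerate(rows):
        if i == 0:
            continue  # "0": null -> the header row is referenced, not copied
        # "[&1]" -> same outer index i; "@(2,[0].[&])" -> header[j]
        out[i] = {header[j]: value for j, value in enumerate(row)}
    return out


def shift_pass_two(items):
    """Mimic the second shift: drop index 0 and collect the rest in a new list."""
    return [item for i, item in enumerate(items) if i != 0]


rows = [
    ["Company", "Retail Cost", "Percentage"],
    ["ABC", "5,368.11", "17.09%"],
    ["DEF", "101.47", "0.32%"],
]

intermediate = shift_pass_one(rows)
result = shift_pass_two(intermediate)
print(result)
```

Splitting the work this way matches the spec: the first pass keeps the original outer indices so the header lookup stays simple, and the second pass only has to remove the leftover placeholder.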