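OpenRefine: using Templating to export JSON as records

Time: 2015-07-09 20:52:56

Tags: etl openrefine

I have been working with OpenRefine for the last few days, trying to figure out how to export a Google spreadsheet as a JSON file.

I have the following data that I want to export to a JSON file.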

id  first name  last name  friends first name  friends last name  family first name  family last name
1   James       Brown      Judy                Garland            Mary               Brown
                           John                Neverland          Marlene            Brown
                           Paul                Garland            Judy               Brown
2   John        Buller     Amy                 Garland            Francis            Buller
                           Peter               Flake              John               Buller
                           Jules               Peter              Judy               Buller

The JSON I expect is:

{
  "results": [
    {
      "id": 1,
      "firstName": "James",
      "lastName": "Brown",
      "has": {
        "friends": [
          {
            "firstName": "Judy",
            "lastName": "Garland"
          },
          {
            "firstName": "John",
            "lastName": "Neverland"
          },
          {
            "firstName": "Paul",
            "lastName": "Garland"
          }
        ],
        "family": [
          {
            "firstName": "Mary",
            "lastName": "Brown"
          },
          {
            "firstName": "Marlene",
            "lastName": "Brown"
          },
          {
            "firstName": "Judy",
            "lastName": "Brown"
          }
        ]
      }
    },
    {
      "id": 2,
      "firstName": "John",
      "lastName": "Buller",
      "has": {
        "friends": [
          {
            "firstName": "Amy",
            "lastName": "Garland"
          },
          {
            "firstName": "Peter",
            "lastName": "Flake"
          },
          {
            "firstName": "Jules",
            "lastName": "Peter"
          }
        ],
        "family": [
          {
            "firstName": "Francis",
            "lastName": "Buller"
          },
          {
            "firstName": "John",
            "lastName": "Buller"
          },
          {
            "firstName": "Judy",
            "lastName": "Buller"
          }
        ]
      }
    }
  ]
}

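So far I have tried a couple of approaches:

1) Using excel-to-json, but it is limited to a single level of nesting and places some restrictions on column names.

2) Using OpenRefine and its templating exporter, but I ran into several problems:
 - Although the data is detected as records in OpenRefine, the export works on rows rather than records, so it writes 6 rows to the JSON, 4 of which contain empty data.
 - If I fill down the columns, it still writes 6 rows to the JSON, 4 of which are duplicates, and the relationship between a person and their family and friends is lost.

I would greatly appreciate any help, as I am trying to export about 150,000 records of this type, and they must be in this JSON format.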

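1 answer:

Answer 0: (score: 2)

OpenRefine only supports one level of nesting. You will probably need to use a programming language or an ETL solution to produce the nested elements.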
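As an illustration of the "use a programming language" route, here is a minimal Python sketch. It assumes the sheet has been exported as tab-separated text with the column headers shown in the question, and that a non-empty id cell marks the start of a new record (the same convention OpenRefine uses for records). The names SAMPLE_TSV, rows_to_records, and convert are hypothetical and exist only for this example.

```python
import csv
import io
import json

# Sample rows from the question, written as tab-separated text.
# Assumption: the real sheet is exported in this shape (TSV, same headers).
SAMPLE_TSV = (
    "id\tfirst name\tlast name\tfriends first name\tfriends last name"
    "\tfamily first name\tfamily last name\n"
    "1\tJames\tBrown\tJudy\tGarland\tMary\tBrown\n"
    "\t\t\tJohn\tNeverland\tMarlene\tBrown\n"
    "\t\t\tPaul\tGarland\tJudy\tBrown\n"
    "2\tJohn\tBuller\tAmy\tGarland\tFrancis\tBuller\n"
    "\t\t\tPeter\tFlake\tJohn\tBuller\n"
    "\t\t\tJules\tPeter\tJudy\tBuller\n"
)


def rows_to_records(rows):
    """Group rows into records: a non-empty id starts a new record,
    and rows with an empty id add friends/family to the current one."""
    results = []
    current = None
    for row in rows:
        row_id = (row.get("id") or "").strip()
        if row_id:
            current = {
                "id": int(row_id),
                "firstName": row["first name"],
                "lastName": row["last name"],
                "has": {"friends": [], "family": []},
            }
            results.append(current)
        if current is None:
            continue  # continuation row before any record has started
        for group in ("friends", "family"):
            first = (row.get(f"{group} first name") or "").strip()
            last = (row.get(f"{group} last name") or "").strip()
            if first or last:
                current["has"][group].append(
                    {"firstName": first, "lastName": last}
                )
    return {"results": results}


def convert(tsv_text):
    """Parse TSV text and return the nested structure as a dict."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return rows_to_records(reader)


if __name__ == "__main__":
    print(json.dumps(convert(SAMPLE_TSV), indent=2))
```

Because the rows are processed one at a time, the same loop should cope with the roughly 150,000 records mentioned in the question if the reader is pointed at a file instead of the sample string, although the result list is still held in memory before it is written out.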