DataStax Graph Loader - Loading non-uniform JSON files' meta-properties

Time: 2017-10-13 22:06:02

Tags: json groovy datastax datastax-startup datastax-enterprise-graph

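Below are three sample JSON files and the graph loader script. The first file is the most complex, and most of its content should be ignored by the load script. The second file is a simple variant that occurs frequently. The last file gives a sense of how widely the files can differ from one another and is the most direct example of the current problem.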

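Before digging in, please note that this is only an approximation of the data structure I am actually working with, and of its load script. There are better ways to model the people vertices, but this is the first example I could think of.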

Sample input JSON file 1

/*{
  "peopleInfo": [
    {
      "id": {
        "idProperty1": "property1Value",
        "idProperty2": "someUUID"
      }
    },
    {
      "people": [
        {
          "firstName": "person1FirstName",
          "lastName": "person1LastName",
          "sequence": 1
        },
        {
          "firstName": "person2FirstName",
          "lastName": "person2LastName",
          "sequence": 2
        },
        { //children and twins may be switched such that twins are sequence 3 & 4 and one or both of them have children with corresponding sequences
          "children": [
            {
              "firstName": "firstChildFirstName",
              "lastName": "firstChildLastName",
              "sequence": 3
            },
            {
              "firstName": "secondChildFirstName",
              "lastName": "secondChildLastName",
              "sequence": 4
            },
            {
              "twins": [
                {
                  "firstName": "firstTwinFirstName",
                  "lastName": "firstTwinLastName",
                  "sequence": 5
                },
                {
                  "firstName": "secondTwinFirstName",
                  "lastName": "secondTwinLastName",
                  "sequence": 6
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}*/

The second file contains no children.

Sample input JSON file 2

/*{
  "peopleInfo": [
    {
      "id": {
        "idProperty1": "property1Value",
        "idProperty2": "someUUID"
      }
    },
    {
      "people": [
        {
          "firstName": "person1FirstName",
          "lastName": "person1LastName",
          "sequence": 1
        },
        {
          "firstName": "person2FirstName",
          "lastName": "person2LastName",
          "sequence": 2
        }
      ]
    }
  ]
}*/

The third file contains twins, but no single children.

Sample input JSON file 3

    /*{
      "peopleInfo": [
        {
          "personsID": {
            "idProperty1": "property1Value",
            "idProperty2": "someUUID"
          }
        },
        {
          "people": [
            { // twins can exist without top level people(parents work well to define this) and without other children. Also, children can exist without twins and without parents as well.
              "twins": [
                {
                  "firstName": "firstTwinFirstName",
                  "lastName": "firstTwinLastName",
                  "sequence": 3
                },
                {
                  "firstName": "secondTwinFirstName",
                  "lastName": "secondTwinLastName",
                  "sequence": 4
                }
              ]
            }
          ]
        }
      ]
    }*/

Load script

inputBaseDir = "/path/to/directories"

import java.io.File as javaFile;
def list = []

new javaFile(inputBaseDir).eachDir() { dir ->
  list << dir.getAbsolutePath()
}
for (item in list){
  def fileBuilder = File.directory(item)
  def peopleInfoMapper = fileBuilder.map {
    it['idProperty1'] = it.peopleInfo.id.idProperty1[0]
    it['idProperty2'] = it.peopleInfo.id.idProperty2[0]

    def ppl = it.peopleInfo.people[1]
    people = ppl.collect{
      if ( it['firstName'] != null){
        it['firstName'] = it['firstName']
      } else if ( it['lastName'] != null){
        it['lastName'] = it['lastName']
      } else if ( it['sequence'] != null) {
        it['sequence'] = it['sequence']
      }

      //filling the null values below is the temporary non-solution to get the data to load
      if ( it['firstName'] == null){
        it['firstName'] = ''
      }
      if ( it['lastName'] == null){
        it['lastName'] = ''
      }
      if ( it['sequence'] == null){
        it['sequence'] = 0
      }
      it
    }
    it['people'] = people
    it.remove('peopleInfo')
    it
    }
  load(peopleInfoMapper).asVertices {
    label "peopleInfo"
    key 'idProperty2'
    vertexProperty 'people',{
      value 'firstName'
      value 'lastName'
      value 'sequence'
      ignore 'children'
      ignore 'twins'
    }
  }
} // closes the for loop over the input directories

Problems

1

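Looking at the third file: although 'twins' holds permissible values, they should not affect the load, because ignoring the 'twins' key should ignore all of its meta-property values. I believe the exception below is thrown because the file has no top-level people who are neither children nor twins, and 'twins' is ignored, so all that is left for the vertexProperty 'people' key is an empty map. My non-answer simply fills the empty map with an empty string for each name and a zero for the sequence, and those placeholder values get loaded into the database along with the actual data.

java.lang.IllegalArgumentException: [On field 'people'] Provided map does not contain property value on field [sequence]: {twins=[{firstName=firstTwinFirstName, lastName=firstTwinLastName, sequence=1}, {firstName=secondTwinFirstName, lastName=secondTwinLastName, sequence=2}]}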
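To see which entry the loader is objecting to, the record can be inspected outside the loader. The following is a minimal sketch in plain Groovy, assuming `groovy.json.JsonSlurper` as a stand-in for the loader's own JSON parsing (the loader's internal representation may differ). It walks a trimmed copy of sample file 3 with the same navigation the load script uses and lists the keys of each entry in the people list.

```groovy
import groovy.json.JsonSlurper

// Trimmed copy of sample file 3, with the comments removed so it is valid JSON.
def json = '''
{
  "peopleInfo": [
    { "personsID": { "idProperty1": "property1Value", "idProperty2": "someUUID" } },
    { "people": [
        { "twins": [
            { "firstName": "firstTwinFirstName", "lastName": "firstTwinLastName", "sequence": 3 },
            { "firstName": "secondTwinFirstName", "lastName": "secondTwinLastName", "sequence": 4 }
        ] }
    ] }
  ]
}
'''

def record = new JsonSlurper().parseText(json)

// peopleInfo is a list of two maps, so .people yields one value per map;
// index 1 is the people list, as in the load script.
def ppl = record.peopleInfo.people[1]

// Keys carried by each entry of the people list.
def keysPerEntry = ppl.collect { it.keySet() as List }
println keysPerEntry

// Entries with no 'sequence' value of their own.
def missingSequence = ppl.findAll { it['sequence'] == null }
println missingSequence.size()
```

The only entry carries the single key 'twins', so none of the three values named in the vertexProperty block exists at that level.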

2

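Looking at the first file: when the 'twins' key is ignored or removed outright, an empty map still remains as a placeholder. It is then filled in by the same non-solution in the load script and loaded into the database along with the actual data.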
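The placeholder can be reproduced without the loader. This is a small sketch in plain Groovy, assuming the entries behave like ordinary maps; the list below is a shortened, hand-written version of the people list from sample file 1.

```groovy
// Entries shaped like the people list of sample file 1 (nested values shortened).
def ppl = [
  [firstName: 'person1FirstName', lastName: 'person1LastName', sequence: 1],
  [firstName: 'person2FirstName', lastName: 'person2LastName', sequence: 2],
  [children: [[firstName: 'firstChildFirstName', lastName: 'firstChildLastName', sequence: 3]]]
]

// Removing the nested keys empties the third entry but does not remove it from the list.
ppl.each { entry ->
  entry.remove('children')
  entry.remove('twins')
}

println ppl.size()
println ppl.last()

// An empty map evaluates to false under Groovy truth, which makes it easy to detect.
def lastIsEmpty = !ppl.last()
println lastIsEmpty
```

The list still has three entries after the removal; the third is the empty map that ends up padded with placeholder values.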

Is there a best practice for handling these issues?

1 Answer:

Answer 0: (score: 0)

I don't know whether this is the best way to do it, but it seems to solve the problem.

inputBaseDir = "/path/to/directories"

import java.io.File as javaFile;
def list = []

new javaFile(inputBaseDir).eachDir() { dir ->
  list << dir.getAbsolutePath()
}
for (item in list){
  def fileBuilder = File.directory(item)
  def peopleInfoMapper = fileBuilder.map {
    it['idProperty1'] = it.peopleInfo.id.idProperty1[0]
    it['idProperty2'] = it.peopleInfo.id.idProperty2[0]

    def ppl = it.peopleInfo.people[1]
    people = ppl.collect {
      // removes k:v, leaving an empty map if that was the entry's only key
      if (it['children'] != null) {
        it.remove('children')
      }
      // removes k:v, leaving an empty map if that was the entry's only key
      if (it['twins'] != null) {
        it.remove('twins')
      }
      // return the entry itself; otherwise collect would yield the value of the last statement
      it
    }
    if (ppl['firstName'][0] != null && ppl['lastName'][0] != null) {
      it['people'] = people.findAll() // only gathers non-empty maps from people
    } else {
      /* removing people without the desired meta-properties lets the
         loader proceed when empty maps left over from the removal of
         children and/or twins are present, while top-level
         persons aren't */
      it.remove('people')
    }
    it.remove('peopleInfo')
    it
  }
  load(peopleInfoMapper).asVertices {
    label "peopleInfo"
    key 'idProperty2'
    vertexProperty 'people',{
      value 'firstName'
      value 'lastName'
      value 'sequence'
      ignore 'children'
      ignore 'twins'
    }
  }
} // closes the for loop over the input directories
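One possible variation, sketched in plain Groovy: the script above decides whether to keep 'people' by looking only at the first entry, which would drop real people if a 'twins' or 'children' entry happened to come first. The helper below, `cleanPeople`, is a hypothetical name and not part of the loader; it keeps only the wanted keys of every entry, drops the entries that end up empty, and returns null when nothing usable is left. I have only run it on hand-written lists, not inside a mapping script.

```groovy
// Hypothetical helper, not part of the loader API.
def cleanPeople(List ppl) {
  def wanted = ['firstName', 'lastName', 'sequence']
  // Copy each entry, keeping only the wanted keys; unwanted-only entries become empty maps.
  def filtered = (ppl ?: []).collect { entry -> entry.findAll { key, value -> key in wanted } }
  // findAll() without a closure keeps only entries that pass Groovy truth, so empty maps go.
  def kept = filtered.findAll()
  return kept ?: null
}

// Shaped like sample file 3: twins only.
def twinsOnly = [
  [twins: [[firstName: 'firstTwinFirstName', lastName: 'firstTwinLastName', sequence: 3]]]
]

// A made-up ordering in which the twins entry precedes a top-level person.
def twinsFirst = [
  [twins: [[firstName: 'firstTwinFirstName', lastName: 'firstTwinLastName', sequence: 3]]],
  [firstName: 'person1FirstName', lastName: 'person1LastName', sequence: 1]
]

println cleanPeople(twinsOnly)
println cleanPeople(twinsFirst)
```

In the mapper, the result could replace the first-entry check: assign it to it['people'] when it is not null, and call it.remove('people') otherwise. Entries that carry only some of the three values would still reach the loader unchanged, so they would need separate handling.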