Importing JSON into OrientDB as documents using ETL

Date: 2016-03-23 22:30:30

Tags: json import etl orientdb

How can I import some JSON files into OrientDB so that I can use them as documents (rather than as a graph)?

My data looks like this:

    {
        "p_partkey": 1,
        "p_name": "lace spring",
        "lineorder": [{
            "customer": [{
                "c_name": "Customer#000014704"
            }],
            "lo_quantity": 49,
            "lo_orderpriority": "1-URGENT",
            "lo_discount": 3,
            "lo_shipmode": "RAIL|",
            "lo_tax": 0
        }, {
            "customer": [{
                "c_name": "Customer#000026548"
            }],
            "lo_quantity": 15,
            "lo_orderpriority": "3-MEDIUM",
            "lo_discount": 10,
            "lo_shipmode": "SHIP|",
            "lo_tax": 0
        }]
    }

I created a configfile.json for the import, shown below, but it doesn't work:

    {
      "config": {
        "log": "debug"
      },
      "source" : {
        "file": { "path": "/home/raphael/Documents/data/part/part1.json", "lock" : true }
      },
      "extractor" : {
        "json": {}
      },
      "transformers" : [
        { "merge": { "joinFieldName":"p_partkey"} },
        { "vertex": { "class": "part"} }
      ],
      "loader" : {
        "orientdb": {
          "dbURL": "plocal:/opt/orientdb/databases/part",
          "dbUser": "root",
          "dbPassword": "rasns1901",
          "dbAutoCreate": true,
          "tx": false,
          "batchCommit": 1000,
          "dbType": "document",
          "classes": [
            {"name": "part", "extends": "V"}
          ],
          "indexes": [
            {"class":"part", "fields":["p_partkey:integer"], "type":"UNIQUE_HASH_INDEX" }
          ]
        }
      }
    }

Is there something wrong with my config file? The OrientDB documentation has no example of this case.
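The thread never pins down what was wrong with this config. One plausible culprit, and it is only an assumption here, not something the thread confirms, is that the loader asks for `"dbType": "document"` while the config still uses graph features: the `vertex` transformer and a `part` class that extends `V`. The sketch below is a hypothetical helper (`graph_settings_in_document_db` is not part of OrientDB or its ETL tool) that scans a parsed config for exactly that mismatch; it checks only those two settings and validates nothing else.

```python
import json

# Assumption: these transformers and base classes only make sense in a graph database.
GRAPH_TRANSFORMERS = {"vertex", "edge"}
GRAPH_BASE_CLASSES = {"V", "E"}


def graph_settings_in_document_db(config):
    """Hypothetical helper: list graph-only settings in an ETL config whose
    orientdb loader targets a document database. Returns [] otherwise."""
    loader = config.get("loader", {}).get("orientdb", {})
    if loader.get("dbType") != "document":
        return []
    problems = []
    # Each transformer step is a one-key object such as {"vertex": {...}}.
    for step in config.get("transformers", []):
        for name in step:
            if name in GRAPH_TRANSFORMERS:
                problems.append(f"transformer '{name}'")
    # Classes declared by the loader that inherit from the graph base classes.
    for cls in loader.get("classes", []):
        if cls.get("extends") in GRAPH_BASE_CLASSES:
            problems.append(f"class '{cls['name']}' extends '{cls['extends']}'")
    return problems


# The relevant parts of the question's config.
config = json.loads("""
{
  "transformers": [
    {"merge": {"joinFieldName": "p_partkey"}},
    {"vertex": {"class": "part"}}
  ],
  "loader": {"orientdb": {
    "dbType": "document",
    "classes": [{"name": "part", "extends": "V"}]
  }}
}
""")

for problem in graph_settings_in_document_db(config):
    print(problem)
```

Run against the question's settings, it prints `transformer 'vertex'` and `class 'part' extends 'V'`. If the assumption holds, dropping those two settings (or switching `dbType` to `graph`) would be the first thing to try; this was not verified against an OrientDB instance.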

1 answer:

Answer 0 (score: 0)

I gave up on ETL and did it in Python instead; it was easier.

Here is my code:

    import pyorient

    def Inicio():
        db_name = "db"
        client = pyorient.OrientDB("127.0.0.1", 2424)  # binary protocol port
        client.connect("admin", "admin")               # server-level connection
        client.db_open(db_name, "admin", "admin")      # open the target database
        # part1.json and part2.json each hold one JSON document on a single line
        for i in range(1, 3):
            # path as in the original (relative to the working directory)
            with open('home/Desktop/part' + str(i) + '.json', 'r') as f:
                texto = f.readline()
            # OrientDB SQL: INSERT INTO <class> CONTENT <json document>
            client.command('INSERT INTO part CONTENT ' + texto)
            print("Inserted:" + str(i))
        client.db_close()

    Inicio()

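The one thing to watch out for: my JSON files contain no line breaks (each document sits on a single line), which is why readline() picks up the whole document.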
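A minimal sketch for files that are pretty-printed across several lines, assuming each file still holds exactly one JSON object: parse the whole file with Python's standard `json` module and re-serialize it onto one line before building the statement. `build_insert` is a hypothetical helper, not part of pyorient; its return value would be passed to `client.command()` the same way the loop in the answer does.

```python
import json
import os
import tempfile


def build_insert(class_name, path):
    """Hypothetical helper: read one JSON document from `path` and return an
    OrientDB `INSERT INTO <class> CONTENT <json>` statement for it.

    json.load() parses the whole file, so multi-line files work too;
    json.dumps() writes the document back out on a single line.
    """
    with open(path, "r", encoding="utf-8") as f:
        document = json.load(f)
    return "INSERT INTO " + class_name + " CONTENT " + json.dumps(document)


# Demo with a multi-line file, the case where readline() would stop at "{".
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    tmp.write('{\n  "p_partkey": 1,\n  "p_name": "lace spring"\n}\n')
    path = tmp.name

print(build_insert("part", path))
os.remove(path)
```

For the demo file this prints `INSERT INTO part CONTENT {"p_partkey": 1, "p_name": "lace spring"}`. Parsing first also means a malformed file fails early with a `json.JSONDecodeError` rather than as an OrientDB SQL error.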