OrientDB - CSV import - performance of importing edges from CSV

Date: 2016-07-19 23:47:22

Tags: csv orientdb

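I want to import two CSV files into an OrientDB database. The first contains the vertices, with 1 million records. The second contains the edges, with 59 million records.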

I run the import with two JSON configuration files:

Vertices

{
  "source": { "file": { "path": "../csvs/metodo01/pesquisador.csv" } },
  "extractor": { "row": {} },
  "transformers": [
    { "csv": {} },
    { "vertex": { "class": "Pesquisador" } }
  ],
  "loader": {
    "orientdb": {
       "dbURL": "remote:localhost/dbCemMilM01", 
       "dbType": "graph",
       "batchCommit": 1000,
       "classes": [
         {"name": "Pesquisador", "extends": "V"}
       ], "indexes": [
         {"class":"Pesquisador", "fields":["psq_id:integer"], "type":"UNIQUE" }
       ]
    }
  }
}

Edges

{
    "config": {
        "log": "info",
        "parallel": false
    },
    "source": {
        "file": {
            "path": "../csvs/metodo01/a10.csv"
        }
    },
    "extractor": {
        "row": {
        }
    },
    "transformers": [{
        "csv": {
            "separator": ",",
            "columnsOnFirstLine": true,
            "columns": ["psq_id_from:integer",
            "pub_id_to:integer",
            "ordem:integer"]
        }
    },
    {
        "command": {
            "command": "create edge PUBLICOU from (select from Pesquisador where psq_id = ${input.psq_id_from}) to   (select from Publicacao  where pub_id = ${input.pub_id_to}) set  ordem = ${input.ordem} ",
            "output": "edge"
        }
    }],
    "loader": {
        "orientdb": {
            "dbURL": "remote:localhost/dbUmMilhaoM01", 
            "dbType": "graph",
            "standardElementConstraints": false,
            "batchCommit": 1000,
            "classes": [{
                "name": "PUBLICOU",
                "extends": "E"
            }]
        }
    }
}

During the import, OrientDB suggests using an index to speed up the process.

How can I do that?

The command is: create edge PUBLICOU from (select from Pesquisador where psq_id = ${input.psq_id_from}) to (select from Publicacao where pub_id = ${input.pub_id_to}) set ordem = ${input.ordem}

3 answers:

Answer 0: (score: 0)

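To speed up edge creation, you probably need an index on Publicacao.pub_id as well, in addition to the one you already have on the Pesquisador.psq_id property. Both subqueries in the create edge command look up a record by these fields, so without an index each lookup has to scan the class.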
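As a rough sketch, the index could be declared in the loader section of the edge configuration, using the same "indexes" syntax as the vertex configuration in the question. This assumes that the Publicacao class already exists in the database and that pub_id is a unique integer; if the values are not unique, "NOTUNIQUE" would be the type to try instead.

```json
{
  "loader": {
    "orientdb": {
      "dbURL": "remote:localhost/dbUmMilhaoM01",
      "dbType": "graph",
      "standardElementConstraints": false,
      "batchCommit": 1000,
      "classes": [
        { "name": "PUBLICOU", "extends": "E" }
      ],
      "indexes": [
        { "class": "Publicacao", "fields": ["pub_id:integer"], "type": "UNIQUE" }
      ]
    }
  }
}
```

Only the "indexes" entry is new here; the rest of the loader block is copied from the question.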

Ivan

Answer 1: (score: 0)

You can declare indexes directly in the ETL configuration. Here is an example taken from the DBPedia importer:

"orientdb": {
  "dbURL": "plocal:/temp/databases/dbpedia",
  "dbUser": "importer",
  "dbPassword": "IMP",
  "dbAutoCreate": true,
  "tx": false,
  "batchCommit": 1000,
  "wal" : false,
  "dbType": "graph",
  "classes": [
    {"name":"Person", "extends": "V" },
    {"name":"Customer", "extends": "Person", "clusters":8 }
  ],
  "indexes": [
    {"class":"V", "fields":["URI:string"], "type":"UNIQUE" },
    {"class":"Person", "fields":["town:string"], "type":"NOTUNIQUE" ,
        "metadata": { "ignoreNullValues": false }
    }
  ]
}

For details, see: http://orientdb.com/docs/2.2/Loader.html

Answer 2: (score: 0)

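To speed up the loading process, my suggestion is to work in plocal mode and then move the created database to a standalone OrientDB server. In plocal mode the ETL process writes to the database files directly instead of going through the network protocol used by remote connections.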
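A minimal sketch of what that change might look like in the edge configuration from the question: only "dbURL" is switched from remote: to plocal:. The directory /path/to/databases is a hypothetical placeholder and would need to point at wherever the database files actually live. As far as I know, plocal opens the database files directly, so the server should not have the same database open while the import runs.

```json
{
  "loader": {
    "orientdb": {
      "dbURL": "plocal:/path/to/databases/dbUmMilhaoM01",
      "dbType": "graph",
      "standardElementConstraints": false,
      "batchCommit": 1000,
      "classes": [
        { "name": "PUBLICOU", "extends": "E" }
      ]
    }
  }
}
```

After the import finishes, the database directory can be moved under the server's databases folder, as the answer suggests.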