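I want to import two CSV files into an OrientDB database. The first contains the vertices, with 1 million records. The second contains the edges, with 59 million records.
I am using two JSON configuration files for the import:
Vertices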
{
"source": { "file": { "path": "../csvs/metodo01/pesquisador.csv" } },
"extractor": { "row": {} },
"transformers": [
{ "csv": {} },
{ "vertex": { "class": "Pesquisador" } }
],
"loader": {
"orientdb": {
"dbURL": "remote:localhost/dbCemMilM01",
"dbType": "graph",
"batchCommit": 1000,
"classes": [
{"name": "Pesquisador", "extends": "V"}
], "indexes": [
{"class":"Pesquisador", "fields":["psq_id:integer"], "type":"UNIQUE" }
]
}
}
}
Edges
{
"config": {
"log": "info",
"parallel": false
},
"source": {
"file": {
"path": "../csvs/metodo01/a10.csv"
}
},
"extractor": {
"row": {
}
},
"transformers": [{
"csv": {
"separator": ",",
"columnsOnFirstLine": true,
"columns": ["psq_id_from:integer",
"pub_id_to:integer",
"ordem:integer"]
}
},
{
"command": {
"command": "create edge PUBLICOU from (select from Pesquisador where psq_id = ${input.psq_id_from}) to (select from Publicacao where pub_id = ${input.pub_id_to}) set ordem = ${input.ordem} ",
"output": "edge"
}
}],
"loader": {
"orientdb": {
"dbURL": "remote:localhost/dbUmMilhaoM01",
"dbType": "graph",
"standardElementConstraints": false,
"batchCommit": 1000,
"classes": [{
"name": "PUBLICOU",
"extends": "E"
}]
}
}
}
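During the import, OrientDB suggests using an index to speed up the process.
How do I do that?
The command is: create edge PUBLICOU from (select from Pesquisador where psq_id = ${input.psq_id_from}) to (select from Publicacao where pub_id = ${input.pub_id_to}) set ordem = ${input.ordem}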
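Answer 0: (score: 0)
To speed up edge creation, you probably need an index on the Pesquisador.psq_id property, which you already have, and another on Publicacao.pub_id. Both subqueries in the create edge command run once per row, so without an index each of the 59 million rows triggers a full class scan.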
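As a rough sketch, the missing index could be declared in the loader section of the edge configuration, using the same "indexes" syntax as the vertex configuration in the question. This assumes that pub_id is unique within Publicacao and that Publicacao is a vertex class extending V; neither is stated in the question. If pub_id can repeat, "NOTUNIQUE" would be the type to use instead.

```json
{
  "loader": {
    "orientdb": {
      "dbURL": "remote:localhost/dbUmMilhaoM01",
      "dbType": "graph",
      "standardElementConstraints": false,
      "batchCommit": 1000,
      "classes": [
        { "name": "Publicacao", "extends": "V" },
        { "name": "PUBLICOU", "extends": "E" }
      ],
      "indexes": [
        { "class": "Publicacao", "fields": ["pub_id:integer"], "type": "UNIQUE" }
      ]
    }
  }
}
```

Building the index on an already populated class can take a while, but it only has to happen once before the edge import starts.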
Ivan
Answer 1: (score: 0)
You can declare indexes directly in the ETL configuration. Here is an example taken from the DBPedia importer:
"orientdb": {
"dbURL": "plocal:/temp/databases/dbpedia",
"dbUser": "importer",
"dbPassword": "IMP",
"dbAutoCreate": true,
"tx": false,
"batchCommit": 1000,
"wal" : false,
"dbType": "graph",
"classes": [
{"name":"Person", "extends": "V" },
{"name":"Customer", "extends": "Person", "clusters":8 }
],
"indexes": [
{"class":"V", "fields":["URI:string"], "type":"UNIQUE" },
{"class":"Person", "fields":["town:string"], "type":"NOTUNIQUE" ,
"metadata": { "ignoreNullValues": false }
}
]
}
For details, see: http://orientdb.com/docs/2.2/Loader.html
Answer 2: (score: 0)
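To speed up the loading process, my suggestion is to work in plocal mode and then move the resulting database to a standalone OrientDB server. In plocal mode the ETL process writes directly to the database files instead of sending every command over the network to a remote server.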
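A minimal sketch of what the edge loader might look like in plocal mode follows. The database path is a placeholder, not a value from the question, and the "tx" and "wal" settings are borrowed from the DBPedia example in Answer 1 rather than from the original configuration. Turning off the write-ahead log trades crash safety for speed, so it only makes sense for an import that can be rerun from scratch.

```json
{
  "loader": {
    "orientdb": {
      "dbURL": "plocal:/path/to/orientdb/databases/dbUmMilhaoM01",
      "dbType": "graph",
      "tx": false,
      "wal": false,
      "standardElementConstraints": false,
      "batchCommit": 1000,
      "classes": [
        { "name": "PUBLICOU", "extends": "E" }
      ]
    }
  }
}
```

Because plocal opens the database files directly, the server should not have the same database open while the import runs. Once the import finishes, the database directory can be copied into the server's databases folder.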