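I'm new to OrientDB, and I'm using it to build a graph database for flight search. I have millions of real flight records,
and I created a JSON configuration file to import the CSV file, but importing all of those records takes hours.
It imports only about 500 rows per second.
I'm using ETL to import the CSV file.
Here is my JSON file: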
{
  "source": {
    "file": {
      "path": "C:/Users/sams/Desktop/OrientDB2/flights.csv"
    }
  },
  "extractor": {
    "csv": {}
  },
  "transformers": [
    {
      "vertex": {
        "class": "Flight"
      }
    },
    {
      "edge": {
        "class": "Has_Flight",
        "joinFieldName": "depart_airport_id",
        "lookup": "Airport.airport_id",
        "direction": "in"
      }
    },
    {
      "edge": {
        "class": "Flying_To",
        "joinFieldName": "arrive_airport_id",
        "lookup": "Airport.airport_id",
        "direction": "out"
      }
    }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "plocal:C:/Users/sams/Desktop/OrientDB2/database/dataflight",
      "dbType": "graph",
      "dbAutoCreate": true,
      "classes": [
        {
          "name": "Airport",
          "extends": "V"
        },
        {
          "name": "Flight",
          "extends": "V"
        },
        {
          "name": "Has_Flight",
          "extends": "E"
        },
        {
          "name": "Flying_To",
          "extends": "E"
        }
      ],
      "indexes": [
        {
          "class": "Airport",
          "fields": [
            "airport_id:integer"
          ],
          "type": "UNIQUE"
        }
      ]
    }
  }
}
So my question is: is there any other mechanism for importing large datasets into OrientDB?
Thanks in advance!
Answer 0 (score: 4)
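You could try disabling the WAL, enabling the transaction log (txUseLog), and using batch commits.
Try these loader options:
"wal": false
"batchCommit": 1000
"txUseLog": true
Documentation for the OrientDB loader: http://orientdb.com/docs/2.1/Loader.html#orientdb
If you find a combination that improves performance, please let me know.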
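For reference, here is a minimal sketch of where the options suggested above would presumably go: inside the "orientdb" object of the "loader" section of the question's configuration, written with JSON's colon syntax. The values are the ones proposed in this answer, not tuned figures. The "classes" and "indexes" arrays from the question are left out here only for brevity and should stay in the real file.

```json
{
  "loader": {
    "orientdb": {
      "dbURL": "plocal:C:/Users/sams/Desktop/OrientDB2/database/dataflight",
      "dbType": "graph",
      "dbAutoCreate": true,
      "wal": false,
      "batchCommit": 1000,
      "txUseLog": true
    }
  }
}
```

As far as I recall from the loader documentation linked above, batchCommit and txUseLog only apply when transactions are enabled with "tx": true, so that setting may be worth trying as well. Treat this as an assumption to check against the docs for your OrientDB version.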