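I'm new to OrientDB, with a little experience of Neo4J, and I'm running into performance problems when loading and creating edges with the OETL.BAT tool. I need to create about 4.4 million edges between nodes (out of roughly 42 million in total; not all edges are used at this stage of the ETL). The "Customer" nodes are already loaded, and the edge list I'm loading is very simple (shown below): it just has a source and a destination ID for each edge, the purpose being to model payments between customers.
According to the ETL tool, I'm currently getting a throughput of 23-30 per second. I'm using a CSV file rather than a JDBC connection to my RDBMS, and I'm running in "plocal" mode.
Is there a faster way to do this? Or have I perhaps taken the wrong approach?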
Customer - Vertex: CISNumber, Name
PAID - Edge: SourceCISNumber, DestCISNumber, Amount, TransactionCount
Thanks in advance.
{
  "source": { "file": { "path": "/datafiles/PersonalCustomers/Edges.csv" } },
  "extractor": { "row": {} },
  "transformers": [
    { "csv": {} },
    { "merge": { "joinFieldName": "SourceCISNumber", "lookup": "Customer.CISNumber" } },
    { "vertex": { "class": "Customer", "skipDuplicates": true } },
    { "edge": {
        "class": "PAID",
        "joinFieldName": "DestCISNumber",
        "lookup": "Customer.CISNumber",
        "unresolvedLinkAction": "SKIP",
        "edgeFields": {
          "Volume": "${input.Transactioncount}",
          "Value": "${input.Amount}"
        }
      }
    },
    { "field": { "fieldNames": ["SourceCISNumber", "DestCISNumber", "Transactioncount", "Amount"], "operation": "remove" } }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "plocal:/orientdb/databases/Customers",
      "dbType": "graph",
      "batchCommit": 500,
      "useLightweightEdges": true,
      "classes": [
        { "name": "PAID", "extends": "E" }
      ]
    },
    "indexes": [
      { "class": "Customer", "fields": ["CISNumber:long"] }
    ]
  }
}
Answer 0 (score: 0)
You should set "batchCommit": 1000 in the "loader" section. Also set "parallel": true in the "config" section.
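As a rough sketch of where those two settings would go, the fragment below shows only the affected sections. It assumes that "config" is a top-level section alongside "source", "extractor", "transformers" and "loader", and that "batchCommit" stays inside the "orientdb" loader block as in the question. The remaining sections of the question's configuration are omitted here and would stay unchanged. Whether this actually improves throughput for this dataset has not been verified.

```json
{
  "config": {
    "parallel": true
  },
  "loader": {
    "orientdb": {
      "dbURL": "plocal:/orientdb/databases/Customers",
      "dbType": "graph",
      "batchCommit": 1000
    }
  }
}
```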