OrientDB slow edge creation with ETL

Posted: 2016-05-31 12:29:30

Tags: performance etl orientdb graph-databases edge

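I'm new to OrientDB, with a little experience of Neo4j, and I'm running into performance problems when using the OETL.BAT tool to load and create edges. I need to create ~4.4 million edges between nodes (out of roughly 42 million in total; not all of them are used at this stage of the ETL). The "Customer" nodes are already loaded, and the edge list I'm loading is very simple (shown below): it just holds the source and destination ID for each edge, and is meant to model payments between customers.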

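According to the ETL tool, I'm currently getting a throughput of 23-30 per second. I'm reading from a CSV file rather than over a JDBC connection to my RDBMS, and I'm also running in "plocal" mode.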
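For scale, a rough estimate of how long the full load would take, assuming the reported 23-30 per second means edges (CSV rows) per second and that the rate stays constant across all ~4.4 million edges:

$$
t_{\text{fast}} = \frac{4.4\times10^{6}\ \text{edges}}{30\ \text{edges/s}} \approx 1.47\times10^{5}\ \text{s} \approx 41\ \text{h},
\qquad
t_{\text{slow}} = \frac{4.4\times10^{6}\ \text{edges}}{23\ \text{edges/s}} \approx 1.91\times10^{5}\ \text{s} \approx 53\ \text{h}
$$

So under that assumption a single run would take roughly two days.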

Is there a faster way to do this, or am I going about it the wrong way?

Customer - Vertex: CISNumber, Name

PAID - Edge: SourceCISNumber, DestCISNumber, Amount, TransactionCount

Thanks in advance

{
  "source": { "file": { "path": "/datafiles/PersonalCustomers/Edges.csv" } },
  "extractor": { "row": {} },
  "transformers": [
    { "csv": {} },
    { "merge": { "joinFieldName": "SourceCISNumber", "lookup": "Customer.CISNumber" } },
    { "vertex": { "class": "Customer", "skipDuplicates": true } },
    { "edge": {
        "class": "PAID",
        "joinFieldName": "DestCISNumber",
        "lookup": "Customer.CISNumber",
        "unresolvedLinkAction": "SKIP",
        "edgeFields": {
          "Volume": "${input.Transactioncount}",
          "Value": "${input.Amount}"
        }
    } },
    { "field": { "fieldNames": ["SourceCISNumber", "DestCISNumber", "Transactioncount", "Amount"], "operation": "remove" } }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "plocal:/orientdb/databases/Customers",
      "dbType": "graph",
      "batchCommit": 500,
      "useLightweightEdges": true,
      "classes": [
        { "name": "PAID", "extends": "E" }
      ]
    },
    "indexes": [
      { "class": "Customer", "fields": ["CISNumber:long"] }
    ]
  }
}

1 Answer:

Answer 0 (score: 0)

You should set "batchCommit": 1000 in the "loader", and also "parallel": true in "config".
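A minimal sketch of where the two settings would go, assuming the OrientDB 2.x ETL file layout in which "config" is a top-level section alongside "source", "extractor", "transformers" and "loader". Only the keys involved are shown; "source", "extractor", "transformers" and the other "orientdb" loader fields from the question (such as "useLightweightEdges" and "classes") would stay as they are.

```json
{
  "config": {
    "parallel": true
  },
  "loader": {
    "orientdb": {
      "dbURL": "plocal:/orientdb/databases/Customers",
      "dbType": "graph",
      "batchCommit": 1000
    }
  }
}
```

"parallel" asks the ETL processor to run its pipelines in multiple threads, and "batchCommit" raises the number of entries grouped into each commit from 500 to 1000. The answer does not say how much either change speeds up this particular edge load.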