How to create only edges with OrientDB ETL

Asked: 2015-11-12 19:22:58

Tags: etl orientdb graph-databases

I have two CSV files:

The first contains ~500M records in the following format:

id,name
10000023432,Tom User
13943423235,Blah Person

The second contains ~1.5B friend relationships in the following format:

fromId,toId
10000023432,13943423235

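I used the OrientDB ETL tool to create the vertices from the first CSV file. Now I just need to create the edges that establish the friendship connections between them.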

So far I have tried several configurations of the ETL JSON file; the latest is this one:

{
    "config": {"parallel": true},
    "source": { "file": { "path": "path_to_file" } },
    "extractor": { "csv": {} },
    "transformers": [
        { "vertex": {"class": "Person", "skipDuplicates": true} },
        { "edge": { "class": "FriendsWith",
                    "joinFieldName": "from",
                    "lookup": "Person.id",
                    "unresolvedLinkAction": "SKIP",
                    "targetVertexFields":{
                        "id": "${input.to}"
                    },
                    "direction": "out"
                  }
        },
        { "code": { "language": "Javascript",
                    "code": "print('Current record: ' + record);  record;"}
        }
    ],
    "loader": {
        "orientdb": {
            "dbURL": "remote:<DB connection string>",
            "dbType": "graph",
            "classes": [
                {"name": "FriendsWith", "extends": "E"}
            ], "indexes": [
                {"class":"Person", "fields":["id:long"], "type":"UNIQUE" }
            ]
        }
    }
}

Unfortunately, in addition to creating the edge, this also creates "from" and "to" properties.

When I try removing the vertex transformer, the ETL process throws an error:

Error in Pipeline execution: com.orientechnologies.orient.etl.transformer.OTransformException: edge: input type 'com.orientechnologies.orient.core.record.impl.ODocument$1$1@40d136a8' is not supported
Exception in thread "OrientDB ETL pipeline-0" com.orientechnologies.orient.etl.OETLProcessHaltedException: Halt
        at com.orientechnologies.orient.etl.OETLPipeline.execute(OETLPipeline.java:149)
        at com.orientechnologies.orient.etl.OETLProcessor$2.run(OETLProcessor.java:341)
        at java.lang.Thread.run(Thread.java:745)
Caused by: com.orientechnologies.orient.etl.transformer.OTransformException: edge: input type 'com.orientechnologies.orient.core.record.impl.ODocument$1$1@40d136a8' is not supported
        at com.orientechnologies.orient.etl.transformer.OEdgeTransformer.executeTransform(OEdgeTransformer.java:107)
        at com.orientechnologies.orient.etl.transformer.OAbstractTransformer.transform(OAbstractTransformer.java:37)
        at com.orientechnologies.orient.etl.OETLPipeline.execute(OETLPipeline.java:115)
        ... 2 more

What am I missing here?

2 Answers:

Answer 0 (score: 6)

You can import the edges with these ETL transformers:

"transformers": [
    { "merge": { "joinFieldName": "fromId", "lookup": "Person.id" } },
    { "vertex": {"class": "Person", "skipDuplicates": true} },
    { "edge": { "class": "FriendsWith",
                "joinFieldName": "toId",
                "lookup": "Person.id",
                "direction": "out"
              }
    },
    { "field": { "fieldNames": ["fromId", "toId"], "operation": "remove" } }
]

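The "merge" transformer joins the current CSV line with the related Person record (this is a bit strange, but for some reason it is needed to associate fromId with the source person).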

The "field" transformer removes the CSV fields added by the merge step. You can try the import without the "field" transformer to see the difference.
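To make the data flow easier to picture, here is a toy in-memory model of the three steps (merge, edge, field removal). It is not OrientDB code: `EdgePipelineSketch`, `addPerson`, and `process` are made-up names, plain maps stand in for the Person class, and skipping the edge when the target is missing is an assumption of this sketch, not a statement about what the ETL tool does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory model of the merge -> edge -> field-remove flow; not OrientDB code. */
public class EdgePipelineSketch {
    // Stand-in for the Person class, keyed the way the Person.id lookup would be.
    final Map<Long, Map<String, Object>> persons = new HashMap<>();
    // Each edge is stored as {fromId, toId}.
    final List<long[]> friendsWith = new ArrayList<>();

    void addPerson(long id, String name) {
        Map<String, Object> p = new HashMap<>();
        p.put("id", id);
        p.put("name", name);
        persons.put(id, p);
    }

    /** Processes one CSV row; returns false when either endpoint is unknown. */
    boolean process(long fromId, long toId) {
        // "merge": look up the source Person and copy the CSV fields onto it.
        Map<String, Object> source = persons.get(fromId);
        if (source == null) {
            return false;
        }
        source.put("fromId", fromId);
        source.put("toId", toId);

        // "edge": look up the target by toId and link source -> target.
        boolean linked = persons.containsKey(toId);
        if (linked) {
            friendsWith.add(new long[] {fromId, toId});
        }

        // "field" remove: drop the CSV columns so they do not stay on the Person.
        source.remove("fromId");
        source.remove("toId");
        return linked;
    }

    public static void main(String[] args) {
        EdgePipelineSketch db = new EdgePipelineSketch();
        db.addPerson(10000023432L, "Tom User");
        db.addPerson(13943423235L, "Blah Person");
        db.process(10000023432L, 13943423235L);
        System.out.println("edges created: " + db.friendsWith.size());
        System.out.println("fromId left on source: "
                + db.persons.get(10000023432L).containsKey("fromId"));
    }
}
```

Commenting out the two `source.remove(...)` calls mimics running the import without the "field" transformer: the CSV columns then remain on the source record.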

Answer 1 (score: 1)

Using the Java API, you can read the CSV and then create the edges:

        import java.io.BufferedReader;
        import java.io.FileReader;
        import java.io.IOException;

        import com.orientechnologies.orient.client.remote.OServerAdmin;
        import com.orientechnologies.orient.core.sql.OCommandSQL;
        import com.tinkerpop.blueprints.impls.orient.OrientGraph;

        public class CreateEdges {
            public static void main(String[] args) {
                String nomeYourDb = "nomeYourDb";
                String url = "remote:localhost/" + nomeYourDb;
                String csvFile = "path_to_file";
                String csvSplitBy = ",";   // your separator (the files in the question use a comma)
                try {
                    // Check that the database exists before opening the graph.
                    OServerAdmin serverAdmin = new OServerAdmin(url).connect("root", "root");
                    boolean exists = serverAdmin.existsDatabase();
                    serverAdmin.close();
                    if (!exists) {
                        return;
                    }
                    OrientGraph g = new OrientGraph(url);
                    // try-with-resources closes the reader even if an exception is thrown.
                    try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
                        br.readLine();   // skip the header line
                        String line;
                        while ((line = br.readLine()) != null) {
                            String[] ids = line.split(csvSplitBy);
                            String personFrom = "(select from Person where id='" + ids[0].trim() + "')";
                            String personTo = "(select from Person where id='" + ids[1].trim() + "')";
                            String query = "create edge FriendsWith from " + personFrom + " to " + personTo;
                            g.command(new OCommandSQL(query)).execute();
                        }
                    } finally {
                        g.shutdown();   // release the graph connection
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
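Since the statement above is built by string concatenation from raw CSV text, it may be worth validating each line first. Below is a minimal sketch of that step with no OrientDB dependency. `EdgeStatementBuilder` and `createEdgeSql` are hypothetical names, and it assumes the ids are plain integers (the question's index declares `id:long`), so it emits them as numeric literals rather than quoted strings.

```java
import java.util.regex.Pattern;

public class EdgeStatementBuilder {
    /**
     * Builds a CREATE EDGE statement from one "fromId,toId" CSV line.
     * Throws IllegalArgumentException if the line does not hold exactly two integer ids.
     */
    static String createEdgeSql(String line, String separator) {
        // Pattern.quote keeps separators such as "|" from being read as regex syntax.
        String[] ids = line.split(Pattern.quote(separator));
        if (ids.length != 2) {
            throw new IllegalArgumentException("expected 2 columns, got " + ids.length);
        }
        // Parsing to long rejects anything that is not a plain number
        // (NumberFormatException is a subclass of IllegalArgumentException).
        long from = Long.parseLong(ids[0].trim());
        long to = Long.parseLong(ids[1].trim());
        return "create edge FriendsWith from (select from Person where id = " + from + ")"
                + " to (select from Person where id = " + to + ")";
    }

    public static void main(String[] args) {
        System.out.println(createEdgeSql("10000023432,13943423235", ","));
        // prints: create edge FriendsWith from (select from Person where id = 10000023432) to (select from Person where id = 13943423235)
    }
}
```

The returned string can be passed to `g.command(new OCommandSQL(...)).execute()` in place of the concatenated query; a malformed line then fails with an exception instead of reaching the database.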