Import from RDBMS to OrientDB: loading only edges

Time: 2016-02-29 12:55:59

Tags: import etl rdbms orientdb graphdb

Using Community Edition 2.1.11

I have seen a few similar questions on the internet (for example, "import edges to OrientDB using etl" or orient-database.narkive.com/d8c4b82y/orientdb-etl-edge-creation-help), but none of them was really resolved.

I am implementing a flight connection search system. I have an RDBMS (SQL Server) with two related tables, locations and flights. Each flight has two location IDs: locationFrom and locationTo.

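When I import this into a graph, I want the locations to be vertices and the flights to be the edges that connect them. As I understand from the manual ("Import from DBMS"; because of new-user restrictions I cannot post more than two links...), I should write two separate JSON configurations for this and run them with ETL. I can import the locations without any problem using this configuration: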

{
  "config": {
    log : "debug"
  },
  "extractor" : {
    "jdbc": { "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
              "url": "jdbc:sqlserver://localhost:1434;databaseName=mydb;integratedSecurity=true;",
              "userName": "root",
              "userPassword": "root",
              "query": "select * from locations" }
  },

  "transformers" : [
    { "vertex": { "class": "Location"} }
  ],
   "loader" : {
    "orientdb": {
      "dbURL": "plocal:C:\orientdb-community-2.1.11\databases\Test",
      dbUser: "admin",
      dbPassword: "admin",
      dbAutoDropIfExists: false,
      dbAutoCreate: true,
      tx: false,
      wal: false,
      batchCommit: 1000,
      dbType: "graph",
      indexes: [{class:"Location", fields:["id:string"], type:"UNIQUE_HASH_INDEX" }]
    }
  }
}

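But when I try to import the flights, I run into a problem that I could not solve even with Google's help: ETL does not want to import edges only. As a first intuitive attempt, I wrote something like this: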

{
  "config": {
    log : "debug"
  },
  "extractor" : {
    "jdbc": { "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
              "url": "jdbc:sqlserver://localhost:1434;databaseName=mydb;integratedSecurity=true;",
              "userName": "root",
              "userPassword": "root",
              "query": "select * from flights" }
  },

  "transformers" : [
    { "edge": { "class": "flight", "direction" : "out", 
            "joinFieldName": "flightFromLocation",
            "lookup":"locationID", "unresolvedLinkAction":"CREATE"}

            { "class": "flight", "direction" : "in", 
            "joinFieldName": "flightToLocation",
            "lookup":"locationID", "unresolvedLinkAction":"CREATE"}
    }
  ],
   "loader" : {
    "orientdb": {
      "dbURL": "plocal:C:\orientdb-community-2.1.11\databases\Test",
      dbUser: "admin",
      dbPassword: "admin",
      dbAutoDropIfExists: false,
      dbAutoCreate: true,
      tx: false,
      wal: false,
      batchCommit: 1000,
      dbType: "graph",
      indexes: [{class:"flight", fields:["id:string"], type:"UNIQUE_HASH_INDEX" }]
    }
  }
}

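In one of the threads on OrientDB's Google Groups I found a post from Luca (of OrientDB) saying that it is possible to load only edges via ETL, but I still cannot figure out how to do it :( The only idea I have after two days of reading the docs and googling is to import the flights as vertices and then write a console JS function that creates the proper edges with the same properties...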

Or maybe I am missing something very basic? I am completely new to OrientDB...

2 Answers:

Answer 0: (score: 1)

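A simple way to do what you need is to import both tables into two vertex classes with the normal ETL process, and then create the edges with a JS function.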

I created this dataset to reproduce your situation after importing the two tables (screenshots: locations, flights_V).

This is the JS function:

  

Parameters: flights_V_class, edge_class, location_class

var g = orient.getGraphNoTx();

// Load every temporary flight vertex
var flightsV_table = g.command("sql", "select from " + flights_V_class);

for (var i = 0; i < flightsV_table.length; i++) {
  // Endpoint ids stored on the flight record
  var id_from = flightsV_table[i].getProperty("locationFrom");
  var id_to = flightsV_table[i].getProperty("locationTo");

  // Subqueries that resolve the two endpoint vertices.
  // The ids are concatenated unquoted, so this assumes numeric ids.
  var select_from = "select from " + location_class + " where id = " + id_from;
  var select_to = "select from " + location_class + " where id = " + id_to;

  g.command("sql", "create edge " + edge_class + " from (" + select_from + ") to (" + select_to + ")");
}
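The function above concatenates the ids into the query unquoted, which only works for numeric ids. As a minimal sketch, the statement-building step could be pulled into a plain helper that quotes string ids. The names `sqlLiteral` and `buildCreateEdgeSql` are hypothetical (they are not part of OrientDB), and the backslash escaping of single quotes is an assumption about how your OrientDB version parses string literals, so check it against your data before relying on it.

```javascript
// Hypothetical helper: formats a JS value as an SQL literal.
// Numbers are emitted as-is; everything else is quoted as a string.
function sqlLiteral(value) {
  if (typeof value === "number") {
    return String(value);
  }
  // Assumption: backslash-escaped quotes are accepted inside '...' literals.
  // Escape backslashes first, then single quotes.
  return "'" + String(value).replace(/\\/g, "\\\\").replace(/'/g, "\\'") + "'";
}

// Hypothetical helper: builds the same "create edge" statement as the
// function above, but with the ids passed through sqlLiteral.
function buildCreateEdgeSql(edgeClass, locationClass, idFrom, idTo) {
  var selectFrom = "select from " + locationClass + " where id = " + sqlLiteral(idFrom);
  var selectTo = "select from " + locationClass + " where id = " + sqlLiteral(idTo);
  return "create edge " + edgeClass + " from (" + selectFrom + ") to (" + selectTo + ")";
}
```

Inside the loop, the call would then look like `g.command("sql", buildCreateEdgeSql(edge_class, location_class, id_from, id_to))`.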

After running this function, this is my data (screenshots: locations_EE, flights_E).

Finally, you can delete the temporary flights_V class.

Hope it helps. Bye.

Ivan

Answer 1: (score: 0)

I tried this with MySQL.

I created the Location and flight tables.


Location.json

{
  "config": {
    log : "debug"
  },
  "extractor" : {
    "jdbc": { "driver": "com.mysql.jdbc.Driver",
              "url": "jdbc:mysql://localhost:3306/flights",
              "userName": "user",
              "userPassword": "password",
              "query": "select * from Location" 
              }
  },
  "transformers" : [
    { "vertex": { "class": "Location"} }
  ],
   "loader" : {
    "orientdb": {
      "dbURL": "yourPath",
      "dbUser": "admin",
      "dbPassword": "admin",
      "dbAutoDropIfExists": false,
      "dbAutoCreate": true,
      "tx": false,
      "wal": false,
      "batchCommit": 1000,
      "dbType": "graph",
      "indexes": [{class:"Location", fields:["id:string"], type:"UNIQUE_HASH_INDEX" }]
    }
  }
}

Flight.json

{
  "config": {
    log : "debug"
  },
  "extractor" : {
    "jdbc": { "driver": "com.mysql.jdbc.Driver",
              "url": "jdbc:mysql://localhost:3306/flights",
              "userName": "user",
              "userPassword": "password",
              "query": "select * from flight" 
              }
  },
  "transformers" : [
    { "vertex": { "class": "Fligth"} }
  ],
   "loader" : {
    "orientdb": {
      "dbURL": "yourPath",
      "dbUser": "admin",
      "dbPassword": "admin",
      "dbAutoDropIfExists": false,
      "dbAutoCreate": true,
      "tx": false,
      "wal": false,
      "batchCommit": 1000,
      "dbType": "graph",
      "indexes": [{class:"flight", fields:["id:string"], type:"UNIQUE_HASH_INDEX" }]
    }
  }
}

The ETL process imported the records from both tables as vertices.

You can then use this JavaScript function:

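var g = orient.getGraphNoTx();
// Edge class that replaces the temporary Fligth vertex class
g.command("sql", "CREATE CLASS Fligth2 EXTENDS E");
var fligth = g.command("sql", "select from Fligth");
for (var i = 0; i < fligth.length; i++) {
    var idFrom = fligth[i].getProperty("idFrom");
    var idTo = fligth[i].getProperty("idTo");
    var name = fligth[i].getProperty("name");
    print(name);
    // Look up both endpoint vertices by their relational id
    var from = g.command("sql", "select from Location where id = " + idFrom);
    var to = g.command("sql", "select from Location where id = " + idTo);
    // Skip flights whose endpoints are missing instead of failing on from[0]
    if (from.length == 0 || to.length == 0) continue;
    g.command("sql", "create edge Fligth2 from " + from[0].getId() + " to " + to[0].getId() + " set name = '" + name + "'");
}
// Clean up: drop the temporary vertex class and the relational id field
g.command("sql", "drop class Fligth unsafe");
g.command("sql", "UPDATE Location REMOVE id");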
You should end up with this structure: Location vertices connected by Fligth2 edges.