OrientDB ETL加载CSV,顶点在一个文件中,边缘在另一个文件中

时间:2016-07-28 06:11:46

标签: orientdb orientdb2.2 orientdb-etl

我有一些数据在2个CSV文件中,一个包含顶点,另一个文件包含边缘在另一个文件中。我正在研究如何使用ETL来设置它,但是我已经很接近了但它还没有 - 它主要是有效的,但我的边缘有属性,我不确定它们是否正确加载。 This question很有帮助,但我仍然遗漏了一些东西......

这是我的数据:

vertices.csv

label,data,date
v01,0.1234,2015-01-01
v02,0.5678,2015-01-02
v03,0.9012,2015-01-03

edges.csv

u,v,weight,date
v01,v02,12.4,2015-06-17
v02,v03,17.9,2015-09-14

我使用这个导入我的顶点:

commonVertices.json

{
"begin": [ 
             { "let": { "name":       "$filePath",  
                        "expression": "$fileDirectory.append($fileName)" 
                      } 
             },
         ],
"config": { "log": "info"},
"source": { "file": { "path": "$filePath" } },
"extractor": { "csv": { "ignoreEmptyLines": true,
                        "nullValue": "N/A",
                        "dateFormat": "yyyy-mm-dd"
                      }
             },
"transformers": [
                    { "vertex": { "class": "myVertex" } },
                    { "code":   { "language": "Javascript",
                                  "code":     "print('    Current record: ' + record); record;" }
                    }
                ],
"loader": { "orientdb": {
            "dbURL": "plocal:my_orientdb",
            "dbType": "graph",
            "batchCommit": 1000,
            "classes": [ { "name": "myVertex", "extends", "V" },
                       ],
            "indexes": []
            }
          }
}

vertices.json

{ "config": { "log":           "info",
              "fileDirectory": "./",
              "fileName":      "vertices.csv"
            }
}

commonEdges.json

{
    "begin": [
        { "let": { "name": "$filePath",
                   "expression": "$fileDirectory.append($fileName )"
                 }
        },
    ],

    "config": { "log": "info"
              },

    "source": { "file": { "path": "$filePath" } },

    "extractor": { "csv": { "ignoreEmptyLines": true,
                            "nullValue": "N/A",
                            "dateFormat": "yyyy-mm-dd"
                          }
                 },

    "transformers": [
            { "merge":  { "joinFieldName": "u", "lookup": "myVertex.label" } },
            { "edge":   { "class":         "myEdge",
                          "joinFieldName": "v",
                          "lookup":        "myVertex.label",
                          "direction":     "out",
                          "unresolvedLinkAction": "NOTHING"
                        }
            },
            { "field": { "fieldNames": ["u", "v"], "operation": "remove" } }
        ],

    "loader": {
        "orientdb": {
            "dbURL": "plocal:my_orientdb",
            "dbType": "graph",
            "batchCommit": 1000,
            "useLightweightEdges": false,
            "classes": [
                { "name": "myEdge",   "extends", "E" }
            ],
            "indexes": []
        }
    }
}

edges.json

{
    "config": {
        "log": "info",
        "fileDirectory": "./",
        "fileName": "edges.csv"
    }
}

我正在使用oetl.sh运行它:

$ oetl.sh vertices.json commonVertices.json
$ oetl.sh edges.json commonEdges.json

一切都在运行,但是当我查询边缘时......我是OrientDB的新手,所以也许是我的边缘有属性,但是当我查询边缘时,我看不到重量和日期字段:

orientdb {db=my_orientdb}> SELECT FROM myEdge
+----+-----+------+-----+-----+
|#   |@RID |@CLASS|out  |in   |
+----+-----+------+-----+-----+
|0   |#33:0|myEdge|#25:0|#26:0|
|1   |#34:0|myEdge|#26:0|#27:0|
+----+-----+------+-----+-----+

顶点表包含来自edges.csv的[weight]字段,[date]字段以奇怪的方式被破坏。这个月的日期从edge.csv文件被覆盖到了当天,这是不可取的,但对我来说奇怪的是月份本身并没有得到改变:

orientdb {db=my_orientdb}> SELECT FROM myVertex
+----+-----+--------+------+-------------------+-----+------+----------+---------+
|#   |@RID |@CLASS  |data  |date               |label|weight|out_myEdge|in_myEdge|
+----+-----+--------+------+-------------------+-----+------+----------+---------+
|0   |#25:0|myVertex|0.1234|2015-01-17 00:06:00|v01  |12.4  |[#33:0]   |         |
|1   |#26:0|myVertex|0.5678|2015-01-14 00:09:00|v02  |17.9  |[#34:0]   |[#33:0]  |
|2   |#27:0|myVertex|0.9012|2015-01-03 00:01:00|v03  |      |          |[#34:0]  |
+----+-----+--------+------+-------------------+-----+------+----------+---------+

我确定这可能是一个简单的调整,任何帮助都会很棒!

1 个答案:

答案 0 :(得分:6)

在边缘变换器中,使用 edgeFields 来绑定边缘中的属性。例如:

 "transformers": [
            { "merge":  { "joinFieldName": "u", "lookup": "myVertex.label" } },
            { "edge":   { "class":         "myEdge",
                          "joinFieldName": "v",
                          "lookup":        "myVertex.label",
                          "edgeFields": { "weight": "${input.weight}", "date": "${input.date}" },
                          "direction":     "out",
                          "unresolvedLinkAction": "NOTHING"
                        }

            },
            { "field": { "fieldNames": ["u", "v"], "operation": "remove" } }
        ],

希望它有所帮助。