Using OrientDB ETL on a CSV

Time: 2016-08-24 09:18:53

Tags: orientdb orientdb2.2

I am using the OrientDB ETL tool to import a large amount of data, on the order of gigabytes. The CSV is formatted like this (I am using OrientDB 2.2):

"101.186.130.130","527225725","233 djfnsdkj","0.119836317542"
"125.143.534.148","112212983","1227 sdfsdfds","0.0465215171983"
"103.149.957.752","112364761","1121 sdfsdfds","0.0938863016658"
"103.190.245.128","785804692","6138 sdfsdfsd","0.117767539364"

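I need to create two vertices per row. The first holds the value of column 1, and its key is that value itself. The second holds the values of columns 2 and 3: its key is the concatenation of the two values, and both values are also stored as properties on this second vertex type. Column 4 should become a property of the edge connecting the two vertices.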
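To make the target structure concrete, here is a minimal Python sketch (outside the ETL tool) of how one CSV row should map to the two vertices and the edge. The class and property names are taken from my ETL configuration; the `row_to_graph` helper and the `_` separator used for the concatenated key are only assumptions for illustration.

```python
import csv
import io

# Two rows copied from the sample CSV above.
SAMPLE = (
    '"101.186.130.130","527225725","233 djfnsdkj","0.119836317542"\n'
    '"125.143.534.148","112212983","1227 sdfsdfds","0.0465215171983"\n'
)


def row_to_graph(row, sep="_"):
    """Map one 4-column CSV row to (ip_vertex, location_vertex, edge).

    `sep` is a hypothetical separator for the concatenated key.
    """
    ip, dpcb, address, prob = row
    # First vertex: only the IP address, which is also its key.
    ip_vertex = {"class": "IpAddress", "ip": ip}
    # Second vertex: key is columns 2 and 3 joined; both are kept as properties.
    location_vertex = {
        "class": "PhyLocation",
        "loc": f"{dpcb}{sep}{address}",
        "dpcb_number": dpcb,
        "geo_address": address,
    }
    # Edge from the IP vertex to the location vertex, carrying column 4.
    edge = {"class": "Located", "confidence": prob}
    return ip_vertex, location_vertex, edge


rows = list(csv.reader(io.StringIO(SAMPLE)))
graph = [row_to_graph(r) for r in rows]
```

The point of the sketch is that the `IpAddress` vertex should end up with a single `ip` property, while everything else belongs to `PhyLocation` or to the `Located` edge.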

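I am using the configuration below, which mostly works but has some problems. The first is that all the values from each CSV row are stored as properties on the IpAddress vertex; is there any way to store only the IP address there? Second, could you tell me how to concatenate two values read from the CSV?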

{
  "source": { "file": { "path": "/home/abcd/OrientDB/examples/ip_address.csv" } },
  "extractor": { "csv": { "columnsOnFirstLine": false, "columns": ["ip:string", "dpcb:string", "address:string", "prob:string"] } },
  "transformers": [
    { "merge": { "joinFieldName": "ip", "lookup": "IpAddress.ip" } },
    { "edge": {
        "class": "Located",
        "joinFieldName": "address",
        "lookup": "PhyLocation.loc",
        "direction": "out",
        "targetVertexFields": { "geo_address": "${input.address}", "dpcb_number": "${input.dpcb}" },
        "edgeFields": { "confidence": "${input.prob}" },
        "unresolvedLinkAction": "CREATE"
      }
    }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "remote:/localhost/Bulk_Transfer_Test",
      "dbType": "graph",
      "dbUser": "root",
      "dbPassword": "tiger",
      "serverUser": "root",
      "serverPassword": "tiger",
      "classes": [
        { "name": "IpAddress", "extends": "V" },
        { "name": "PhyLocation", "extends": "V" },
        { "name": "Located", "extends": "E" }
      ],
      "indexes": [
        { "class": "IpAddress", "fields": ["ip:string"], "type": "UNIQUE" },
        { "class": "PhyLocation", "fields": ["loc:string"], "type": "UNIQUE" }
      ]
    }
  }
}
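One possible workaround for the concatenation, independent of anything the ETL tool itself offers, is to pre-process the CSV and append the combined key as a fifth column before importing. The following is a minimal sketch of that idea in Python; the function name `add_loc_column` and the `_` separator are assumptions, and it streams row by row so it should cope with multi-gigabyte files.

```python
import csv
import io


def add_loc_column(src, dst, sep="_"):
    """Copy 4-column rows from src to dst, appending column2 + sep + column3.

    src and dst are open text file objects; returns the number of rows written.
    `sep` is a hypothetical separator, not something the source data defines.
    """
    reader = csv.reader(src)
    # Quote every field so the output matches the input file's quoting style.
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL, lineterminator="\n")
    count = 0
    for ip, dpcb, address, prob in reader:
        writer.writerow([ip, dpcb, address, prob, f"{dpcb}{sep}{address}"])
        count += 1
    return count


# Small in-memory demonstration using one row from the sample data.
src = io.StringIO('"103.149.957.752","112364761","1121 sdfsdfds","0.0938863016658"\n')
dst = io.StringIO()
n = add_loc_column(src, dst)
```

For real files, `src` and `dst` would be opened with `open(path, newline="")`. With such a file, the extractor's `columns` list would presumably need a fifth entry (for example `"loc:string"`), and the edge transformer's `joinFieldName` could then point at that column for the `PhyLocation.loc` lookup. I have not verified this against OrientDB 2.2.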

0 answers:

No answers