I am using the OrientDB ETL tool to import a large amount of data (several GB). The CSV has the following format (I am using OrientDB 2.2):
"101.186.130.130","527225725","233 djfnsdkj","0.119836317542"
"125.143.534.148","112212983","1227 sdfsdfds","0.0465215171983"
"103.149.957.752","112364761","1121 sdfsdfds","0.0938863016658"
"103.190.245.128","785804692","6138 sdfsdfsd","0.117767539364"
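I need to create two vertices per row. The first holds the value of column 1, and its key is that value itself. The second holds the values of columns 2 and 3: its key is the two values concatenated, and both values are also stored as properties on that second vertex type. Column 4 becomes a property of the edge connecting the two vertices.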
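To make the target structure concrete, here is a minimal Python sketch of how one CSV row should map to the two vertices and the edge. It does not use OrientDB at all. The class and property names mirror the ones in my configuration, while `row_to_graph`, the `out_ip`/`in_loc` field names, and the `_` separator used to build the key are assumptions made purely for illustration.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class IpAddress:
    ip: str  # column 1; the key is the value itself


@dataclass(frozen=True)
class PhyLocation:
    loc: str          # key: columns 2 and 3 joined (separator is an assumption)
    dpcb_number: str  # column 2
    geo_address: str  # column 3


@dataclass(frozen=True)
class Located:
    out_ip: str      # key of the source IpAddress vertex (hypothetical field name)
    in_loc: str      # key of the target PhyLocation vertex (hypothetical field name)
    confidence: str  # column 4, kept as a string as in the extractor


def row_to_graph(row, sep="_"):
    """Map one 4-column CSV row to (IpAddress, PhyLocation, Located)."""
    ip, dpcb, address, prob = row
    loc = f"{dpcb}{sep}{address}"
    return IpAddress(ip), PhyLocation(loc, dpcb, address), Located(ip, loc, prob)


sample = '"101.186.130.130","527225725","233 djfnsdkj","0.119836317542"\n'
for row in csv.reader(io.StringIO(sample)):
    ip_vertex, loc_vertex, edge = row_to_graph(row)
    print(ip_vertex)
    print(loc_vertex)
    print(edge)
```

The point is that the `IpAddress` vertex carries only `ip`, while `dpcb_number` and `geo_address` live on `PhyLocation` and `confidence` lives on the edge.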
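I am using the configuration below. It mostly works, but with a few problems. First, all values from each CSV row end up stored as properties on the IpAddress vertex; is there any way to store only the IP address there? Second, could you tell me how to concatenate two values read from the CSV?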
{
  "source": { "file": { "path": "/home/abcd/OrientDB/examples/ip_address.csv" } },
  "extractor": { "csv": { "columnsOnFirstLine": false, "columns": ["ip:string", "dpcb:string", "address:string", "prob:string"] } },
  "transformers": [
    { "merge": { "joinFieldName": "ip", "lookup": "IpAddress.ip" } },
    { "edge": {
        "class": "Located",
        "joinFieldName": "address",
        "lookup": "PhyLocation.loc",
        "direction": "out",
        "targetVertexFields": { "geo_address": "${input.address}", "dpcb_number": "${input.dpcb}" },
        "edgeFields": { "confidence": "${input.prob}" },
        "unresolvedLinkAction": "CREATE"
      }
    }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "remote:/localhost/Bulk_Transfer_Test",
      "dbType": "graph",
      "dbUser": "root",
      "dbPassword": "tiger",
      "serverUser": "root",
      "serverPassword": "tiger",
      "classes": [
        { "name": "IpAddress", "extends": "V" },
        { "name": "PhyLocation", "extends": "V" },
        { "name": "Located", "extends": "E" }
      ],
      "indexes": [
        { "class": "IpAddress", "fields": ["ip:string"], "type": "UNIQUE" },
        { "class": "PhyLocation", "fields": ["loc:string"], "type": "UNIQUE" }
      ]
    }
  }
}
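One fallback I am considering, in case the concatenation cannot be done inside the ETL configuration, is to precompute the joined key before the import. The following is only a sketch under assumptions: the function name `add_loc_column`, the `_` separator, and the output file are hypothetical, and every row is assumed to have exactly four columns. It streams the file row by row, so it should cope with multi-GB input without loading it into memory.

```python
import csv


def add_loc_column(src_path, dst_path, sep="_"):
    """Copy a 4-column CSV, appending a fifth column: column 2 + sep + column 3.

    Returns the number of rows written.
    """
    count = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        # QUOTE_ALL keeps every field quoted, matching the input format.
        writer = csv.writer(dst, quoting=csv.QUOTE_ALL)
        for ip, dpcb, address, prob in csv.reader(src):
            writer.writerow([ip, dpcb, address, prob, f"{dpcb}{sep}{address}"])
            count += 1
    return count
```

With such a file, the extractor's `columns` list would presumably need a fifth entry (for example `loc:string`), and the edge transformer could join on that column instead of `address`. I have not verified this against the ETL tool.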