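I have two CSV files.
The first contains ~500M records in the following format:
id,name
10000023432,Tom User
13943423235,Blah Person
The second contains ~1.5B friend relationships in the following format:
fromId,toId
10000023432,13943423235
I used the OrientDB ETL tool to create the vertices from the first CSV file. Now I only need to create the edges that establish the friendship connections between them.
So far I have tried several configurations of the ETL JSON file; the latest is this one: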
{
  "config": {"parallel": true},
  "source": { "file": { "path": "path_to_file" } },
  "extractor": { "csv": {} },
  "transformers": [
    { "vertex": {"class": "Person", "skipDuplicates": true} },
    { "edge": { "class": "FriendsWith",
                "joinFieldName": "from",
                "lookup": "Person.id",
                "unresolvedLinkAction": "SKIP",
                "targetVertexFields":{
                  "id": "${input.to}"
                },
                "direction": "out"
              }
    },
    { "code": { "language": "Javascript",
                "code": "print('Current record: ' + record); record;"}
    }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "remote:<DB connection string>",
      "dbType": "graph",
      "classes": [
        {"name": "FriendsWith", "extends": "E"}
      ], "indexes": [
        {"class":"Person", "fields":["id:long"], "type":"UNIQUE" }
      ]
    }
  }
}
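Unfortunately, in addition to creating the edges, this also creates "from" and "to" properties.
When I try removing the vertex transformer, the ETL process throws an error: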
Error in Pipeline execution: com.orientechnologies.orient.etl.transformer.OTransformException: edge: input type 'com.orientechnologies.orient.core.record.impl.ODocument$1$1@40d136a8' is not supported
Exception in thread "OrientDB ETL pipeline-0" com.orientechnologies.orient.etl.OETLProcessHaltedException: Halt
    at com.orientechnologies.orient.etl.OETLPipeline.execute(OETLPipeline.java:149)
    at com.orientechnologies.orient.etl.OETLProcessor$2.run(OETLProcessor.java:341)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.orientechnologies.orient.etl.transformer.OTransformException: edge: input type 'com.orientechnologies.orient.core.record.impl.ODocument$1$1@40d136a8' is not supported
    at com.orientechnologies.orient.etl.transformer.OEdgeTransformer.executeTransform(OEdgeTransformer.java:107)
    at com.orientechnologies.orient.etl.transformer.OAbstractTransformer.transform(OAbstractTransformer.java:37)
    at com.orientechnologies.orient.etl.OETLPipeline.execute(OETLPipeline.java:115)
    ... 2 more
What am I missing here?
Answer 0 (score: 6)
You can import the edges with these ETL transformers:
"transformers": [
  { "merge": { "joinFieldName": "fromId", "lookup": "Person.id" } },
  { "vertex": {"class": "Person", "skipDuplicates": true} },
  { "edge": { "class": "FriendsWith",
              "joinFieldName": "toId",
              "lookup": "Person.id",
              "direction": "out"
            }
  },
  { "field": { "fieldNames": ["fromId", "toId"], "operation": "remove" } }
]
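The "merge" transformer joins the current CSV row with the matching Person record (this is a bit odd, but for some reason it is needed to associate fromId with the source person).
The "field" transformer removes the CSV fields that the merge step added. You can try the import without the "field" transformer to see the difference.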
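To make the effect of the three steps easier to picture, here is a small in-memory model in plain Java. It is only a conceptual sketch, not how the ETL module is implemented: the class EdgePipelineModel, its fields and its methods are hypothetical names, a HashMap stands in for the Person.id index, and skipping rows whose endpoints are unknown is an assumption made for the model.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Conceptual model only: mimics the effect of merge -> edge -> field(remove) on in-memory maps. */
final class EdgePipelineModel {

    /** Stand-in for the Person.id index: id -> Person record. */
    final Map<Long, Map<String, Object>> personById = new HashMap<>();

    /** Created edges, stored as {fromId, toId} pairs. */
    final List<long[]> edges = new ArrayList<>();

    void addPerson(long id, String name) {
        Map<String, Object> person = new HashMap<>();
        person.put("id", id);
        person.put("name", name);
        personById.put(id, person);
    }

    /**
     * Processes one CSV row. Returns false when either endpoint is unknown
     * (this model simply skips such rows; that is an assumption).
     */
    boolean process(long fromId, long toId, boolean removeCsvFields) {
        Map<String, Object> source = personById.get(fromId); // "merge": look up the source Person
        Map<String, Object> target = personById.get(toId);   // "edge": look up the target Person
        if (source == null || target == null) {
            return false;
        }
        source.put("fromId", fromId);                         // merge copies the CSV fields onto the record
        source.put("toId", toId);
        edges.add(new long[] {fromId, toId});                 // edge with direction "out"
        if (removeCsvFields) {                                // "field" with operation "remove"
            source.remove("fromId");
            source.remove("toId");
        }
        return true;
    }

    public static void main(String[] args) {
        EdgePipelineModel model = new EdgePipelineModel();
        model.addPerson(10000023432L, "Tom User");
        model.addPerson(13943423235L, "Blah Person");
        model.process(10000023432L, 13943423235L, true);
        System.out.println(model.personById.get(10000023432L).keySet());
        System.out.println(model.edges.size());
    }
}
```

Calling process with removeCsvFields set to false leaves fromId and toId on the source record, which is the difference you would see when importing without the "field" transformer.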
Answer 1 (score: 1)
Using the Java API, you can read the CSV file and then create the edges:
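import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

import com.orientechnologies.orient.client.remote.OServerAdmin;
import com.orientechnologies.orient.core.sql.OCommandSQL;
import com.tinkerpop.blueprints.impls.orient.OrientGraph;

public class CreateFriendEdges {
    public static void main(String[] args) {
        String nomeYourDb = "nomeYourDb";
        String csvFile = "path_to_file";
        String csvSplitBy = ","; // your separator

        OServerAdmin serverAdmin = null;
        try {
            serverAdmin = new OServerAdmin("remote:localhost/" + nomeYourDb).connect("root", "root");
            if (!serverAdmin.existsDatabase()) {
                return; // nothing to do if the database does not exist
            }
            OrientGraph g = new OrientGraph("remote:localhost/" + nomeYourDb);
            try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
                br.readLine(); // skip the header row
                String line;
                while ((line = br.readLine()) != null) {
                    String[] ids = line.split(csvSplitBy);
                    // Look up both endpoints by id and connect them with a FriendsWith edge.
                    String personFrom = "(select from Person where id='" + ids[0] + "')";
                    String personTo = "(select from Person where id='" + ids[1] + "')";
                    String query = "create edge FriendsWith from " + personFrom + " to " + personTo;
                    g.command(new OCommandSQL(query)).execute();
                }
            } finally {
                g.shutdown(); // close the graph instance when done
            }
        } catch (IOException e) {
            // Also covers a missing CSV file (FileNotFoundException is an IOException).
            e.printStackTrace();
        } finally {
            if (serverAdmin != null) {
                serverAdmin.close();
            }
        }
    }
}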
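The loop above splices raw CSV text straight into the SQL string, so a blank line, a stray header or a malformed row ends up in the query. Since the ids are numeric (the index in the question is declared as id:long), one option is to parse them as long first and build the statement only for valid rows. The following is a minimal sketch of that idea using only the JDK; FriendEdgeSql and forCsvLine are hypothetical names, and the comma separator is assumed from the sample data in the question.

```java
import java.util.Optional;

/** Hypothetical helper: turns one "fromId,toId" CSV row into a CREATE EDGE statement. */
final class FriendEdgeSql {

    private FriendEdgeSql() {}

    /**
     * Returns the statement for a data row, or empty for a blank line, the
     * header row, or any row that does not hold exactly two numeric ids.
     */
    static Optional<String> forCsvLine(String line) {
        // Limit -1 keeps trailing empty fields, so "1,2," is rejected as three fields.
        String[] parts = line.trim().split(",", -1);
        if (parts.length != 2) {
            return Optional.empty();
        }
        try {
            long from = Long.parseLong(parts[0].trim());
            long to = Long.parseLong(parts[1].trim());
            return Optional.of("CREATE EDGE FriendsWith"
                    + " FROM (SELECT FROM Person WHERE id = " + from + ")"
                    + " TO (SELECT FROM Person WHERE id = " + to + ")");
        } catch (NumberFormatException e) {
            return Optional.empty(); // non-numeric field, e.g. the header row
        }
    }

    public static void main(String[] args) {
        System.out.println(forCsvLine("fromId,toId"));
        System.out.println(forCsvLine("10000023432,13943423235"));
    }
}
```

In the import loop, the returned statement (when present) would be passed to g.command(new OCommandSQL(...)).execute() in place of the hand-concatenated query; rows that yield an empty result can be counted or logged instead of aborting the run.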