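Here is my scenario: we are building a routing system using Neo4j and the Spatial plugin. We start from an OSM file; we read this file and import nodes and relationships into the graph (a custom graph model).
Now, if we don't use Neo4j's batch inserter, importing a compressed OSM file (about 140 MB compressed, about 2 GB uncompressed) takes roughly 3 days on a dedicated server with the following characteristics: CentOS 6.5 64-bit, quad core, 8 GB RAM. Note that most of the time is spent creating Neo4j nodes and relationships; in fact, if we read the same file without doing anything with Neo4j, it is read in about 7 minutes (I'm sure of this because in our process we first read the file in order to store the correct OSM node IDs, and then we read the file again to create the Neo4j graph).
Obviously we need to improve the import process, so we are trying to use the BatchInserter. So far, so good (I still need to check how well it performs with the BatchInserter, but I guess it will be faster). So the first thing I did was: let's try the batch inserter in a simple test case (very similar to our code, but without modifying our code directly).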
Here are my software versions:
Since I'm using Osmosis to read the OSM file, I wrote the following Sink implementation:
public class BatchInserterSinkTest implements Sink
{
    public static final Map<String, String> NEO4J_CFG = new HashMap<String, String>();
    private static File basePath = new File("/home/angelo/Scrivania/neo4j");
    private static File dbPath = new File(basePath, "db");
    private GraphDatabaseService graphDb;
    private BatchInserter batchInserter;
    // private BatchInserterIndexProvider batchIndexService;
    private SpatialDatabaseService spatialDb;
    private SimplePointLayer spl;

    static
    {
        NEO4J_CFG.put( "neostore.nodestore.db.mapped_memory", "100M" );
        NEO4J_CFG.put( "neostore.relationshipstore.db.mapped_memory", "300M" );
        NEO4J_CFG.put( "neostore.propertystore.db.mapped_memory", "400M" );
        NEO4J_CFG.put( "neostore.propertystore.db.strings.mapped_memory", "800M" );
        NEO4J_CFG.put( "neostore.propertystore.db.arrays.mapped_memory", "10M" );
        NEO4J_CFG.put( "dump_configuration", "true" );
    }

    @Override
    public void initialize(Map<String, Object> arg0)
    {
        batchInserter = BatchInserters.inserter(dbPath.getAbsolutePath(), NEO4J_CFG);
        graphDb = new SpatialBatchGraphDatabaseService(batchInserter);
        spatialDb = new SpatialDatabaseService(graphDb);
        spl = spatialDb.createSimplePointLayer("testBatch", "latitudine", "longitudine");
        //batchIndexService = new LuceneBatchInserterIndexProvider(batchInserter);
    }

    @Override
    public void complete()
    {
        // TODO Auto-generated method stub
    }

    @Override
    public void release()
    {
        // TODO Auto-generated method stub
    }

    @Override
    public void process(EntityContainer ec)
    {
        Entity entity = ec.getEntity();
        if (entity instanceof Node) {
            Node osmNodo = (Node)entity;
            org.neo4j.graphdb.Node graphNode = graphDb.createNode();
            graphNode.setProperty("osmId", osmNodo.getId());
            graphNode.setProperty("latitudine", osmNodo.getLatitude());
            graphNode.setProperty("longitudine", osmNodo.getLongitude());
            spl.add(graphNode);
        } else if (entity instanceof Way) {
            //do something with the way
        } else if (entity instanceof Relation) {
            //do something with the relation
        }
    }
}
Then I wrote the following test case:
public class BatchInserterTest
{
    private static final Log logger = LogFactory.getLog(BatchInserterTest.class.getName());

    @Test
    public void batchInserter()
    {
        File file = new File("/home/angelo/Scrivania/MilanoPiccolo.osm");
        try
        {
            boolean pbf = false;
            CompressionMethod compression = CompressionMethod.None;
            if (file.getName().endsWith(".pbf"))
            {
                pbf = true;
            }
            else if (file.getName().endsWith(".gz"))
            {
                compression = CompressionMethod.GZip;
            }
            else if (file.getName().endsWith(".bz2"))
            {
                compression = CompressionMethod.BZip2;
            }
            RunnableSource reader;
            if (pbf)
            {
                reader = new crosby.binary.osmosis.OsmosisReader(new FileInputStream(file));
            }
            else
            {
                reader = new XmlReader(file, false, compression);
            }
            reader.setSink(new BatchInserterSinkTest());
            Thread readerThread = new Thread(reader);
            readerThread.start();
            while (readerThread.isAlive())
            {
                try
                {
                    readerThread.join();
                }
                catch (InterruptedException e)
                {
                    /* do nothing */
                }
            }
        }
        catch (Exception e)
        {
            logger.error("Errore nella creazione di neo4j con batchInserter", e);
        }
    }
}
Running this code, I get the following exception:
Exception in thread "Thread-1" java.lang.ClassCastException: org.neo4j.unsafe.batchinsert.SpatialBatchGraphDatabaseService cannot be cast to org.neo4j.kernel.GraphDatabaseAPI
    at org.neo4j.cypher.ExecutionEngine.<init>(ExecutionEngine.scala:113)
    at org.neo4j.cypher.javacompat.ExecutionEngine.<init>(ExecutionEngine.java:53)
    at org.neo4j.cypher.javacompat.ExecutionEngine.<init>(ExecutionEngine.java:43)
    at org.neo4j.collections.graphdb.ReferenceNodes.getReferenceNode(ReferenceNodes.java:60)
    at org.neo4j.gis.spatial.SpatialDatabaseService.getSpatialRoot(SpatialDatabaseService.java:76)
    at org.neo4j.gis.spatial.SpatialDatabaseService.getLayer(SpatialDatabaseService.java:108)
    at org.neo4j.gis.spatial.SpatialDatabaseService.containsLayer(SpatialDatabaseService.java:253)
    at org.neo4j.gis.spatial.SpatialDatabaseService.createLayer(SpatialDatabaseService.java:282)
    at org.neo4j.gis.spatial.SpatialDatabaseService.createSimplePointLayer(SpatialDatabaseService.java:266)
    at it.eng.pinf.graph.batch.test.BatchInserterSinkTest.initialize(BatchInserterSinkTest.java:46)
    at org.openstreetmap.osmosis.xml.v0_6.XmlReader.run(XmlReader.java:95)
    at java.lang.Thread.run(Thread.java:744)
It is caused by this line of code:
spl = spatialDb.createSimplePointLayer("testBatch", "latitudine", "longitudine");
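So now I'm wondering: how can I use the batchInserter in my case? I have to add the created nodes to a SimplePointLayer, so how can I create one using the batch inserter graph database service? Is there a simple sample?
Any hint is really appreciated.
Cheers, Angelo
Answer 0 (score: 2)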
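The OSMImporter class in the code has an example of importing OSM data with the batch inserter. The main point is that Neo4j Spatial does not really support the batch inserter, so you need to do some things manually. If you look at the OSMImporter.OSMBatchWriter class, you will see how it does this. It does not use SimplePointLayer at all, because that class does not support the batch inserter; it creates the graph structure it wants directly. The simple point layer is very simple, certainly much simpler than the OSM model created by the code I referenced, so I think you should be able to write a batch-inserter-compatible version yourself without much trouble.
I suggest you use the batch inserter to create the layer and the nodes with the correct graph structure, then switch to the normal embedded API and use it to iterate over the nodes and add them to the spatial index.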
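To make the control flow of that suggestion concrete, here is a minimal sketch that uses only the Java standard library. It does not call Neo4j: `NodeStore` and `PointIndex` are hypothetical stand-ins invented for illustration. In a real import, phase 1 would presumably go through the batch inserter and phase 2 through the embedded API and the spatial layer (for example the `spl.add(graphNode)` call from the question). The property names are the ones used in the question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TwoPhaseImportSketch {

    /** Hypothetical stand-in for the spatial layer opened through the embedded API. */
    interface PointIndex {
        void add(long nodeId, double lat, double lon);
    }

    /** Hypothetical in-memory stand-in for the store written by the batch inserter. */
    static class NodeStore {
        private final List<Map<String, Object>> nodes = new ArrayList<>();
        private boolean open = true;

        /** Phase 1 write: stores the properties and returns the new node id. */
        long createNode(Map<String, Object> properties) {
            if (!open) {
                throw new IllegalStateException("store was shut down");
            }
            nodes.add(new HashMap<>(properties));
            return nodes.size() - 1;
        }

        /** Ends phase 1; no further writes are accepted. */
        void shutdown() {
            open = false;
        }

        int size() {
            return nodes.size();
        }

        Map<String, Object> properties(long id) {
            return nodes.get((int) id);
        }
    }

    /** Phase 1: write plain nodes carrying coordinates; no spatial indexing here. */
    static void bulkLoad(NodeStore store, long[] osmIds, double[] lats, double[] lons) {
        for (int i = 0; i < osmIds.length; i++) {
            Map<String, Object> props = new HashMap<>();
            props.put("osmId", osmIds[i]);
            props.put("latitudine", lats[i]);
            props.put("longitudine", lons[i]);
            store.createNode(props);
        }
        store.shutdown();
    }

    /** Phase 2: iterate over the stored nodes and add each one to the index. */
    static int indexAll(NodeStore store, PointIndex index) {
        int indexed = 0;
        for (long id = 0; id < store.size(); id++) {
            Map<String, Object> props = store.properties(id);
            index.add(id, (Double) props.get("latitudine"), (Double) props.get("longitudine"));
            indexed++;
        }
        return indexed;
    }

    public static void main(String[] args) {
        NodeStore store = new NodeStore();
        bulkLoad(store, new long[] {1L, 2L}, new double[] {45.46, 45.47}, new double[] {9.19, 9.20});
        List<Long> seen = new ArrayList<>();
        int n = indexAll(store, (id, lat, lon) -> seen.add(id));
        System.out.println(n + " nodes indexed: " + seen);
    }
}
```

The point of the split is that the bulk write never touches the spatial layer, so it needs nothing the batch inserter lacks. Only the second pass depends on a full database service.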