Neo4j Spatial: hints on using the batch inserter

Time: 2014-05-01 12:29:51

Tags: neo4j

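Here is my scenario: we are building a routing system using Neo4j and the Spatial plugin. We start from an OSM file; we read this file and import nodes and relationships into the graph (a custom graph model).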

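Currently, without Neo4j's batch inserter, importing a compressed OSM file (about 140 MB compressed, about 2 GB uncompressed) takes about 3 days on a dedicated server with the following specs: CentOS 6.5 64-bit, quad core, 8 GB RAM. Note that most of that time is spent creating Neo4j nodes and relationships; if we read the same file without doing anything with Neo4j, it is read in about 7 minutes (I am sure of this because our process first reads the file to store the correct OSM node IDs and then reads it again to create the Neo4j graph).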

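Obviously we need to improve the import process, so we are trying the BatchInserter. So far so good (I still need to measure how much it helps, but I expect it to be faster). The first thing I did was try the batch inserter in a simple test case (very similar to our code, but without modifying our code directly).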

Here are my software versions:

  • Neo4j: 2.0.2
  • Neo4j Spatial: 0.13-neo4j-2.0.1
  • Neo4j Graph Collections: 0.7.1-neo4j-2.0.1
  • Osmosis: 0.43.1

Since I use Osmosis to read the OSM file, I wrote the following Sink implementation:

public class BatchInserterSinkTest implements Sink
{
    public static final Map<String, String> NEO4J_CFG = new HashMap<String, String>();
    private static File basePath = new File("/home/angelo/Scrivania/neo4j");
    private static File dbPath = new File(basePath, "db");
    private GraphDatabaseService graphDb;
    private BatchInserter batchInserter;
//    private BatchInserterIndexProvider batchIndexService;
    private SpatialDatabaseService spatialDb;
    private SimplePointLayer spl;

    static
    {
        NEO4J_CFG.put( "neostore.nodestore.db.mapped_memory", "100M" );
        NEO4J_CFG.put( "neostore.relationshipstore.db.mapped_memory", "300M" );
        NEO4J_CFG.put( "neostore.propertystore.db.mapped_memory", "400M" );
        NEO4J_CFG.put( "neostore.propertystore.db.strings.mapped_memory", "800M" );
        NEO4J_CFG.put( "neostore.propertystore.db.arrays.mapped_memory", "10M" );
        NEO4J_CFG.put( "dump_configuration", "true" );
    }

    @Override
    public void initialize(Map<String, Object> arg0)
    {
        batchInserter = BatchInserters.inserter(dbPath.getAbsolutePath(), NEO4J_CFG);
        graphDb = new SpatialBatchGraphDatabaseService(batchInserter);
        spatialDb = new SpatialDatabaseService(graphDb);
        spl = spatialDb.createSimplePointLayer("testBatch", "latitudine", "longitudine");
        //batchIndexService = new LuceneBatchInserterIndexProvider(batchInserter);
    }

    @Override
    public void complete()
    {
        // TODO Auto-generated method stub
    }

    @Override
    public void release()
    {
        // TODO Auto-generated method stub
    }

    @Override
    public void process(EntityContainer ec)
    {
        Entity entity = ec.getEntity();
        if (entity instanceof Node) {
            Node osmNodo = (Node)entity;
            org.neo4j.graphdb.Node graphNode = graphDb.createNode();
            graphNode.setProperty("osmId", osmNodo.getId());
            graphNode.setProperty("latitudine", osmNodo.getLatitude());
            graphNode.setProperty("longitudine", osmNodo.getLongitude());
            spl.add(graphNode);
        } else if (entity instanceof Way) {
            //do something with the way
        } else if (entity instanceof Relation) {
            //do something with the relation
        }
    }
}

Then I wrote the following test case:

public class BatchInserterTest
{
    private static final Log logger = LogFactory.getLog(BatchInserterTest.class.getName());

    @Test
    public void batchInserter()
    {
        File file = new File("/home/angelo/Scrivania/MilanoPiccolo.osm");
        try
        {
            boolean pbf = false;
            CompressionMethod compression = CompressionMethod.None;

            if (file.getName().endsWith(".pbf"))
            {
                pbf = true;
            }
            else if (file.getName().endsWith(".gz"))
            {
                compression = CompressionMethod.GZip;
            }
            else if (file.getName().endsWith(".bz2"))
            {
                compression = CompressionMethod.BZip2;
            }

            RunnableSource reader;

            if (pbf)
            {
                reader = new crosby.binary.osmosis.OsmosisReader(new FileInputStream(file));
            }
            else
            {
                reader = new XmlReader(file, false, compression);
            }

            reader.setSink(new BatchInserterSinkTest());

            Thread readerThread = new Thread(reader);
            readerThread.start();

            while (readerThread.isAlive())
            {
                try
                {
                    readerThread.join();
                }
                catch (InterruptedException e)
                {
                    /* do nothing */
                }
            }
        }
        catch (Exception e)
        {
            logger.error("Errore nella creazione di neo4j con batchInserter", e);
        }
    }
}

Running this code, I get the following exception:

Exception in thread "Thread-1" java.lang.ClassCastException: org.neo4j.unsafe.batchinsert.SpatialBatchGraphDatabaseService cannot be cast to org.neo4j.kernel.GraphDatabaseAPI
 at org.neo4j.cypher.ExecutionEngine.<init>(ExecutionEngine.scala:113)
 at org.neo4j.cypher.javacompat.ExecutionEngine.<init>(ExecutionEngine.java:53)
 at org.neo4j.cypher.javacompat.ExecutionEngine.<init>(ExecutionEngine.java:43)
 at org.neo4j.collections.graphdb.ReferenceNodes.getReferenceNode(ReferenceNodes.java:60)
 at org.neo4j.gis.spatial.SpatialDatabaseService.getSpatialRoot(SpatialDatabaseService.java:76)
 at org.neo4j.gis.spatial.SpatialDatabaseService.getLayer(SpatialDatabaseService.java:108)
 at org.neo4j.gis.spatial.SpatialDatabaseService.containsLayer(SpatialDatabaseService.java:253)
 at org.neo4j.gis.spatial.SpatialDatabaseService.createLayer(SpatialDatabaseService.java:282)
 at org.neo4j.gis.spatial.SpatialDatabaseService.createSimplePointLayer(SpatialDatabaseService.java:266)
 at it.eng.pinf.graph.batch.test.BatchInserterSinkTest.initialize(BatchInserterSinkTest.java:46)
 at org.openstreetmap.osmosis.xml.v0_6.XmlReader.run(XmlReader.java:95)
 at java.lang.Thread.run(Thread.java:744)

It is caused by this line:

spl = spatialDb.createSimplePointLayer("testBatch", "latitudine", "longitudine");

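So now I wonder: how can I use the BatchInserter in my case? I have to add the created nodes to a SimplePointLayer, so how can I create one through the batch inserter graph database service? Is there a simple example?

Any hint is really appreciated.

Cheers, Angelo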

1 Answer:

Answer 0: (score: 2)

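The OSMImporter class in the code base has an example of using the batch inserter to import OSM data. The main point is that Neo4j Spatial does not really support the batch inserter, so you need to do some things manually. If you look at the OSMImporter.OSMBatchWriter class, you will see how it does this. It does not use SimplePointLayer at all, because that class does not support the batch inserter. Instead, it creates the graph structure it wants directly. The simple point layer is very simple, certainly much simpler than the OSM model created by the code I referenced, so I think you should be able to write a batch-inserter-compatible version yourself without too much trouble.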

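I suggest you use the batch inserter to create the layer and the nodes with the correct graph structure, then switch to the normal embedded API and use it to iterate over the nodes and add them to the spatial index.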
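
As a rough illustration of that two-phase idea, here is a minimal sketch in plain Java. It deliberately does not use the Neo4j or Neo4j Spatial APIs: `TwoPhaseImportSketch`, `BulkStore`, `StoredNode` and `GridIndex` are hypothetical stand-ins made up for this example (for the batch inserter, a stored node and the spatial layer respectively), so that the code runs on its own. The grid is only a placeholder for a spatial index and is not how Neo4j Spatial indexes points. In a real import, phase 1 would go through the BatchInserter and phase 2 through an embedded database opened after the batch inserter has been shut down.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TwoPhaseImportSketch {

    /** Hypothetical stand-in for a stored node: an id plus its properties. */
    static final class StoredNode {
        final long id;
        final Map<String, Object> props;

        StoredNode(long id, Map<String, Object> props) {
            this.id = id;
            this.props = props;
        }
    }

    /**
     * Phase 1 stand-in (hypothetical). In Neo4j 2.0 this role would be played
     * by the BatchInserter, whose createNode(...) also returns a long node id.
     */
    static final class BulkStore {
        private final List<StoredNode> nodes = new ArrayList<>();

        long createNode(long osmId, double lat, double lon) {
            Map<String, Object> props = new HashMap<>();
            props.put("osmId", osmId);
            props.put("latitudine", lat);
            props.put("longitudine", lon);
            long id = nodes.size();          // ids are assigned sequentially
            nodes.add(new StoredNode(id, props));
            return id;
        }

        List<StoredNode> allNodes() {
            return Collections.unmodifiableList(nodes);
        }
    }

    /**
     * Phase 2 stand-in (hypothetical): a fixed-size grid, playing the role of
     * the spatial layer that is filled afterwards (spl.add(graphNode) in the
     * question's code).
     */
    static final class GridIndex {
        private final double cellSize;
        private final Map<String, List<Long>> cells = new HashMap<>();

        GridIndex(double cellSize) {
            this.cellSize = cellSize;
        }

        private String cellKey(double lat, double lon) {
            long row = (long) Math.floor(lat / cellSize);
            long col = (long) Math.floor(lon / cellSize);
            return row + ":" + col;
        }

        void add(StoredNode node) {
            double lat = (Double) node.props.get("latitudine");
            double lon = (Double) node.props.get("longitudine");
            cells.computeIfAbsent(cellKey(lat, lon), k -> new ArrayList<>()).add(node.id);
        }

        List<Long> nodesInCell(double lat, double lon) {
            return cells.getOrDefault(cellKey(lat, lon), Collections.emptyList());
        }
    }

    /** Phase 2: iterate over everything written in phase 1 and index it. */
    static GridIndex buildIndex(BulkStore store, double cellSize) {
        GridIndex index = new GridIndex(cellSize);
        for (StoredNode node : store.allNodes()) {
            index.add(node);
        }
        return index;
    }

    public static void main(String[] args) {
        // Phase 1: bulk-write the nodes, no spatial indexing yet.
        BulkStore store = new BulkStore();
        store.createNode(1L, 45.4642, 9.1905);
        store.createNode(2L, 45.4650, 9.1915);
        store.createNode(3L, 45.0703, 7.6869);

        // Phase 2: build the index in a separate pass over the stored nodes.
        GridIndex index = buildIndex(store, 0.01);
        System.out.println(index.nodesInCell(45.4645, 9.1908));
    }
}
```

The point of the split is that the write path in phase 1 never touches the index, which is the part that the batch inserter cannot drive in the question's setup; the index only ever sees nodes that are already stored.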