Neo4j Performance Challenges - How to Improve?

Time: 2015-03-27 15:03:29

Tags: neo4j cypher

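I have been wrestling with Neo4J for the past few weeks, trying to resolve some extremely challenging performance problems. At this point I need some additional help, because I can't determine how to move forward.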

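My graph has roughly 12.5 million nodes and 64 million relationships in total. Its purpose is to analyze suspicious financial behavior, so it contains customers, accounts, transactions, and so on.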

Here is an example of the performance challenge:

  • This query for the total number of nodes takes 96,064ms to complete, which is extremely long.

    neo4j-sh (?)$ MATCH (n) RETURN count(n);
    +----------+
    | count(n) |
    +----------+
    | 12519940 |
    +----------+
    1 row
    96064 ms
    
  • The query for the total number of relationships takes 919,449ms to complete, which seems silly.

    neo4j-sh (?)$ MATCH ()-[r]-() return count(r);
    +----------+
    | count(r) |
    +----------+
    | 64062508 |
    +----------+
    1 row
    919449 ms
    
  • I have 6.6M Transaction nodes. When I search for transactions with an amount above $8,000, the query takes 653,637ms.

    neo4j-sh (?)$ MATCH (t:Transaction) WHERE t.amount > 8000.00 return count(t);        
    +----------+
    | count(t) |
    +----------+
    | 10696    |
    +----------+
    1 row
    653637 ms 
    

Relevant schema

 ON :Transaction(baseamount)    ONLINE                             
 ON :Transaction(type)          ONLINE                             
 ON :Transaction(amount)        ONLINE                             
 ON :Transaction(currency)      ONLINE                             
 ON :Transaction(basecurrency)  ONLINE                             
 ON :Transaction(transactionid) ONLINE (for uniqueness constraint)

Query profile:

neo4j-sh (?)$ PROFILE MATCH (t:Transaction) WHERE t.amount > 8000.00 return count(t);  
+----------+
| count(t) |
+----------+
| 10696    |
+----------+
1 row

ColumnFilter
  |
  +EagerAggregation
    |
    +Filter
      |
      +NodeByLabel

+------------------+---------+----------+-------------+------------------------------------------+
|         Operator |    Rows |   DbHits | Identifiers |                                    Other |
+------------------+---------+----------+-------------+------------------------------------------+
|     ColumnFilter |       1 |        0 |             |                    keep columns count(t) |
| EagerAggregation |       1 |        0 |             |                                          |
|           Filter |   10696 | 13216382 |             | Property(t,amount(62)) > {  AUTODOUBLE0} |
|      NodeByLabel | 6608191 |  6608192 |        t, t |                             :Transaction |
+------------------+---------+----------+-------------+------------------------------------------+

  • I am running these queries in the neo4j shell.

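  • These performance challenges are starting to create substantial doubt about whether I can use Neo4J at all, and they seem contrary to the potential the platform offers.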

  • I fully admit that I may have misconfigured something (I am relatively new to Neo4J), so guidance on what to fix or what to look at is much appreciated.

Here are the details of my setup:

System: Linux, Ubuntu, 16GB RAM, 3.5GHz i5 processor, 256GB SSD

CPU

$ cat /proc/cpuinfo 
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 60
model name  : Intel(R) Core(TM) i5-4690K CPU @ 3.50GHz
stepping    : 3
microcode   : 0x12
cpu MHz     : 4230.625
cache size  : 6144 KB

Memory

$ cat /proc/meminfo
MemTotal:       16115020 kB
MemFree:          224856 kB
MemAvailable:    8807160 kB
Buffers:          124356 kB
Cached:          8429964 kB
SwapCached:         8388 kB

Disk

$ df -h
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/data1--vg-root  219G   32G  177G  16% /

Neo4J.properties

neostore.nodestore.db.mapped_memory=200M
neostore.relationshipstore.db.mapped_memory=1G
neostore.relationshipgroupstore.db.mapped_memory=200M
neostore.propertystore.db.mapped_memory=500M
neostore.propertystore.db.strings.mapped_memory=500M
neostore.propertystore.db.arrays.mapped_memory=50M
neostore.propertystore.db.index.keys.mapped_memory=200M
relationship_auto_indexing=true

Neo4J-Wrapper.properties

wrapper.java.additional=-Dorg.neo4j.server.properties=conf/neo4j-server.properties
wrapper.java.additional=-Djava.util.logging.config.file=conf/logging.properties
wrapper.java.additional=-Dlog4j.configuration=file:conf/log4j.properties

#********************************************************************
# JVM Parameters
#********************************************************************

wrapper.java.additional=-XX:+UseConcMarkSweepGC
wrapper.java.additional=-XX:+CMSClassUnloadingEnabled
wrapper.java.additional=-XX:-OmitStackTraceInFastThrow

# Uncomment the following lines to enable garbage collection logging
wrapper.java.additional=-Xloggc:data/log/neo4j-gc.log
wrapper.java.additional=-XX:+PrintGCDetails
wrapper.java.additional=-XX:+PrintGCDateStamps
wrapper.java.additional=-XX:+PrintGCApplicationStoppedTime
wrapper.java.additional=-XX:+PrintPromotionFailure
wrapper.java.additional=-XX:+PrintTenuringDistribution

# Java Heap Size: by default the Java heap size is dynamically
# calculated based on available system resources.
# Uncomment these lines to set specific initial and maximum
# heap size in MB.
wrapper.java.initmemory=4096
wrapper.java.maxmemory=6144

Other

  • Changed the Linux open-files setting to 40k

  • I am not running anything else on this machine: no X Windows, no other database server. Here is a snippet of top while a query is running:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                
    15785 neo4j     20   0 12.192g 8.964g 2.475g S 100.2 58.3 227:50.98 java                                                                                                                   
    1 root      20   0   33464   2132   1140 S   0.0  0.0   0:02.36 init                                                                                                                   
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.01 kthreadd
    
  • The total size of the files in the graph.db directory is:

    data/graph.db$ du --max-depth=1 -h
    1.9G    ./schema
    36K ./index
    26G .
    
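  • Data loading has been extremely hit or miss. Some merges take less than 60 seconds (even for ~200 to 300K inserts), while others run for more than 3 hours (11,898,514ms for a CSV file with 189,999 rows, merging on one date)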

  • I get constant GC thread blocking:

    2015-03-27 14:56:26.347+0000 WARN  [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 15422ms.
    2015-03-27 14:56:39.011+0000 WARN  [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 12363ms.
    2015-03-27 14:56:57.533+0000 WARN  [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 13969ms.
    2015-03-27 14:57:17.345+0000 WARN  [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 14657ms.
    2015-03-27 14:57:29.955+0000 WARN  [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 12309ms.
    2015-03-27 14:58:14.311+0000 WARN  [o.n.k.EmbeddedGraphDatabase]: GC Monitor: Application threads blocked for 1928ms.
    

Please let me know if I should add anything else that is relevant to the discussion.


Update 1

Thank you very much for your help. I just moved, so I was delayed in responding.

  1. Size of the neostore files:

    /data/graph.db$ ls -lah neostore.*
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.id
    -rw-rw-r-- 1 neo4j neo4j  110 Apr  2 13:03 neostore.labeltokenstore.db
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.labeltokenstore.db.id
    -rw-rw-r-- 1 neo4j neo4j  874 Apr  2 13:03 neostore.labeltokenstore.db.names
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.labeltokenstore.db.names.id
    -rw-rw-r-- 1 neo4j neo4j 200M Apr  2 13:03 neostore.nodestore.db
    -rw-rw-r-- 1 neo4j neo4j   41 Apr  2 13:03 neostore.nodestore.db.id
    -rw-rw-r-- 1 neo4j neo4j   68 Apr  2 13:03 neostore.nodestore.db.labels
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.nodestore.db.labels.id
    -rw-rw-r-- 1 neo4j neo4j 2.8G Apr  2 13:03 neostore.propertystore.db
    -rw-rw-r-- 1 neo4j neo4j  128 Apr  2 13:03 neostore.propertystore.db.arrays
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.propertystore.db.arrays.id
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.propertystore.db.id
    -rw-rw-r-- 1 neo4j neo4j  720 Apr  2 13:03 neostore.propertystore.db.index
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.propertystore.db.index.id
    -rw-rw-r-- 1 neo4j neo4j 3.1K Apr  2 13:03 neostore.propertystore.db.index.keys
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.propertystore.db.index.keys.id
    -rw-rw-r-- 1 neo4j neo4j 1.7K Apr  2 13:03 neostore.propertystore.db.strings
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.propertystore.db.strings.id
    -rw-rw-r-- 1 neo4j neo4j  47M Apr  2 13:03 neostore.relationshipgroupstore.db
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.relationshipgroupstore.db.id
    -rw-rw-r-- 1 neo4j neo4j 1.1G Apr  2 13:03 neostore.relationshipstore.db
    -rw-rw-r-- 1 neo4j neo4j 1.6M Apr  2 13:03 neostore.relationshipstore.db.id
    -rw-rw-r-- 1 neo4j neo4j  165 Apr  2 13:03 neostore.relationshiptypestore.db
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.relationshiptypestore.db.id
    -rw-rw-r-- 1 neo4j neo4j 1.3K Apr  2 13:03 neostore.relationshiptypestore.db.names
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.relationshiptypestore.db.names.id
    -rw-rw-r-- 1 neo4j neo4j 3.5K Apr  2 13:03 neostore.schemastore.db
    -rw-rw-r-- 1 neo4j neo4j   25 Apr  2 13:03 neostore.schemastore.db.id
    
  2. I read that the mapped memory settings have been replaced by another cache, so I have commented those settings out.

  3. Java Profiler

       JvmTop 0.8.0 alpha - 16:12:59,  amd64,  4 cpus, Linux 3.16.0-33, load avg 0.30
       http://code.google.com/p/jvmtop
    
       Profiling PID 4260:            org.neo4j.server.Bootstrapper 
    
        68.67% (    14.01s) org.neo4j.kernel.impl.nioneo.store.StoreFileChannel.read()
        18.73% (     3.82s) org.neo4j.kernel.impl.nioneo.store.StoreFailureException.<init>()
         2.86% (     0.58s) org.neo4j.kernel.impl.cache.ReferenceCache.put()
         1.11% (     0.23s) org.neo4j.helpers.Counter.inc()
         0.87% (     0.18s) org.neo4j.kernel.impl.cache.ReferenceCache.get()
         0.65% (     0.13s) org.neo4j.cypher.internal.compiler.v2_1.parser.Literals$class.PropertyKeyName()
         0.63% (     0.13s) org.parboiled.scala.package$.getCurrentRuleMethod()
         0.62% (     0.13s) scala.collection.mutable.OpenHashMap.<init>()
         0.62% (     0.13s) scala.collection.mutable.AbstractSeq.<init>()
         0.62% (     0.13s) org.neo4j.kernel.impl.cache.AutoLoadingCache.get()
         0.61% (     0.13s) scala.collection.TraversableLike$$anonfun$map$1.apply()
         0.61% (     0.12s) org.neo4j.kernel.impl.transaction.TxManager.assertTmOk()
         0.61% (     0.12s) org.neo4j.cypher.internal.compiler.v2_1.commands.EntityProducerFactory.<init>()
         0.61% (     0.12s) scala.collection.AbstractTraversable.<init>()
         0.61% (     0.12s) scala.collection.immutable.List.toStream()
         0.60% (     0.12s) org.neo4j.kernel.impl.nioneo.store.NodeStore.getRecord()
         0.57% (     0.12s) org.neo4j.kernel.impl.transaction.TxManager.getTransaction()
         0.37% (     0.08s) org.parboiled.scala.Parser$class.rule()
         0.06% (     0.01s) scala.util.DynamicVariable.value()
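
For reference, the store file sizes from item 1 can be set next to the mapped_memory values originally listed in neo4j.properties. The following is only a small sketch: sizes are rounded as printed by ls -lah, and the dictionary layout and variable names are illustrative, not part of any Neo4j tooling.

```python
# Sizes in MB, rounded as shown by `ls -lah`; mapped values as originally
# listed in neo4j.properties (before they were commented out).
MB = 1
GB = 1024 * MB

stores = {
    # store file: (file size, mapped_memory setting)
    "neostore.nodestore.db": (200 * MB, 200 * MB),
    "neostore.relationshipstore.db": (1.1 * GB, 1 * GB),
    "neostore.relationshipgroupstore.db": (47 * MB, 200 * MB),
    "neostore.propertystore.db": (2.8 * GB, 500 * MB),
}

# Fraction of each store file that fits into its mapped region (capped at 1).
coverage = {
    name: min(1.0, mapped / size) for name, (size, mapped) in stores.items()
}

for name, fraction in coverage.items():
    print(f"{name:40s} {fraction:.0%} mapped")
```

On these numbers the property store is the file with the smallest mapped fraction. Whether that accounts for the StoreFileChannel.read() time in the profile above is not something this comparison alone can confirm.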
    

1 Answer:

Answer 0 (score: 2)

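Unfortunately, schema indexes (i.e. those created with CREATE INDEX ON :Label(property)) do not yet support greater-than/less-than conditions. Neo4j therefore falls back to scanning all nodes with the given label and filtering on their properties. That is of course expensive.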
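
The PROFILE output in the question shows what this fallback costs. As a rough back-of-the-envelope sketch (the variable names are mine; the figures are copied from the question's PROFILE table and timing):

```python
# Figures copied from the PROFILE output and timing in the question.
label_scan_rows = 6_608_191   # rows produced by NodeByLabel (:Transaction)
filter_db_hits = 13_216_382   # DbHits reported for the Filter operator
matching_rows = 10_696        # rows that passed t.amount > 8000.00
elapsed_ms = 653_637          # wall-clock time of the query

hits_per_node = filter_db_hits / label_scan_rows
selectivity = matching_rows / label_scan_rows
ms_per_node = elapsed_ms / label_scan_rows

print(f"Filter DbHits per scanned node: {hits_per_node:.1f}")
print(f"Share of nodes that matched:    {selectivity:.2%}")
print(f"Elapsed time per scanned node:  {ms_per_node:.3f} ms")
```

Every Transaction node is read in order to return roughly 0.16% of them, which is the work a range-capable index would avoid.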

I see two different ways to work around this:

1) If your condition always has a predefined maximum granularity, e.g. $10, you can build an "amount tree" similar to a time tree (see http://graphaware.com/neo4j/2014/08/20/graphaware-neo4j-timetree.html).
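
As a minimal sketch of the bucketing behind such an amount tree (this is not the GraphAware TimeTree API; the function name, the fan-out of 10 and the three levels are assumptions chosen for illustration), each amount maps to a chain of bucket lower bounds from coarse to fine, much as a timestamp maps to year, month and day nodes:

```python
def amount_tree_path(amount, granularity=10, fanout=10, levels=3):
    """Return bucket lower bounds from the coarsest level down to the leaf.

    Hypothetical helper: with the defaults the bucket widths are
    1000, 100 and 10, so the leaf bucket has the $10 granularity.
    """
    widths = [granularity * fanout ** i for i in range(levels - 1, -1, -1)]
    return [int(amount // width) * width for width in widths]


path = amount_tree_path(8123.45)
print(path)  # [8000, 8100, 8120]
```

With each transaction attached to its leaf bucket, a condition such as amount > 8000 could walk the buckets from 8000 upward and filter only inside the boundary bucket, instead of scanning every Transaction node.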

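2) If you do not know the granularity up front, the other option is to set up a manual or auto index for the amount property, see http://neo4j.com/docs/stable/indexing.html. The easiest thing is probably to use the auto index. Set the following options in neo4j.properties: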

node_auto_indexing=true
node_keys_indexable=amount

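Note that this does not automatically add all existing transactions to that index; only those written after auto indexing was enabled are put into the index.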

You can run an explicit range query against the auto index using:

START t=node:node_auto_index("amount:[6000 TO 999999999]")
RETURN count(t)
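
The string passed to the index is a Lucene range expression, where square brackets make the bounds inclusive and curly braces make them exclusive. A small sketch of building such a string from client code follows; the helper name is hypothetical, and whether the comparison ends up numeric or lexicographic depends on how the values were indexed, which this sketch does not address.

```python
def lucene_range(field, low, high, inclusive=True):
    """Build a Lucene range expression such as 'amount:[6000 TO 999999999]'.

    Hypothetical helper for illustration only; it does no escaping.
    """
    opening, closing = ("[", "]") if inclusive else ("{", "}")
    return f"{field}:{opening}{low} TO {high}{closing}"


query = lucene_range("amount", 6000, 999999999)
print(query)
```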