Question

UPDATE: I have asked a follow-up question with updated scripts and a clearer setup at "neo4j performance compared to mysql (how can it be improved?)". Please continue there. / UPDATE

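I am having trouble verifying the performance claims made in the books "Graph Databases" (page 20) and "Neo4j in Action" (chapter 1).

To verify these claims I created a sample dataset of 100000 "person" entries with 50 "friends" each, and tried to query for, e.g., friends 4 hops away. I used the very same dataset in mysql. For friends of friends over 4 hops, mysql returns in 0.93 secs, while neo4j needs 65-75 secs (on repeated calls).

How can I improve this miserable outcome and verify the claims made in the books?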

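A bit more detail:

I run the whole setup on an i5-3570K with 16GB RAM, using ubuntu12.04 64bit, java version "1.7.0_25", mysql 5.5.31 and neo4j-community-2.0.0-M03 (I get similar results with 1.9).

All code and sample data can be found at https://github.com/jhb/neo4j-experiements/ (to be used with 2.0.0). The resulting sample data in different formats can be found at https://github.com/jhb/neo4j-testdata.

To use the scripts you need a python with mysql-python, requests and simplejson installed.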

the dataset is created with friendsdata.py and stored to friends.pickle
friends.pickle gets imported into neo4j using import_friends_neo4j.py
friends.pickle gets imported into mysql using import_friends_mysql.py
I [...] in mysql
I [...] in neo4j

To make life a bit easier, friends.*.bz2 contain the sql and cypher statements to create these datasets in mysql and neo4j 2.0 M3.
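For readers who do not want to fetch the repository, the following is a minimal sketch of how a dataset of this shape could be generated. It is not the actual friendsdata.py: the function name make_friends, the random choice of friends and the fixed seed are assumptions, and the person<N> naming is only inferred from the query examples in this post.

```python
import random


def make_friends(num_people=100000, friends_per_person=50, seed=0):
    """Return {person_name: [friend_name, ...]} with directed friendships.

    Hypothetical helper for illustration, not the author's friendsdata.py.
    Requires num_people > friends_per_person.
    """
    rng = random.Random(seed)
    names = ["person%d" % i for i in range(num_people)]
    friends = {}
    for i, name in enumerate(names):
        # Draw one extra candidate so a self-reference can be dropped if drawn.
        picks = rng.sample(range(num_people), friends_per_person + 1)
        picks = [p for p in picks if p != i][:friends_per_person]
        friends[name] = [names[p] for p in picks]
    return friends


if __name__ == "__main__":
    # Small run so the sketch finishes quickly.
    data = make_friends(num_people=1000, friends_per_person=50)
    print(len(data), len(data["person0"]))
```

The resulting dictionary could be pickled and then fed to separate importers, which is the split the scripts above appear to use.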

Mysql performance

I first warm up mysql by querying:

select count(distinct name) from t_user;
select count(distinct name) from t_user;

Then, for the real measurement, I run

python query_friends_mysql.py 4 10

This creates the following sql statement (with varying t_user.name values):

select 
    count(*)
from
    t_user,
    t_user_friend as uf1, 
    t_user_friend as uf2, 
    t_user_friend as uf3, 
    t_user_friend as uf4
where
    t_user.name='person8601' and 
    t_user.id = uf1.user_1 and
    uf1.user_2 = uf2.user_1 and
    uf2.user_2 = uf3.user_1 and
    uf3.user_2 = uf4.user_1;

and repeats this 4 hop query 10 times. The queries need around 0.95 secs each. Mysql was configured to use a key_buffer of 4G.
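The self-join above follows a regular pattern, one t_user_friend alias per hop. As a rough sketch of how such a statement could be generated for any hop count (the function name build_friends_sql is hypothetical and this is not taken from query_friends_mysql.py):

```python
def build_friends_sql(hops, name_placeholder="%s"):
    """Build the self-join query that counts paths of length `hops`.

    The default placeholder "%s" is the parameter style used by mysql-python,
    so the name can be passed as cursor.execute(sql, ("person8601",)).
    """
    if hops < 1:
        raise ValueError("hops must be >= 1")
    tables = ["t_user"]
    tables += ["t_user_friend as uf%d" % i for i in range(1, hops + 1)]
    conditions = ["t_user.name=" + name_placeholder,
                  "t_user.id = uf1.user_1"]
    # Chain each hop's target to the next hop's source.
    for i in range(1, hops):
        conditions.append("uf%d.user_2 = uf%d.user_1" % (i, i + 1))
    return ("select count(*) from " + ", ".join(tables) +
            " where " + " and ".join(conditions))


if __name__ == "__main__":
    print(build_friends_sql(4))
```

Note that count(*) counts paths, not distinct people: if every person has exactly 50 outgoing friend rows, the 4 hop join yields 50^4 = 6,250,000 rows for a single starting person.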

neo4j performance test

I have modified neo4j.properties:

neostore.nodestore.db.mapped_memory=25M
neostore.relationshipstore.db.mapped_memory=250M

and neo4j-wrapper.conf:

wrapper.java.initmemory=2048
wrapper.java.maxmemory=8192

To warm up neo4j I ran

start n=node(*) return count(n.noscenda_name);
start r=relationship(*) return count(r);

Then I start using the transactional http endpoint (but I get the same results using the neo4j-shell).

Still warming up, I ran

./bin/python query_friends_neo4j.py 3 10

which creates a query of the form (with varying person ids):

{"statement": "match n:node-[r*3..3]->m:node where n.noscenda_name={target} return count(r);", "parameters": {"target": "person3089"}}

After the 7th call, each call needs around 0.7-0.8 secs.
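The statement object above can be built programmatically for any hop count. The sketch below is an illustration only, not the code in query_friends_neo4j.py: the function names are hypothetical, and wrapping the statement in a "statements" list is an assumption about the request body the transactional endpoint expects.

```python
import json


def build_statement(hops, target):
    """Return one statement object for a fixed-length path query."""
    query = ("match n:node-[r*%d..%d]->m:node "
             "where n.noscenda_name={target} return count(r);" % (hops, hops))
    return {"statement": query, "parameters": {"target": target}}


def build_payload(hops, target):
    """Serialize the request body.

    Assumption: the endpoint takes {"statements": [...]} as its JSON body.
    """
    return json.dumps({"statements": [build_statement(hops, target)]})


if __name__ == "__main__":
    print(build_payload(3, "person3089"))
```

Passing the person name as a parameter rather than splicing it into the query text keeps the query string identical across calls.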

Now for the real thing (4 hops) I ran

./bin/python query_friends_neo4j.py 4 10

which creates

{"statement": "match n:node-[r*4..4]->m:node where n.noscenda_name={target} return count(r);", "parameters": {"target": "person3089"}}

and each call needs between 65 and 75 secs.
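Since the numbers on both sides come from repeated calls with a warm-up phase, it helps to measure both databases with the same harness. This is a minimal sketch under that assumption; time_calls and summarize are hypothetical helpers, and run_query stands for whatever callable issues one mysql or neo4j query.

```python
import time


def time_calls(run_query, repeats=10):
    """Call run_query() `repeats` times and return elapsed seconds per call."""
    timings = []
    for _ in range(repeats):
        started = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - started)
    return timings


def summarize(timings, warmup=0):
    """Return (min, mean, max) after discarding the first `warmup` calls."""
    kept = timings[warmup:]
    if not kept:
        raise ValueError("no timings left after discarding warm-up calls")
    return min(kept), sum(kept) / len(kept), max(kept)


if __name__ == "__main__":
    # Stand-in workload; replace the lambda with a real query call.
    timings = time_calls(lambda: sum(range(100000)), repeats=10)
    print(summarize(timings, warmup=2))
```

Reporting minimum, mean and maximum separately makes it visible whether a slow average comes from a few cold calls or from every call.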

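Open questions/thoughts

I'd really like the claims made in the books to be reproducible and correct, with neo4j being faster than mysql rather than slower.

But I don't know what I am doing wrong... :-(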

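So, my big hopes are:

I didn't set up the memory settings for neo4j correctly
The query I use for neo4j is completely wrong

Any suggestions for getting neo4j up to speed are highly welcome.

Thanks a lot,

Joerg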

Answer 1

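2.0 has not been performance optimized at all, so you should use 1.9.2 for the comparison. (If you use 2.0, did you create an index for n.noscenda_name?)

You can check the query plan with profile start ....

With 1.9, please use a manual index, or node_auto_index for noscenda_name.

Can you try these queries: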

start n=node:node_auto_index(noscenda_name={target})
match n-->()-->()-->m
return count(*);
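The query above spells out each hop explicitly instead of using a variable-length pattern. To try the same form at other depths, the pattern can be generated; this is a sketch only, and chain_pattern and chain_query are hypothetical helper names, not part of any Neo4j driver.

```python
def chain_pattern(hops, start="n", end="m"):
    """Return an explicit Cypher path pattern with `hops` relationships."""
    if hops < 1:
        raise ValueError("hops must be >= 1")
    # `hops` relationships need hops - 1 anonymous intermediate nodes.
    return start + "-->" + "()-->" * (hops - 1) + end


def chain_query(hops):
    """Return the auto-index lookup query with an explicit path of `hops`."""
    return ("start n=node:node_auto_index(noscenda_name={target})\n"
            "match " + chain_pattern(hops) + "\n"
            "return count(*);")


if __name__ == "__main__":
    print(chain_query(4))
```

With hops=3 this reproduces the query shown above, and hops=4 gives the depth used in the question.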

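Fulltext indexes are also more expensive than exact indexes, so keep the exact auto-index for noscenda_name.

I could not get your importer to run; it fails at some point. Maybe you can share the finished neo4j database.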


Answer 2

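Just to add to what Michael said: I think the authors of the book are referring to a comparison that was done in the Neo4j in Action book, which is described in the free first chapter of that book.

At the top of page 7 they explain that they were using the Traversal API rather than Cypher.

I think you will struggle to get Cypher anywhere near that level of performance at the moment, so if you want to run these kinds of queries you will probably want to use the Traversal API directly and then wrap it in {{3 }}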
