Question

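I migrated a MySQL database to Neo4j and tested a simple query. I was surprised to find that the equivalent query in Neo4j takes 10 to 100 times longer than in MySQL. I am working with Neo4j 2.0.1.

In the original MySQL schema, I have the following three tables: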

countries: contains a 'code', a 'continent_id' and a 'selected' boolean,
cities: contains a 'country_code', a 'name' and a 'status' boolean,
theaters: contains a 'city_id' and a 'public' boolean.

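Every property is indexed. I want to display the number of theaters per city for a given continent, subject to a few conditions. The query is: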

SELECT count(*) as nb, c.name 
FROM `cities` c LEFT JOIN theaters t ON c.id = t.city_id 
WHERE c.country_code IN 
  (SELECT code FROM countries WHERE selected is true AND continent_id = 4)
 AND c.status=1 AND t.public = 1 
GROUP BY c.name  ORDER BY nb DESC

The database schema in Neo4j is as follows:

(:Continent)-[:Include]->(:Country{selected: bool})-[:Include]->(:City{name: string, status: bool})-[:Include]->(:Theater{public: bool})

There is also an index defined on each property. The Cypher query is:

MATCH (:Continent{code: 4})-[:Include]->(:Country{selected:true})-[:Include]->(city:City{status:true})-[:Include]->(:Theater{public: true}) RETURN city.name, count(*) AS nb ORDER BY nb DESC

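Each database holds about 70,000 cities and 140,000 theaters.

For the continent with id 4, the MySQL query takes 0.02 s, while the Neo4j query takes 0.4 s. Moreover, if I introduce a variable-length relationship between Country and City in the Cypher query (...(:Country{selected:true})-[:Include*..3]->(city:City{status:true})...), because I want to be able to add intermediate levels such as Regions, the query takes more than 2 seconds.

I know that in this particular case there is no benefit to using Neo4j instead of MySQL, but I expected roughly comparable performance from the two technologies, and I want to take advantage of Neo4j for geographical hierarchies.

Am I missing something, or is this a limitation of Neo4j?

Thanks for your answers.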

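EDIT: First, you will find the database dump file here. The Neo4j server configuration is the out-of-the-box one. I work in a Ruby environment and use the neography gem. I also run the Neo4j server separately because I am not on JRuby, so Cypher queries are sent through the REST API.

The database contains 244 countries, 69,000 cities and 138,000 theaters. For continent_id 4, there are 46,982 cities (37,210 of them with the status boolean set to true) and 74,420 theaters.

The query returns 2,256 rows. On the third run, it took 338 ms. Here is the query output with profiling information: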

profile MATCH (:Continent{code: 4})-[:Include]->(country:Country{selected:true})-[:Include*..1]->(city:City{status:true})-[:Include]->(theater:Theater{public: true}) RETURN city.name, count(*) AS nb ORDER BY nb DESC;
==> ColumnFilter(symKeys=["city.name", " INTERNAL_AGGREGATE85ca19f3-9421-4c18-a449-1097e3deede2"], returnItemNames=["city.name", "nb"], _rows=2256, _db_hits=0)
==> Sort(descr=["SortItem(Cached( INTERNAL_AGGREGATE85ca19f3-9421-4c18-a449-1097e3deede2 of type Integer),false)"], _rows=2256, _db_hits=0)
==> EagerAggregation(keys=["Cached(city.name of type Any)"], aggregates=["( INTERNAL_AGGREGATE85ca19f3-9421-4c18-a449-1097e3deede2,CountStar())"], _rows=2256, _db_hits=0)
==> Extract(symKeys=["city", " UNNAMED27", " UNNAMED7", "country", " UNNAMED113", "theater", " UNNAMED72"], exprKeys=["city.name"], _rows=2257, _db_hits=2257)
==> Filter(pred="(hasLabel(theater:Theater(3)) AND Property(theater,public(5)) == true)", _rows=2257, _db_hits=2257)
==> SimplePatternMatcher(g="(city)-[' UNNAMED113']-(theater)", _rows=2257, _db_hits=4514)
==> Filter(pred="(((hasLabel(city:City(2)) AND hasLabel(city:City(2))) AND Property(city,status(4)) == true) AND Property(city,status(4)) == true)", _rows=2257, _db_hits=74420)
==> TraversalMatcher(start={"label": "Continent", "query": "Literal(4)", "identifiers": [" UNNAMED7"], "property": "code", "producer": "SchemaIndex"}, trail="( UNNAMED7)-[ UNNAMED27:Include WHERE (((hasLabel(NodeIdentifier():Country(1)) AND hasLabel(NodeIdentifier():Country(1))) AND Property(NodeIdentifier(),selected(3)) == true) AND Property(NodeIdentifier(),selected(3)) == true) AND true]->(country)-[:Include*1..1]->(city)", _rows=37210, _db_hits=37432)

Answer 1

You are right; I tried it myself and only got it down to about 100 ms.

 MATCH (:Continent{code: 4})-[:Include]->
       (country:Country{selected:true})-[:Include]->
       (city:City{status:true})-[:Include]->
       (theater:Theater{public: true}) 
 RETURN city.name, count(*) AS nb 
 ORDER BY nb DESC;

| "Forbach"                       | 1  |
| "Stuttgart"                     | 1  |
| "Mirepoix"                      | 1  |
| "Bonnieux"                      | 1  |
| "Saint Cyprien Plage"           | 1  |
| "Crissay sur Manse"             | 1  |
+--------------------------------------+
2256 rows
**85 ms**

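Note that Cypher as of 2.0.x has not yet been optimized for performance; that work started in Neo4j 2.1 and will continue through 2.3. More performance work is also planned in the kernel, which will speed things up as well.

I also implemented the solution in Java and got it down to 19 ms. It is certainly not as pretty, but this is where we are aiming with Cypher: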

class City {
    Node city;
    int count = 1;

    public City(Node city) {
        this.city = city;
    }

    public void inc() { count++; }

    @Override
    public String toString() {
        return String.format("City{city=%s, count=%d}", city.getProperty("name"), count);
    }
}

private List<?> queryJava3() {
    long start = System.currentTimeMillis();
    Node continent = IteratorUtil.single(db.findNodesByLabelAndProperty(CONTINENT, "code", 4));
    Map<Node,City> result = new HashMap<>();
    for (Relationship rel1 : continent.getRelationships(Direction.OUTGOING,Include)) {
        Node country = rel1.getEndNode();
        if (!(country.hasLabel(COUNTRY) && (Boolean) country.getProperty("selected", false))) continue;
        for (Relationship rel2 : country.getRelationships(Direction.OUTGOING, Include)) {
            Node city = rel2.getEndNode();
            if (!(city.hasLabel(CITY) && (Boolean) city.getProperty("status", false))) continue;
            for (Relationship rel3 : city.getRelationships(Direction.OUTGOING, Include)) {
                Node theater = rel3.getEndNode();
                if (!(theater.hasLabel(THEATER) && (Boolean) theater.getProperty("public", false))) continue;
                City city1 = result.get(city);
                if (city1==null) result.put(city,new City(city));
                else city1.inc();
            }
        }
    }
    List<City> list = new ArrayList<>(result.values());
    Collections.sort(list, new Comparator<City>() {
        @Override
        public int compare(City o1, City o2) {
            return Integer.compare(o2.count,o1.count);
        }
    });
    output("java", start, list.iterator());
    return list;
}


java time = 19ms
first = City{city=Val de Meuse, count=1} total-count 22561
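
The nested loops above hard-code exactly three levels. For the variable-length case from the question (`[:Include*..3]`, to allow intermediate levels such as Regions), the same traverse-and-count idea can be sketched with a bounded recursive walk. This is only an illustration on a hypothetical in-memory model: `GeoNode`, `VariableDepthCount`, and the sample data are invented here and are not part of the Neo4j API, and no timing claims are implied.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a graph node; this is NOT the Neo4j API.
class GeoNode {
    final String label;   // e.g. "Country", "Region", "City", "Theater"
    final String name;
    final boolean flag;   // plays the role of selected / status / public
    final List<GeoNode> children = new ArrayList<>(); // outgoing Include relationships

    GeoNode(String label, String name, boolean flag) {
        this.label = label;
        this.name = name;
        this.flag = flag;
    }

    GeoNode add(GeoNode child) {
        children.add(child);
        return this;
    }
}

class VariableDepthCount {
    // Collects every active City reachable from `node` in 1..hopsLeft Include hops,
    // once per path, mirroring (country)-[:Include*..n]->(city:City{status:true}).
    static void collectCities(GeoNode node, int hopsLeft, List<GeoNode> out) {
        if (hopsLeft == 0) return;
        for (GeoNode child : node.children) {
            if (child.label.equals("City") && child.flag) out.add(child);
            collectCities(child, hopsLeft - 1, out);
        }
    }

    // Counts public theaters per city name under the selected countries of a continent.
    static Map<String, Integer> theatersPerCity(GeoNode continent, int maxHops) {
        Map<String, Integer> counts = new HashMap<>();
        for (GeoNode country : continent.children) {
            if (!(country.label.equals("Country") && country.flag)) continue;
            List<GeoNode> cities = new ArrayList<>();
            collectCities(country, maxHops, cities);
            for (GeoNode city : cities) {
                for (GeoNode theater : city.children) {
                    if (theater.label.equals("Theater") && theater.flag) {
                        counts.merge(city.name, 1, Integer::sum);
                    }
                }
            }
        }
        return counts;
    }

    // Invented sample: CityA hangs directly off the country, CityB sits below a Region.
    static GeoNode sample() {
        GeoNode cityA = new GeoNode("City", "CityA", true)
                .add(new GeoNode("Theater", "t1", true))
                .add(new GeoNode("Theater", "t2", false));
        GeoNode cityB = new GeoNode("City", "CityB", true)
                .add(new GeoNode("Theater", "t3", true))
                .add(new GeoNode("Theater", "t4", true));
        GeoNode region = new GeoNode("Region", "r1", true).add(cityB);
        GeoNode country = new GeoNode("Country", "c1", true).add(cityA).add(region);
        return new GeoNode("Continent", "4", true).add(country);
    }

    public static void main(String[] args) {
        GeoNode continent = sample();
        System.out.println(theatersPerCity(continent, 1));
        System.out.println(theatersPerCity(continent, 3));
    }
}
```

With a limit of one hop, only CityA is found (one public theater); with up to three hops, CityB below the Region is found as well (two public theaters). The walk visits every node within the hop limit, which is one reason a variable-length pattern can cost more than a fixed one.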

Answer 2

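How did you measure it? Was this the first run or a subsequent one?

How many cities/theaters does the query return?

Could you use http://localhost:7474/webadmin/#/console/ to prefix your query with "profile" and post the resulting query plan?

It may be that the wrong index is chosen by default.

Also note that as of 2.0.1 Cypher is not yet at its best performance. We are working on it. So if you want the best performance, you have to drop down to the lower-level APIs.

Is there any chance you could share your database with me, so I can see where the time goes?

Having only a single "INCLUDE" relationship type may make the query more expensive than it needs to be.

Could you post your Neo4j configuration (conf/*) and your graph.db/messages.log?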

Poor performance with Neo4j compared to MySQL
