Poor performance of Neo4j compared with MySQL

Asked: 2014-04-06 22:48:55

Tags: neo4j

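I migrated a MySQL database to Neo4j and tested a simple query. I was surprised to find that the equivalent query in Neo4j takes 10 to 100 times longer than in MySQL. I am working with Neo4j 2.0.1.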

In the original MySQL schema, I have the following three tables:

  • countries: contains a 'code', a 'continent_id' and a 'selected' boolean,
  • cities: contains a 'country_code', a 'name' and a 'status' boolean,
  • theaters: contains a 'city_id' and a 'public' boolean.

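There is an index on every attribute. I want to display the number of theaters per city for a given continent, subject to a few conditions. The query is: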

SELECT count(*) as nb, c.name 
FROM `cities` c LEFT JOIN theaters t ON c.id = t.city_id 
WHERE c.country_code IN 
  (SELECT code FROM countries WHERE selected is true AND continent_id = 4)
 AND c.status=1 AND t.public = 1 
GROUP BY c.name  ORDER BY nb DESC


The database schema in Neo4j is as follows:

(:Continent)-[:Include]->(:Country{selected: bool})-[:Include]->(:City{name: string, status: bool})-[:Include]->(:Theater{public: bool})

There is also an index defined on every property. The Cypher query is:

MATCH (:Continent{code: 4})-[:Include]->(:Country{selected:true})-[:Include]->(city:City{status:true})-[:Include]->(:Theater{public: true})
RETURN city.name, count(*) AS nb ORDER BY nb DESC


There are about 70,000 cities and 140,000 theaters in each database.

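For the continent with id 4, the MySQL query takes 0.02 s while the Neo4j query takes 0.4 s. Moreover, if I introduce a variable-length relationship between Country and City in the Cypher query (...(:Country{selected:true})-[:Include*..3]->(city:City{status:true})...), because I want to be able to add intermediate levels such as Regions, the query takes more than 2 seconds.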
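
One plausible contributor to the extra cost of the variable-length form: because every level is linked by the same Include type, a pattern such as [:Include*..3] starting at a Country also reaches Theater nodes at depth 2, and those are only rejected afterwards by the :City label check. The following is a minimal in-memory sketch of that effect in plain Java. The Node and Stats classes and the expand and sampleCountry methods are hypothetical illustrations, not Neo4j API, and the sketch makes no claim about how the Neo4j 2.0.1 matcher is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;

public class ExpansionSketch {
    // Hypothetical in-memory node: a label plus outgoing "Include" neighbours.
    static final class Node {
        final String label;
        final List<Node> out = new ArrayList<>();
        Node(String label) { this.label = label; }
        Node add(Node child) { out.add(child); return child; }
    }

    // Number of nodes touched and number of nodes that carried the wanted label.
    static final class Stats {
        int visited;
        int matched;
    }

    // Follows outgoing edges from start for 1..maxDepth hops and
    // checks the label of every node reached along the way.
    static Stats expand(Node start, int maxDepth, String label) {
        Stats stats = new Stats();
        walk(start, maxDepth, label, stats);
        return stats;
    }

    private static void walk(Node node, int remaining, String label, Stats stats) {
        if (remaining == 0) return;
        for (Node next : node.out) {
            stats.visited++;                              // node is touched
            if (label.equals(next.label)) stats.matched++; // label filter applied after the hop
            walk(next, remaining - 1, label, stats);
        }
    }

    // Builds one country with the given number of cities and theaters per city.
    static Node sampleCountry(int cities, int theatersPerCity) {
        Node country = new Node("Country");
        for (int i = 0; i < cities; i++) {
            Node city = country.add(new Node("City"));
            for (int j = 0; j < theatersPerCity; j++) {
                city.add(new Node("Theater"));
            }
        }
        return country;
    }

    public static void main(String[] args) {
        Node country = sampleCountry(3, 2);
        Stats fixed = expand(country, 1, "City");
        Stats variable = expand(country, 3, "City");
        System.out.println("depth 1:    visited=" + fixed.visited + " matched=" + fixed.matched);
        System.out.println("depth 1..3: visited=" + variable.visited + " matched=" + variable.matched);
    }
}
```

With three cities of two theaters each, the single-hop expansion touches 3 nodes, while the 1..3 expansion touches 9 nodes for the same 3 matching cities. The gap grows with the number of theaters below each city.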

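I know that in this particular case there is no benefit to using Neo4j instead of MySQL, but I expected roughly comparable performance between the two technologies, and I want to take advantage of Neo4j's capabilities for geographic hierarchies.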

Am I missing something, or is this a limitation of Neo4j?

Thanks for your answers.

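EDIT: First, you will find the database dump files here. The Neo4j server configuration is the out-of-the-box one. I work in a Ruby environment and use the neography gem. I also run the Neo4j server separately because I am not on JRuby, so Cypher queries are sent through the REST API.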

The database contains 244 countries, 69,000 cities and 138,000 theaters. For continent_id 4, there are 46,982 cities (37,210 with the status boolean set to true) and 74,420 theaters.

The query returns 2256 rows. On the third run it took 338 ms. Here is the query output with profiling information:

profile MATCH (:Continent{code: 4})-[:Include]->(country:Country{selected:true})-[:Include*..1]->(city:City{status:true})-[:Include]->(theater:Theater{public: true}) RETURN city.name, count(*) AS nb ORDER BY nb DESC;

==> ColumnFilter(symKeys=["city.name", "  INTERNAL_AGGREGATE85ca19f3-9421-4c18-a449-1097e3deede2"], returnItemNames=["city.name", "nb"], _rows=2256, _db_hits=0)
==> Sort(descr=["SortItem(Cached(  INTERNAL_AGGREGATE85ca19f3-9421-4c18-a449-1097e3deede2 of type Integer),false)"], _rows=2256, _db_hits=0)
==>   EagerAggregation(keys=["Cached(city.name of type Any)"], aggregates=["(  INTERNAL_AGGREGATE85ca19f3-9421-4c18-a449-1097e3deede2,CountStar())"], _rows=2256, _db_hits=0)
==>     Extract(symKeys=["city", "  UNNAMED27", "  UNNAMED7", "country", "  UNNAMED113", "theater", "  UNNAMED72"], exprKeys=["city.name"], _rows=2257, _db_hits=2257)
==>       Filter(pred="(hasLabel(theater:Theater(3)) AND Property(theater,public(5)) == true)", _rows=2257, _db_hits=2257)
==>         SimplePatternMatcher(g="(city)-['  UNNAMED113']-(theater)", _rows=2257, _db_hits=4514)
==>           Filter(pred="(((hasLabel(city:City(2)) AND hasLabel(city:City(2))) AND Property(city,status(4)) == true) AND Property(city,status(4)) == true)", _rows=2257, _db_hits=74420)
==>             TraversalMatcher(start={"label": "Continent", "query": "Literal(4)", "identifiers": ["  UNNAMED7"], "property": "code", "producer": "SchemaIndex"}, trail="(  UNNAMED7)-[  UNNAMED27:Include WHERE (((hasLabel(NodeIdentifier():Country(1)) AND hasLabel(NodeIdentifier():Country(1))) AND Property(NodeIdentifier(),selected(3)) == true) AND Property(NodeIdentifier(),selected(3)) == true) AND true]->(country)-[:Include*1..1]->(city)", _rows=37210, _db_hits=37432)

2 Answers:

Answer 0: (score: 4)

You are right. I tried it myself and only got the query down to about 100 ms.

 MATCH (:Continent{code: 4})-[:Include]->
       (country:Country{selected:true})-[:Include]->
       (city:City{status:true})-[:Include]->
       (theater:Theater{public: true}) 
 RETURN city.name, count(*) AS nb 
 ORDER BY nb DESC;

| "Forbach"                       | 1  |
| "Stuttgart"                     | 1  |
| "Mirepoix"                      | 1  |
| "Bonnieux"                      | 1  |
| "Saint Cyprien Plage"           | 1  |
| "Crissay sur Manse"             | 1  |
+--------------------------------------+
2256 rows
**85 ms**

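Note that Cypher as of 2.0.x has not yet been performance-optimized. That work started in Neo4j 2.1 and will continue until 2.3. More performance work is also planned in the kernel, which will speed things up as well.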

I also implemented the solution in Java and got it down to 19 ms. It is of course not as pretty, but that is the level we are aiming for with Cypher:

class City {
    Node city;
    int count = 1;

    public City(Node city) {
        this.city = city;
    }

    public void inc() { count++; }

    @Override
    public String toString() {
        return String.format("City{city=%s, count=%d}", city.getProperty("name"), count);
    }
}

private List<?> queryJava3() {
    long start = System.currentTimeMillis();
    Node continent = IteratorUtil.single(db.findNodesByLabelAndProperty(CONTINENT, "code", 4));
    Map<Node,City> result = new HashMap<>();
    for (Relationship rel1 : continent.getRelationships(Direction.OUTGOING,Include)) {
        Node country = rel1.getEndNode();
        if (!(country.hasLabel(COUNTRY) && (Boolean) country.getProperty("selected", false))) continue;
        for (Relationship rel2 : country.getRelationships(Direction.OUTGOING, Include)) {
            Node city = rel2.getEndNode();
            if (!(city.hasLabel(CITY) && (Boolean) city.getProperty("status", false))) continue;
            for (Relationship rel3 : city.getRelationships(Direction.OUTGOING, Include)) {
                Node theater = rel3.getEndNode();
                if (!(theater.hasLabel(THEATER) && (Boolean) theater.getProperty("public", false))) continue;
                City city1 = result.get(city);
                if (city1==null) result.put(city,new City(city));
                else city1.inc();
            }
        }
    }
    List<City> list = new ArrayList<>(result.values());
    Collections.sort(list, new Comparator<City>() {
        @Override
        public int compare(City o1, City o2) {
            return Integer.compare(o2.count,o1.count);
        }
    });
    output("java", start, list.iterator());
    return list;
}


java time = 19ms
first = City{city=Val de Meuse, count=1} total-count 22561

Answer 1: (score: 1)

How did you measure it? Was that the first run or a subsequent one?

How many cities/theaters does the query return?

Could you prefix the query with "profile" in http://localhost:7474/webadmin/#/console/ and post the resulting query plan?

It may be that the wrong index is chosen by default.

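Also note that with 2.0.1, Cypher is not yet at its best performance. We are working on that. So if you want top performance, you have to drop down to the lower-level APIs.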

Is there any chance you could share your database with me, so I can see where the time goes?

Having only a single "INCLUDE" relationship type may make it more expensive than necessary.
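
To illustrate the point about relationship types, here is a minimal in-memory sketch in plain Java. It assumes two hypothetical types, HAS_CITY and HAS_THEATER, in place of the single Include type. The Node and Edge classes and the reach method are illustrative only and do not reflect Neo4j's storage or API. The idea is that a traversal restricted to one type can discard the other edges without ever looking at the nodes they point to.

```java
import java.util.ArrayList;
import java.util.List;

public class TypedEdgesSketch {
    // Hypothetical relationship types; the graph in the question uses a single Include type.
    enum RelType { HAS_CITY, HAS_THEATER }

    static final class Node {
        final String label;
        final List<Edge> out = new ArrayList<>();
        Node(String label) { this.label = label; }
        Node link(RelType type, Node target) {
            out.add(new Edge(type, target));
            return target;
        }
    }

    static final class Edge {
        final RelType type;
        final Node target;
        Edge(RelType type, Node target) { this.type = type; this.target = target; }
    }

    // Counts the nodes reached within maxDepth hops when only edges of the given type are followed.
    static int reach(Node node, int maxDepth, RelType type) {
        if (maxDepth == 0) return 0;
        int reached = 0;
        for (Edge edge : node.out) {
            if (edge.type != type) continue; // edge discarded, target node never inspected
            reached += 1 + reach(edge.target, maxDepth - 1, type);
        }
        return reached;
    }

    // Builds one country with the given number of cities and theaters per city.
    static Node sampleCountry(int cities, int theatersPerCity) {
        Node country = new Node("Country");
        for (int i = 0; i < cities; i++) {
            Node city = country.link(RelType.HAS_CITY, new Node("City"));
            for (int j = 0; j < theatersPerCity; j++) {
                city.link(RelType.HAS_THEATER, new Node("Theater"));
            }
        }
        return country;
    }

    public static void main(String[] args) {
        Node country = sampleCountry(3, 2);
        // Up to 3 hops, but only along HAS_CITY edges: theaters are never reached.
        System.out.println("reached via HAS_CITY: " + reach(country, 3, RelType.HAS_CITY));
    }
}
```

In this model, a three-hop expansion over HAS_CITY from a country with three cities reaches only those 3 cities, even though each city has outgoing edges to theaters.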

Could you post your Neo4j configuration (conf/*) and your graph.db/messages.log?