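I migrated a MySQL database to Neo4j and tested a simple query. I was surprised to find that the equivalent query in Neo4j takes 10 to 100 times longer than in MySQL. I am working with Neo4j 2.0.1.
In the original MySQL schema I have three tables: cities, countries and theaters.
Every attribute is indexed. I want to display the number of theaters per city on a given continent, subject to a few conditions. The query is: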
SELECT count(*) as nb, c.name
FROM `cities` c LEFT JOIN theaters t ON c.id = t.city_id
WHERE c.country_code IN
(SELECT code FROM countries WHERE selected is true AND continent_id = 4)
AND c.status=1 AND t.public = 1
GROUP BY c.name ORDER BY nb DESC
The database schema in Neo4j is as follows:
(:Continent)-[:Include]->(:Country {selected: bool})-[:Include]->(:City {name: string, status: bool})-[:Include]->(:Theater {public: bool})
There is also an index defined on every property. The Cypher query is:
MATCH (:Continent{code: 4})-[:Include]->(:Country{selected:true})-[:Include]->(city:City{status:true})-[:Include]->(:Theater{public: true})
RETURN city.name, count(*) AS nb ORDER BY nb DESC
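There are approximately 70,000 cities and 140,000 theaters in each database.
For the continent with id 4, the MySQL query takes about 0.02 s while the Neo4j query takes 0.4 s. Moreover, if I introduce a variable-length relationship between Country and City in the Cypher query (...(:Country{selected:true})-[:Include*..3]->(city:City{status:true})...), because I want to be able to add intermediate levels such as Regions, the query takes more than 2 seconds.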
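One plausible reason the variable-length version is slower: every level uses the same Include relationship type, so a pattern such as -[:Include*..3]-> starting at a country can also walk down into theaters before the City label filter discards them. The sketch below is a plain-Java toy model (no Neo4j API; the class, the method names and the tiny graph are made up for illustration) that counts how many nodes a depth-bounded expansion touches versus how many pass the label filter. It illustrates what the pattern means and makes no claim about how Cypher executes it internally.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: plain collections, not the Neo4j API.
class VariableLengthSketch {
    final Map<String, List<String>> include = new HashMap<>(); // node -> Include children
    final Map<String, String> label = new HashMap<>();         // node -> label
    int visited = 0; // nodes touched by the expansion
    int matched = 0; // touched nodes that carry the wanted label

    void node(String name, String nodeLabel, String... children) {
        label.put(name, nodeLabel);
        include.put(name, Arrays.asList(children));
    }

    // Follows Include edges for 1..maxDepth hops, like -[:Include*..maxDepth]->,
    // and counts the end nodes whose label equals wantedLabel.
    void expand(String from, int maxDepth, String wantedLabel) {
        if (maxDepth == 0) return;
        for (String child : include.getOrDefault(from, Collections.<String>emptyList())) {
            visited++;
            if (wantedLabel.equals(label.get(child))) matched++;
            expand(child, maxDepth - 1, wantedLabel);
        }
    }

    // Hypothetical data: one country, two cities, four theaters.
    static VariableLengthSketch toyGraph() {
        VariableLengthSketch g = new VariableLengthSketch();
        g.node("country", "Country", "city1", "city2");
        g.node("city1", "City", "t1", "t2", "t3");
        g.node("city2", "City", "t4");
        for (String t : Arrays.asList("t1", "t2", "t3", "t4")) g.node(t, "Theater");
        return g;
    }

    public static void main(String[] args) {
        for (int depth : new int[] {1, 3}) {
            VariableLengthSketch g = toyGraph();
            g.expand("country", depth, "City");
            System.out.println("maxDepth=" + depth + " visited=" + g.visited + " matched=" + g.matched);
        }
    }
}
```

On the toy graph, a bound of 1 touches 2 nodes and keeps 2, while a bound of 3 touches 6 nodes and still keeps only 2. With roughly two theaters per city in the real data, the extra work grows accordingly.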
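I know there is no real benefit in using Neo4j instead of MySQL in this particular case, but I expected roughly comparable performance from the two technologies, and I want to take advantage of Neo4j for geographic hierarchies.
Am I missing something, or is this a limitation of Neo4j?
Thanks for your answers.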
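Edit: First, you will find the database dump file here. The Neo4j server configuration is the out-of-the-box one. I work in a Ruby environment and use the neography gem. I run the Neo4j server separately because I am not on JRuby, so the Cypher queries are sent through the REST API.
The database contains 244 countries, 69,000 cities and 138,000 theaters. For continent_id 4 there are 46,982 cities (37,210 of them with the status boolean set to true) and 74,420 theaters.
The query returns 2,256 rows. On the third run it took 338 ms. Here is the query output with profiling information: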
profile MATCH (:Continent{code: 4})-[:Include]->(country:Country{selected:true})-[:Include*..1]->(city:City{status:true})-[:Include]->(theater:Theater{public: true}) RETURN city.name, count(*) AS nb ORDER BY nb DESC;
==> ColumnFilter(symKeys=["city.name", " INTERNAL_AGGREGATE85ca19f3-9421-4c18-a449-1097e3deede2"], returnItemNames=["city.name", "nb"], _rows=2256, _db_hits=0)
==> Sort(descr=["SortItem(Cached( INTERNAL_AGGREGATE85ca19f3-9421-4c18-a449-1097e3deede2 of type Integer),false)"], _rows=2256, _db_hits=0)
==> EagerAggregation(keys=["Cached(city.name of type Any)"], aggregates=["( INTERNAL_AGGREGATE85ca19f3-9421-4c18-a449-1097e3deede2,CountStar())"], _rows=2256, _db_hits=0)
==> Extract(symKeys=["city", " UNNAMED27", " UNNAMED7", "country", " UNNAMED113", "theater", " UNNAMED72"], exprKeys=["city.name"], _rows=2257, _db_hits=2257)
==> Filter(pred="(hasLabel(theater:Theater(3)) AND Property(theater,public(5)) == true)", _rows=2257, _db_hits=2257)
==> SimplePatternMatcher(g="(city)-[' UNNAMED113']-(theater)", _rows=2257, _db_hits=4514)
==> Filter(pred="(((hasLabel(city:City(2)) AND hasLabel(city:City(2))) AND Property(city,status(4)) == true) AND Property(city,status(4)) == true)", _rows=2257, _db_hits=74420)
==> TraversalMatcher(start={"label": "Continent", "query": "Literal(4)", "identifiers": [" UNNAMED7"], "property": "code", "producer": "SchemaIndex"}, trail="( UNNAMED7)-[ UNNAMED27:Include WHERE (((hasLabel(NodeIdentifier():Country(1)) AND hasLabel(NodeIdentifier():Country(1))) AND Property(NodeIdentifier(),selected(3)) == true) AND Property(NodeIdentifier(),selected(3)) == true) AND true]->(country)-[:Include*1..1]->(city)", _rows=37210, _db_hits=37432)
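A rough way to read the plan above is to total the _db_hits values and see which operator dominates. The Java sketch below does that with regular expressions. The class and method names are hypothetical, the sample lines are abbreviated copies of the output above (only the operator name and _db_hits are kept), and treating db hits as a proxy for cost is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlanDbHits {
    // Operator name at the start of a plan line, e.g. "==> Filter(".
    private static final Pattern OPERATOR = Pattern.compile("^(?:==>\\s*)?(\\w+)\\(");
    private static final Pattern DB_HITS = Pattern.compile("_db_hits=(\\d+)");

    // Abbreviated plan lines taken from the profile output in the question.
    static final String[] SAMPLE = {
        "==> ColumnFilter(..., _rows=2256, _db_hits=0)",
        "==> Sort(..., _rows=2256, _db_hits=0)",
        "==> EagerAggregation(..., _rows=2256, _db_hits=0)",
        "==> Extract(..., _rows=2257, _db_hits=2257)",
        "==> Filter(pred=\"theater ...\", _rows=2257, _db_hits=2257)",
        "==> SimplePatternMatcher(..., _rows=2257, _db_hits=4514)",
        "==> Filter(pred=\"city ...\", _rows=2257, _db_hits=74420)",
        "==> TraversalMatcher(..., _rows=37210, _db_hits=37432)"
    };

    // Returns the _db_hits value of one plan line, or 0 if the line has none.
    static long dbHits(String line) {
        Matcher m = DB_HITS.matcher(line);
        return m.find() ? Long.parseLong(m.group(1)) : 0L;
    }

    static long totalDbHits(String... planLines) {
        long total = 0;
        for (String line : planLines) total += dbHits(line);
        return total;
    }

    // Name of the operator on the line with the highest _db_hits, or null if none parse.
    static String mostExpensiveOperator(String... planLines) {
        String best = null;
        long bestHits = -1;
        for (String line : planLines) {
            Matcher op = OPERATOR.matcher(line.trim());
            long hits = dbHits(line);
            if (op.find() && hits > bestHits) {
                bestHits = hits;
                best = op.group(1);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println("total=" + totalDbHits(SAMPLE)
                + " most expensive=" + mostExpensiveOperator(SAMPLE));
    }
}
```

On these lines the total is 120,880 db hits. The Filter on city (74,420) and the TraversalMatcher (37,432) account for most of it, which suggests the time goes into checking city properties rather than into the final aggregation.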
Answer 0 (score: 4)
You are right. I tried it myself and only got the query down to about 100 ms.
MATCH (:Continent{code: 4})-[:Include]->
(country:Country{selected:true})-[:Include]->
(city:City{status:true})-[:Include]->
(theater:Theater{public: true})
RETURN city.name, count(*) AS nb
ORDER BY nb DESC;
| "Forbach" | 1 |
| "Stuttgart" | 1 |
| "Mirepoix" | 1 |
| "Bonnieux" | 1 |
| "Saint Cyprien Plage" | 1 |
| "Crissay sur Manse" | 1 |
+--------------------------------------+
2256 rows
**85 ms**
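Note that Cypher as of 2.0.x has not yet been optimized for performance. That work started in Neo4j 2.1 and will continue through 2.3. More performance work is also planned in the kernel, which will speed things up as well.
I also implemented the solution in Java and got it down to 19 ms. It is certainly not as pretty, but this is where we are aiming to get with Cypher: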
class City {
Node city;
int count = 1;
public City(Node city) {
this.city = city;
}
public void inc() { count++; }
@Override
public String toString() {
return String.format("City{city=%s, count=%d}", city.getProperty("name"), count);
}
}
private List<?> queryJava3() {
long start = System.currentTimeMillis();
Node continent = IteratorUtil.single(db.findNodesByLabelAndProperty(CONTINENT, "code", 4));
Map<Node,City> result = new HashMap<>();
for (Relationship rel1 : continent.getRelationships(Direction.OUTGOING,Include)) {
Node country = rel1.getEndNode();
if (!(country.hasLabel(COUNTRY) && (Boolean) country.getProperty("selected", false))) continue;
for (Relationship rel2 : country.getRelationships(Direction.OUTGOING, Include)) {
Node city = rel2.getEndNode();
if (!(city.hasLabel(CITY) && (Boolean) city.getProperty("status", false))) continue;
for (Relationship rel3 : city.getRelationships(Direction.OUTGOING, Include)) {
Node theater = rel3.getEndNode();
if (!(theater.hasLabel(THEATER) && (Boolean) theater.getProperty("public", false))) continue;
City city1 = result.get(city);
if (city1==null) result.put(city,new City(city));
else city1.inc();
}
}
}
List<City> list = new ArrayList<>(result.values());
Collections.sort(list, new Comparator<City>() {
@Override
public int compare(City o1, City o2) {
return Integer.compare(o2.count,o1.count);
}
});
output("java", start, list.iterator());
return list;
}
java time = 19ms
first = City{city=Val de Meuse, count=1} total-count 22561
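Answer 1 (score: 1)
How did you measure it? Was it the first run or a subsequent one?
How many cities/theaters does the query return?
Could you run it in http://localhost:7474/webadmin/#/console/,
prefix the query with "profile" and post the resulting query plan?
The wrong index might be chosen by default.
Also note that Cypher in 2.0.1 does not yet deliver top performance. We are working on it. So if you need the best performance, you have to drop down to the lower-level APIs.
Is there any chance you could share your database with me, so I can see where the time is going?
Having only a single "INCLUDE" relationship type may make the query more expensive than it needs to be.
Could you post your Neo4j configuration (conf/*) and your graph.db/messages.log?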
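On the question of first versus subsequent runs: a common way to get comparable numbers is to discard a few warm-up runs and then report the best of several timed runs, so that cold caches and JIT compilation do not dominate. A minimal Java sketch, assuming the query call is wrapped in a Supplier (the class name, the method name and the stand-in workload are hypothetical):

```java
import java.util.function.Supplier;

class WarmTiming {
    // Runs the task `warmups` times unmeasured, then returns the fastest
    // of `runs` timed executions, in milliseconds.
    static <T> double bestMillis(Supplier<T> task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) task.get();
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.get();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Stand-in workload; replace with the actual query call.
        Supplier<Long> work = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;
            return sum;
        };
        System.out.println("best of 5 after 3 warm-ups: " + bestMillis(work, 3, 5) + " ms");
    }
}
```

Reporting the first-run time separately is still useful, since it shows the cost of a cold cache.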