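I'm trying to use regular expression matching in Cypher with Neo4j 2.1.5 and am running into a problem.
I need to implement full-text search over specific fields that a user has access to. The access requirement is the key point, and it is what stops me from simply dumping everything into a Lucene instance and querying it that way. The access system is dynamic, so I need to query for the set of nodes a particular user can access and then search within those nodes. I would really like to match that set of nodes against a Lucene query, but I can't figure out how to do that, so for now I'm just using basic regular expression matching. My problem is that Neo4j doesn't always return the expected results.
For example, I have about 200 nodes, one of which looks like this: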
( i:node {name: "Linear Glass Mosaic Tiles", description: "Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!"})
This query produces one result:
MATCH (p)-->(:group)-->(i:node)
WHERE (i.name =~ "(?i).*mosaic.*")
RETURN i
> Returned 1 row in 569 ms
But this query produces zero results, even though the description property matches the expression:
MATCH (p)-->(:group)-->(i:node)
WHERE (i.description=~ "(?i).*mosaic.*")
RETURN i
> Returned 0 rows in 601 ms
This query also produces zero results, even though it includes the name property that returned a result earlier:
MATCH (p)-->(:group)-->(i:node)
WITH i, (p.name + i.name + COALESCE(i.description, "")) AS searchText
WHERE (searchText =~ "(?i).*mosaic.*")
RETURN i
> Returned 0 rows in 487 ms
Yet returning searchText directly shows that the term is present:
MATCH (p)-->(:group)-->(i:node)
WITH i, (p.name + i.name + COALESCE(i.description, "")) AS searchText
RETURN searchText
>
...
SotoLinear Glass Mosaic Tiles Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!
...
Even stranger, if I search for a different term, it returns all the expected results without any problem.
MATCH (p)-->(:group)-->(i:node)
WITH i, (p.name + i.name + COALESCE(i.description, "")) AS searchText
WHERE (searchText =~ "(?i).*plumbing.*")
RETURN i
> Returned 8 rows in 522 ms
I then tried caching the search text on the node and added an index to see whether that would change anything, but it still produced no results.
CREATE INDEX ON :node(searchText)
MATCH (p)-->(:group)-->(i:node)
WHERE (i.searchText =~ "(?i).*mosaic.*")
RETURN i
> Returned 0 rows in 3182 ms
I then tried simplifying the data to reproduce the problem, but in this simple case it works as expected:
MERGE (i:node {name: "Linear Glass Mosaic Tiles", description: "Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!"})
WITH i, (
i.name + " " + COALESCE(i.description, "")
) AS searchText
WHERE searchText =~ "(?i).*mosaic.*"
RETURN i
> Returned 1 rows in 630 ms
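I also tried the CYPHER 2.1.EXPERIMENTAL tag, but it didn't change any of the results. Am I making wrong assumptions about how regular expression support works? Is there anything else I should try, or another way to debug the problem?
Additional information
Here is a sample call I make to the Cypher Transactional REST API when creating a node. This is the actual plain text sent when adding the node to the database (apart from some formatting for readability). Any string encoding is just the standard URL encoding Go performs when creating a new HTTP request.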
{"statements":[
{
"parameters":
{
"p01":"lsF30nP7TsyFh",
"p02":
{
"description":"Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!",
"id":"lsF3BxzFdn0kj",
"name":"Linear Glass Mosaic Tiles",
"object":"material"
}
},
"resultDataContents":["row"],
"statement":
"MATCH (p:project { id: { p01 } })
WITH p
CREATE UNIQUE (p)-[:MATERIAL]->(:materials:group {name: \"Materials\"})-[:MATERIAL]->(m:material { p02 })"
}
]}
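If this is an encoding issue, why does the search on name work while the search on description fails and the search on name + description fails? Is there a way to inspect the database to see whether and how the data is encoded? When I run a search, the returned text displays correctly.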
Answer 0 (score: 3)
Just a few notes:
I just tried your exact JSON statement and it worked perfectly.
Insert:
curl -H accept:application/json -H content-type:application/json -d @insert.json \
-XPOST http://localhost:7474/db/data/transaction/commit
JSON:
{"statements":[
{
"parameters":
{
"p01":"lsF30nP7TsyFh",
"p02":
{
"description":"Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!",
"id":"lsF3BxzFdn0kj",
"name":"Linear Glass Mosaic Tiles",
"object":"material"
}
},
"resultDataContents":["row"],
"statement":
"MERGE (p:project { id: { p01 } })
WITH p
CREATE UNIQUE (p)-[:MATERIAL]->(:materials:group {name: \"Materials\"})-[:MATERIAL]->(m:material { p02 }) RETURN m"
}
]}
Query:
MATCH (p)-->(:group)-->(i:material)
WHERE (i.description=~ "(?i).*mosaic.*")
RETURN i
Returns:
name: Linear Glass Mosaic Tiles
id: lsF3BxzFdn0kj
description: Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!
object: material
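One way to check your data is to look at the JSON or CSV dumps the browser offers (the small download icons on the result and table views).
Or use neo4j-shell with my shell-import-tools to actually output CSV or GraphML and inspect those files.
Or use some Java (or Groovy) code to inspect your data.
There is also the consistency checker that ships with the neo4j-enterprise download. There is a blog post on how to run it.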
java -cp 'lib/*:system/lib/*' org.neo4j.consistency.ConsistencyCheckTool /tmp/foo
I added a Groovy test script here: https://gist.github.com/jexp/5a183c3501869ee63d30
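Once a property value has been exported (for example from the JSON or CSV dump mentioned above), a quick scan for invisible characters can show whether something like a line break is hiding in the text. The following is only a minimal Python sketch: the function name `find_control_chars` and the sample string are hypothetical, and the line break in the sample is an assumption, not something the question confirms.

```python
import unicodedata


def find_control_chars(text):
    """Return (index, repr) pairs for characters in a Unicode "C" category."""
    hits = []
    for index, char in enumerate(text):
        # Categories starting with "C" include control (Cc) and format (Cf)
        # characters, which are usually invisible when the text is displayed.
        if unicodedata.category(char).startswith("C"):
            hits.append((index, repr(char)))
    return hits


# Hypothetical exported value; the embedded line break is an assumption.
sample = "Linear Glass Mosaic Tiles\nIntroducing our new Rip Curl linear glass mosaic tiles."
print(find_control_chars(sample))
```

An empty list means the value holds only visible characters and ordinary spaces; any hit gives the position and the escaped form of the character to look at more closely.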
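Sometimes multi-line text is the issue. There are two more flags: multiline (?m), which makes ^ and $ match at line boundaries, and dotall (?s), which lets the dot also match special characters such as newlines.
You could try (?ism).*mosaic.*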
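The effect of the dotall flag can be tried outside the database. The sketch below uses Python's `re.fullmatch` as a stand-in, on the assumption that Cypher's =~ requires the whole string to match (which the `.*` on both sides of the pattern in the question suggests). The helper name `cypher_like_match` and the sample strings are hypothetical, and the line break in `multi_line` is an assumption about the stored data.

```python
import re


def cypher_like_match(pattern, text):
    """Whole-string regex test: the pattern must cover the entire text."""
    return re.fullmatch(pattern, text) is not None


# Hypothetical values; the line break in multi_line is an assumption.
single_line = "Linear Glass Mosaic Tiles"
multi_line = (
    "Introducing our new linear glass mosaic tiles.\n"
    "SOLD OUT: Back in stock end of August."
)

# Without (?s), "." stops at a line break, so ".*" cannot span the whole text.
print(cypher_like_match("(?i).*mosaic.*", single_line))
print(cypher_like_match("(?i).*mosaic.*", multi_line))

# With (?s), "." also matches the line break and the same pattern succeeds.
print(cypher_like_match("(?is).*mosaic.*", multi_line))
```

If the stored description does contain a line break, this would be consistent with the name search succeeding while the description and concatenated searches return nothing.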