Neo4j regex string matching not returning expected results

Date: 2014-10-26 09:02:05

Tags: regex neo4j cypher

I am trying to use regex matching in Cypher with Neo4j 2.1.5 and am running into a problem.

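I need to implement full-text search over specific fields that a user has access to. The access requirement is key, and it is what keeps me from just dumping everything into a Lucene instance and querying it that way. The access system is dynamic, so I need to query for the set of nodes a particular user can access and then search within those nodes. I would really like to match that set of nodes against a Lucene query, but I can't figure out how to do that, so for now I'm just using basic regex matching. My problem is that Neo4j doesn't always return the expected results.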

For example, I have about 200 nodes, one of which looks like this:

( i:node {name: "Linear Glass Mosaic Tiles", description: "Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!"})

This query produces one result:

MATCH (p)-->(:group)-->(i:node)
  WHERE (i.name =~ "(?i).*mosaic.*")
  RETURN i

> Returned 1 row in 569 ms

But this query produces zero results, even though the description property matches the expression:

MATCH (p)-->(:group)-->(i:node)
  WHERE (i.description=~ "(?i).*mosaic.*")
  RETURN i

> Returned 0 rows in 601 ms

This query also produces zero results, even though it includes the name property that returned a result before:

MATCH (p)-->(:group)-->(i:node)
  WITH i, (p.name + i.name + COALESCE(i.description, "")) AS searchText
  WHERE (searchText =~ "(?i).*mosaic.*")
  RETURN i

> Returned 0 rows in 487 ms

MATCH (p)-->(:group)-->(i:node)
  WITH i, (p.name + i.name + COALESCE(i.description, "")) AS searchText
  RETURN searchText

>
...
SotoLinear Glass Mosaic Tiles Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!
...

mosaic

Even stranger, if I search for a different term, it returns all the expected results without any problem.

MATCH (p)-->(:group)-->(i:node)
  WITH i, (p.name + i.name + COALESCE(i.description, "")) AS searchText
  WHERE (searchText =~ "(?i).*plumbing.*")
  RETURN i

> Returned 8 rows in 522 ms

I then tried caching the search text on the nodes and added an index to see whether that would change anything, but it still produced no results.

CREATE INDEX ON :node(searchText)

MATCH (p)-->(:group)-->(i:node)
  WHERE (i.searchText =~ "(?i).*mosaic.*")
  RETURN i

> Returned 0 rows in 3182 ms

I then tried simplifying the data to reproduce the problem, but in this simple case it works as expected:

MERGE (i:node {name: "Linear Glass Mosaic Tiles", description: "Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!"})

WITH i, (
  i.name + " " + COALESCE(i.description, "")
) AS searchText

WHERE searchText =~ "(?i).*mosaic.*"
RETURN i

> Returned 1 rows in 630 ms

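I also tried the CYPHER 2.1.EXPERIMENTAL tag, but it did not change any of the results. Am I making incorrect assumptions about how regex support works? Is there anything else I should try, or another way to debug the problem?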

Additional information

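Here is a sample call I make to the Cypher Transactional REST API when creating nodes. This is the actual plain text sent when adding a node to the database (apart from some formatting for readability). Any string encoding is just the standard URL encoding that Go performs when creating a new HTTP request.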

{"statements":[
    {
    "parameters":
        {
        "p01":"lsF30nP7TsyFh",
        "p02":
            {
            "description":"Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!",
            "id":"lsF3BxzFdn0kj",
            "name":"Linear Glass Mosaic Tiles",
            "object":"material"
            }
        },
    "resultDataContents":["row"],
    "statement":
        "MATCH (p:project { id: { p01 } })
        WITH p

        CREATE UNIQUE (p)-[:MATERIAL]->(:materials:group {name: \"Materials\"})-[:MATERIAL]->(m:material  { p02 })"
    }
]}

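If it is an encoding problem, why does the search on name work while the search on description does not, and the search on name + description does not either? Is there a way to inspect the database to see whether and how the data is encoded? When I run a search, the returned text displays correctly.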

1 answer:

Answer 0 (score: 3)

Just a few notes:

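  • You could probably use MERGE instead of CREATE UNIQUE (it behaves slightly differently)
  • For your full-text search, if your group restriction is not selective enough to keep responses under a few milliseconds, I would use a Lucene legacy index to improve performance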

I just tried your exact JSON statement and it works perfectly.

Insert:

curl -H accept:application/json -H content-type:application/json -d @insert.json \
     -XPOST http://localhost:7474/db/data/transaction/commit

JSON:

{"statements":[
    {
    "parameters":
        {
        "p01":"lsF30nP7TsyFh",
        "p02":
            {
            "description":"Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!",
            "id":"lsF3BxzFdn0kj",
            "name":"Linear Glass Mosaic Tiles",
            "object":"material"
            }
        },
    "resultDataContents":["row"],
    "statement":
        "MERGE (p:project { id: { p01 } })
        WITH p

        CREATE UNIQUE (p)-[:MATERIAL]->(:materials:group {name: \"Materials\"})-[:MATERIAL]->(m:material  { p02 }) RETURN m"
    }
]}

Query:

MATCH (p)-->(:group)-->(i:material)
 WHERE (i.description=~ "(?i).*mosaic.*")
 RETURN i

Returns:

name:   Linear Glass Mosaic Tiles
id: lsF3BxzFdn0kj
description:    Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!
object: material

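One way to check your data is to look at the JSON or CSV dumps that the browser offers (the small download icons on the result and table-result views).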

Or use neo4j-shell with my shell-import-tools to export CSV or GraphML and inspect those files.

Or use some Java (or Groovy) code to inspect your data.
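As a rough illustration of that last option, here is a minimal Java sketch that makes hidden characters in a property value visible. It assumes you have already read the value into a `String` by whatever means you prefer; the class name `RevealHiddenChars`, the method `reveal`, and the sample value are all hypothetical and not part of any Neo4j API.

```java
class RevealHiddenChars {
    // Rewrites every character outside printable ASCII as a backslash-u hex
    // escape, so line breaks, tabs, and non-breaking spaces become visible.
    static String reveal(String value) {
        StringBuilder out = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (c >= 0x20 && c <= 0x7E) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical property value containing a line break and a
        // non-breaking space; substitute the value read from your own node.
        String description = "glass mosaic tiles.\nThis\u00A0Caribbean color";
        System.out.println(reveal(description));
    }
}
```

If the output for your real description shows escapes where you expected plain spaces, that would be a hint that the stored text is not quite what it looks like in the browser.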

There is also the consistency checker that ships with the neo4j-enterprise download. A blog post describes how to run it:

java -cp 'lib/*:system/lib/*' org.neo4j.consistency.ConsistencyCheckTool /tmp/foo

I added a Groovy test script here: https://gist.github.com/jexp/5a183c3501869ee63d30

One more idea: regexp flags

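Sometimes multi-line content is the culprit, and there are two more flags for that:

  • multiline (?m), which makes ^ and $ match at the start and end of every line, and
  • dotall (?s), which lets the dot also match special characters such as newlines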

You could try (?ism).*mosaic.*
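To see why the flags can matter, here is a small Java sketch. Cypher's `=~` is backed by Java regular expressions and has to match the whole string, so `Pattern.matcher(...).matches()` is a reasonable stand-in. The class name `CypherRegexFlags`, the helper `matchesWhole`, and the sample strings are made up for illustration; whether your stored description really contains a line break is an assumption you would have to verify.

```java
import java.util.regex.Pattern;

class CypherRegexFlags {
    // Whole-string match, mirroring how =~ must match the entire value.
    static boolean matchesWhole(String regex, String text) {
        return Pattern.compile(regex).matcher(text).matches();
    }

    public static void main(String[] args) {
        String oneLine = "Rip Curl linear glass mosaic tiles.";
        // Hypothetical value: similar text with a line break in it.
        String twoLines = "Introducing our new tiles.\nLinear glass mosaic.";

        // Without (?s) the dot stops at the line break, so the
        // whole-string match fails on the two-line value.
        System.out.println(matchesWhole("(?i).*mosaic.*", oneLine));   // true
        System.out.println(matchesWhole("(?i).*mosaic.*", twoLines));  // false
        // (?m) alone only changes ^ and $, so it does not help here.
        System.out.println(matchesWhole("(?im).*mosaic.*", twoLines)); // false
        // (?s) lets the dot cross the line break.
        System.out.println(matchesWhole("(?is).*mosaic.*", twoLines)); // true
    }
}
```

If a line break is indeed the cause, it would also explain why a term like "plumbing" still matches on other nodes: their text may simply contain no line terminators.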