Question

What would be a good design for searching complex JSON with Solr? For example, there could be a document like this:

{
    "books" : [
        {
            "title" : "Some title",
            "author" : "Some author",
            "genres" : [
                "thriller",
                "drama"
            ]
        },
        {
            "title" : "Some other title",
            "author" : "Some author",
            "genres" : [
                "comedy",
                "nonfiction",
                "thriller"
            ]
        }
    ]
}

An example query would be: get all documents that have a book by the author "Some author", where one of the books is of the genre "drama".

The design I have come up with so far is a dynamicField in schema.xml that indexes everything (for now), like this:

<dynamicField name="*" type="text" indexed="true" stored="true"/>

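SolrJ is then used to parse the JSON and create a SolrInputDocument with one field per piece of data. For example, these are the fields/values created for the example JSON above: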

books0.title : "Some title"
books0.author : "Some author"
books0.genres0 : "thriller"
books0.genres1 : "drama"
books1.title : "Some other title"
books1.author : "Some author"
books1.genres0 : "comedy"
books1.genres1 : "nonfiction"
books1.genres2 : "thriller"
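The flattening step can be sketched roughly as follows. This is a minimal illustration under several assumptions: the JSON is taken to be already parsed into nested Maps and Lists (the parsing itself is omitted), a LinkedHashMap stands in for SolrJ's SolrInputDocument so the code runs without dependencies, and FlattenJson and sampleDocument() are hypothetical helpers, the latter hand-building the example above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper: flattens parsed JSON (nested Maps/Lists) into field
 * names like "books0.title" or "books1.genres2". A LinkedHashMap stands in
 * for SolrInputDocument and keeps fields in insertion order.
 */
class FlattenJson {

    static Map<String, String> flatten(Object node) {
        Map<String, String> fields = new LinkedHashMap<>();
        flattenInto("", node, fields);
        return fields;
    }

    private static void flattenInto(String name, Object node, Map<String, String> out) {
        if (node instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                String key = String.valueOf(e.getKey());
                // Object keys are joined with a dot: "books0" + "." + "title"
                flattenInto(name.isEmpty() ? key : name + "." + key, e.getValue(), out);
            }
        } else if (node instanceof List) {
            List<?> list = (List<?>) node;
            for (int i = 0; i < list.size(); i++) {
                // Array positions are appended directly: "genres" + 0 -> "genres0"
                flattenInto(name + i, list.get(i), out);
            }
        } else {
            // Leaf value: one flat field per scalar
            out.put(name, String.valueOf(node));
        }
    }

    /** Hypothetical helper that hand-builds the parsed form of the example JSON. */
    static Map<String, Object> sampleDocument() {
        Map<String, Object> b0 = new LinkedHashMap<>();
        b0.put("title", "Some title");
        b0.put("author", "Some author");
        b0.put("genres", List.of("thriller", "drama"));
        Map<String, Object> b1 = new LinkedHashMap<>();
        b1.put("title", "Some other title");
        b1.put("author", "Some author");
        b1.put("genres", List.of("comedy", "nonfiction", "thriller"));
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("books", List.of(b0, b1));
        return doc;
    }

    public static void main(String[] args) {
        flatten(sampleDocument()).forEach((k, v) -> System.out.println(k + " : \"" + v + "\""));
    }
}
```

Running main prints the same nine field/value pairs listed above, in the same order.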

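At this point we can use the LukeRequestHandler to get all fields in the index and then build one large Solr query that checks every field we are interested in. For the example query above, the query would check all "books#.author" and "books#.genres#" fields. This solution does not seem very elegant, and the query could become very large if there are many fields.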
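To make the size problem concrete, here is a rough sketch of how such a query could be assembled once the field names are known. It assumes the field list is already available (it is hard-coded here rather than fetched from the LukeRequestHandler), WideQueryBuilder and its methods are hypothetical, and values are not escaped.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical helper that ORs together one clause per matching field name. */
class WideQueryBuilder {

    static String anyFieldMatches(List<String> fieldNames, Pattern fieldPattern, String value) {
        // Note: values are not escaped, and an empty match yields "()".
        return fieldNames.stream()
                .filter(f -> fieldPattern.matcher(f).matches())
                .map(f -> f + ":\"" + value + "\"")
                .collect(Collectors.joining(" OR ", "(", ")"));
    }

    /** Some books#.author field has the author AND some books#.genres# field has the genre. */
    static String authorAndGenreQuery(List<String> fieldNames, String author, String genre) {
        return anyFieldMatches(fieldNames, Pattern.compile("books\\d+\\.author"), author)
                + " AND "
                + anyFieldMatches(fieldNames, Pattern.compile("books\\d+\\.genres\\d+"), genre);
    }

    public static void main(String[] args) {
        // In the question's design these names would come from the LukeRequestHandler.
        List<String> fields = List.of(
                "books0.title", "books0.author", "books0.genres0", "books0.genres1",
                "books1.title", "books1.author", "books1.genres0", "books1.genres1", "books1.genres2");
        System.out.println(authorAndGenreQuery(fields, "Some author", "drama"));
    }
}
```

The number of clauses grows with the number of books times the number of genres per book, which is the scaling concern described above.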

Being able to use wildcards in field names would be useful, but I don't think Solr can do that.

Is there a better way to do this, perhaps through some clever combination of "copyField" and "multiValued" in the schema?

Answer 1

You can index the book entities themselves, one document per book:

<field name="id" type="string" indexed="true" stored="true" required="true" />  
<field name="title" type="text_general" indexed="true" stored="true"/>   
<!-- Don't perform stemming on authors; you can use a field type with lowercasing and ASCII folding for analysis -->   
<field name="authors" type="string" indexed="true" stored="true" multiValued="true"/>  
<field name="genre" type="string" indexed="true" stored="true" multiValued="true"/>
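A minimal sketch of splitting one JSON file into per-book documents that fit the fields above. Plain Maps stand in for SolrInputDocument, the fileId parameter and the "<fileId>-book<n>" id scheme are hypothetical, every book is assumed to have a genres array, and matches() only imitates an exact-match query on the string fields authors and genre.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: one flat, Solr-style document per book. */
class BookDocuments {

    static List<Map<String, List<String>>> toBookDocs(String fileId, List<Map<String, Object>> books) {
        List<Map<String, List<String>>> docs = new ArrayList<>();
        for (int i = 0; i < books.size(); i++) {
            Map<String, Object> book = books.get(i);
            Map<String, List<String>> doc = new LinkedHashMap<>();
            doc.put("id", List.of(fileId + "-book" + i));            // hypothetical id scheme
            doc.put("title", List.of(String.valueOf(book.get("title"))));
            doc.put("authors", List.of(String.valueOf(book.get("author"))));
            List<String> genres = new ArrayList<>();
            for (Object g : (List<?>) book.get("genres")) {           // assumes "genres" is present
                genres.add(String.valueOf(g));
            }
            doc.put("genre", genres);                                  // multiValued field
            docs.add(doc);
        }
        return docs;
    }

    /** Imitates authors:"<author>" AND genre:<genre> on untokenized string fields. */
    static boolean matches(Map<String, List<String>> doc, String author, String genre) {
        return doc.get("authors").contains(author) && doc.get("genre").contains(genre);
    }

    /** Hypothetical helper: the "books" array of the example JSON, already parsed. */
    static List<Map<String, Object>> sampleBooks() {
        Map<String, Object> b0 = new LinkedHashMap<>();
        b0.put("title", "Some title");
        b0.put("author", "Some author");
        b0.put("genres", List.of("thriller", "drama"));
        Map<String, Object> b1 = new LinkedHashMap<>();
        b1.put("title", "Some other title");
        b1.put("author", "Some author");
        b1.put("genres", List.of("comedy", "nonfiction", "thriller"));
        return List.of(b0, b1);
    }

    public static void main(String[] args) {
        for (Map<String, List<String>> doc : toBookDocs("file1", sampleBooks())) {
            System.out.println(doc.get("id") + " matches: " + matches(doc, "Some author", "drama"));
        }
    }
}
```

Because each book is its own document, a match on author and genre now means both hold for the same book, and the number of fields stays fixed no matter how many books a file has. The trade-off is that hits are books rather than the original JSON files, so some reference back to the source file has to be kept.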

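Searching on authors and genres with the DisMax parser should return the documents that match on these fields.
You can also filter by genre with a filter query, e.g. fq=genre:drama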
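As a rough illustration, the request parameters for such a search might be assembled like this. defType, qf, q and fq are standard Solr request parameters; the core name books in the printed URL, the helper class DismaxRequest, and the choice to quote the author as a phrase are assumptions to check against your own setup. The URLEncoder overload used here needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper that builds a DisMax query string with a genre filter. */
class DismaxRequest {

    static String queryString(String author, String genre) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("defType", "dismax");          // use the DisMax query parser
        params.put("qf", "authors genre");        // fields DisMax searches across
        params.put("q", "\"" + author + "\"");    // quoted so the name is kept together (assumption)
        params.put("fq", "genre:" + genre);       // filter query: restricts results, does not affect scoring
        return params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        // Hypothetical core name "books" on a local Solr instance.
        System.out.println("http://localhost:8983/solr/books/select?" + queryString("Some author", "drama"));
    }
}
```

Keeping the genre in fq rather than q means it only narrows the result set, and Solr can cache that filter independently of the main query.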

If you want a field to behave differently for searching, you can copy it with copyField and apply different analysis to the copy. For example:

<field name="genre_search" type="text_general" indexed="true" stored="true" multiValued="true"/>

<copyField source="genre" dest="genre_search"/>

Answer 2

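It may be worth looking at Solr joins. They are only available in Solr 4.0, which is currently in alpha, but they could let you model at least some, if not all, of these complex relationships. Performance is not as good as vanilla Solr without joins, but it may be perfectly adequate; you should verify that for your case.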
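To show what a join does, here is an in-memory imitation of the semantics of Solr's {!join from=... to=...} query parser: run the inner query, collect the values of the from field on the matching documents, then return the documents whose to field holds one of those values. It does not call Solr, and it assumes a hypothetical layout in which each JSON file is a parent document with an id and each book is a separate document carrying a parent_id field; JoinSketch and sampleIndex() are hypothetical helpers.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical in-memory imitation of {!join from=fromField to=toField}innerQuery. */
class JoinSketch {

    static List<Map<String, List<String>>> join(List<Map<String, List<String>>> index,
                                                String fromField, String toField,
                                                Predicate<Map<String, List<String>>> innerQuery) {
        // 1. Collect fromField values of every document matching the inner query.
        Set<String> joinKeys = index.stream()
                .filter(innerQuery)
                .flatMap(d -> d.getOrDefault(fromField, List.of()).stream())
                .collect(Collectors.toSet());
        // 2. Return documents whose toField contains one of those values.
        return index.stream()
                .filter(d -> d.getOrDefault(toField, List.of()).stream().anyMatch(joinKeys::contains))
                .collect(Collectors.toList());
    }

    /** Hypothetical index: parent docs per file plus one child doc per book. */
    static List<Map<String, List<String>>> sampleIndex() {
        return List.of(
                Map.of("id", List.of("file1")),
                Map.of("id", List.of("file1-book0"), "parent_id", List.of("file1"),
                       "authors", List.of("Some author"), "genre", List.of("thriller", "drama")),
                Map.of("id", List.of("file1-book1"), "parent_id", List.of("file1"),
                       "authors", List.of("Some author"), "genre", List.of("comedy", "nonfiction", "thriller")),
                Map.of("id", List.of("file2")),
                Map.of("id", List.of("file2-book0"), "parent_id", List.of("file2"),
                       "authors", List.of("Some author"), "genre", List.of("comedy")));
    }

    public static void main(String[] args) {
        // Roughly: {!join from=parent_id to=id}authors:"Some author" AND genre:drama
        Predicate<Map<String, List<String>>> inner = d ->
                d.getOrDefault("authors", List.of()).contains("Some author")
                        && d.getOrDefault("genre", List.of()).contains("drama");
        join(sampleIndex(), "parent_id", "id", inner)
                .forEach(d -> System.out.println(d.get("id")));
    }
}
```

This combines the per-book documents from Answer 1 with the ability to return the original files. Against a real index the request would be something along the lines of q={!join from=parent_id to=id}authors:"Some author" AND genre:drama, with the field names again being assumptions.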
