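Hibernate Search does not index email addresses?

Time: 2015-11-02 23:04:36

Tags: groovy spring-boot hibernate-search

I want to use Hibernate Search to run full-text searches on email addresses stored in an entity.

Given the following entity "Person" with an indexed field "email":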

Person.groovy

package com.example

import javax.persistence.Entity
import javax.persistence.GeneratedValue
import javax.persistence.GenerationType
import javax.persistence.Id

import org.hibernate.search.annotations.Field
import org.hibernate.search.annotations.Indexed

@Entity
@Indexed
class Person {
    @Id
    @GeneratedValue(strategy=GenerationType.AUTO)
    Long id

    @Field
    String email
}

and given the repository

SearchRepository.groovy

package com.example

import javax.persistence.EntityManager

import org.apache.lucene.search.Query
import org.hibernate.search.jpa.FullTextEntityManager
import org.hibernate.search.jpa.Search
import org.hibernate.search.query.dsl.QueryBuilder
import org.springframework.beans.factory.annotation.Autowired
import org.springframework.stereotype.Repository

@Repository
class SearchRepository {

    @Autowired
    EntityManager entityManager

    FullTextEntityManager getFullTextEntityManager() {
        Search.getFullTextEntityManager(entityManager)
    }

    List<Person> findPeople(String searchText){
        searchText = searchText.toLowerCase()+'*'
        QueryBuilder qb = fullTextEntityManager.searchFactory
                .buildQueryBuilder().forEntity(Person).get()
        Query query =
                qb
                .keyword()
                .wildcard()
                .onField('email')
                .matching(searchText)
                .createQuery()

        javax.persistence.Query jpaQuery =
                fullTextEntityManager.createFullTextQuery(query, Person)

        jpaQuery.resultList
    }
}

then the following test fails:

SearchWildcardTest.groovy

package com.example

import javax.persistence.EntityManager

import org.hibernate.search.jpa.FullTextEntityManager
import org.hibernate.search.jpa.Search
import org.junit.Test
import org.junit.runner.RunWith
import org.springframework.beans.factory.annotation.Autowired
import org.springframework.boot.test.SpringApplicationConfiguration
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner
import org.springframework.transaction.annotation.Transactional

@RunWith(SpringJUnit4ClassRunner)
@SpringApplicationConfiguration(classes = HibernateSearchWildcardApplication)
@Transactional
class SearchWildcardTest {

    @Autowired
    SearchRepository searchRepo

    @Autowired
    PersonRepository personRepo

    @Autowired
    EntityManager em

    FullTextEntityManager getFullTextEntityManager() {
        Search.getFullTextEntityManager(em)
    }

    @Test
    void findTeamsByNameWithWildcard() {
        Person person = personRepo.save new Person(email: 'foo@bar.com')

        fullTextEntityManager.createIndexer().startAndWait()
        fullTextEntityManager.flushToIndexes()

        List<Person> people = searchRepo.findPeople('foo@bar.com')

        assert people.contains(person)  // this assertion fails! Why?
    }
}

PersonRepository.groovy

package com.example

import org.springframework.data.repository.CrudRepository

interface PersonRepository extends CrudRepository<Person, Long>{
}

build.gradle

buildscript {
    ext {
        springBootVersion = '1.2.7.RELEASE'
    }
    repositories {
        mavenCentral()
    }
    dependencies {
        classpath("org.springframework.boot:spring-boot-gradle-plugin:${springBootVersion}")
        classpath('io.spring.gradle:dependency-management-plugin:0.5.2.RELEASE')
    }
}

apply plugin: 'groovy'
apply plugin: 'eclipse'
apply plugin: 'spring-boot'
apply plugin: 'io.spring.dependency-management'

jar {
    baseName = 'hibernate-search-email'
    version = '0.0.1-SNAPSHOT'
}
sourceCompatibility = 1.8
targetCompatibility = 1.8

repositories {
    mavenCentral()
}

dependencies {
    compile('org.springframework.boot:spring-boot-starter-data-jpa')
    compile('org.codehaus.groovy:groovy')
    compile('org.hibernate:hibernate-search:5.3.0.Final')
    testCompile('com.h2database:h2')
    testCompile('org.springframework.boot:spring-boot-starter-test')
}

task wrapper(type: Wrapper) {
    gradleVersion = '2.8'
}

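Here is what Luke shows for the generated Lucene index after running the test:

[Screenshot of the Luke index view; image not available]

It looks to me as if the email address "foo@bar.com" is not stored in the index as a whole, but is split into the two strings "foo" and "bar.com".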

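The "Getting started" guide on the official Hibernate Search website states:

[...] The standard tokenizer splits words at punctuation characters and hyphens while keeping email addresses and internet hostnames intact. It is a good general purpose tokenizer. [...]

I must be missing something here, but I cannot figure out what.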

My questions:

  • Why does my code not index the complete email address?
  • How can I get the address indexed as a whole so that the test passes?

1 answer:

Answer 0: (score: 4)

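It seems the docs do not correctly reflect changes in the underlying Lucene API.

[K]eeping email addresses and internet hostnames intact...

This used to be true for the old StandardTokenizer, but things have changed on the Lucene side since then. That behavior can now be found in ClassicTokenizer.

So the following configuration should give you what you are after:

@Entity
@Indexed
@AnalyzerDef(
    name = "emailanalyzer",
    tokenizer = @TokenizerDef(factory = ClassicTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
    }
)
class Person {

    // ...

    @Field
    @Analyzer(definition = "emailanalyzer")
    String email;
}

Note that this configuration also applies lowercasing. We will adapt the HSEARCH docs accordingly; thanks for spotting this!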
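To make the failure mode concrete, here is a minimal plain-Java sketch of why the wildcard query finds nothing once the address has been split. It assumes, as the Hibernate Search documentation states for wildcard queries, that the search pattern is not analyzed and is compared against each indexed term as a whole. The class `WildcardTermMatch` and the method `matchesAnyTerm` are hypothetical names made up for this illustration; they are not part of Lucene or Hibernate Search, and only a trailing `*` is handled.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: mimics matching a trailing-'*' pattern
// against individual index terms. This is not the Lucene implementation.
public class WildcardTermMatch {

    // Returns true if at least one single indexed term matches the pattern.
    // A trailing '*' means "term starts with the text before the '*'";
    // without it, the term must equal the pattern exactly.
    static boolean matchesAnyTerm(List<String> indexedTerms, String pattern) {
        boolean isPrefixPattern = pattern.endsWith("*");
        String text = isPrefixPattern
                ? pattern.substring(0, pattern.length() - 1)
                : pattern;
        for (String term : indexedTerms) {
            boolean matches = isPrefixPattern
                    ? term.startsWith(text)
                    : term.equals(text);
            if (matches) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Terms as seen in Luke when the address is split at '@'
        List<String> splitTerms = Arrays.asList("foo", "bar.com");
        // Single term when the tokenizer keeps the address intact
        List<String> intactTerms = Arrays.asList("foo@bar.com");

        // The pattern that findPeople('foo@bar.com') builds
        String pattern = "foo@bar.com*";

        System.out.println(matchesAnyTerm(splitTerms, pattern));   // prints "false"
        System.out.println(matchesAnyTerm(intactTerms, pattern));  // prints "true"
    }
}
```

Neither "foo" nor "bar.com" starts with "foo@bar.com", so the query has nothing to match until the tokenizer keeps the address as one term.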