Question

There are a large number of entities of different types:

interface Entity {
}

interface Entity1 extends Entity {
  String field1();
  String field2();
}

interface Entity2 extends Entity {
  String field1();
  String field2();
  String field3();
}

interface Entity3 extends Entity {
  String field12();
  String field23();
  String field34();
}

Set<Entity> entities = ...

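The task is to implement full-text search over this set. By full-text search I mean that I only need to get the entities that contain the substring I am looking for (I don't need to know the exact property, the exact offset of the substring, etc.). In the current implementation, the Entity interface has a method matches(String):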

interface Entity {
  boolean matches(String text);
}

Each entity class implements it according to its own internals:

class Entity1Impl implements Entity1 {
  public String field1() {...}
  public String field2() {...}

  public boolean matches(String text) {
    return field1().toLowerCase().contains(text.toLowerCase()) ||
           field2().toLowerCase().contains(text.toLowerCase());
  }
}

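I believe this approach is terrible (although it does work). I am thinking about using Lucene to build an index every time I get a new set. By index I mean a content -> id mapping. The content is just a trivial "sum" of all the fields under consideration. So, for Entity1, the content would be the concatenation of field1() and field2(). I have some doubts about performance: building an index is usually a rather expensive operation, so I am not sure it would help.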
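
To make the content -> id idea concrete, here is a minimal sketch using only the standard library. It is not Lucene and not a real inverted index: it just stores one lowercased, concatenated string per entity and scans those strings. The class name ContentIndexSketch, the String ids, and the add/search methods are all hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for the "content -> id" mapping; NOT a Lucene index.
public class ContentIndexSketch {
    // id -> lowercased concatenation of all searchable fields of that entity
    private final Map<String, String> contentById = new LinkedHashMap<>();

    // Registers an entity under the given id with the given field values.
    public void add(String id, String... fields) {
        // Joining with a newline keeps an ordinary (newline-free) query
        // from matching across the boundary between two fields.
        String content = String.join("\n", fields).toLowerCase(Locale.ROOT);
        contentById.put(id, content);
    }

    // Returns the ids of all entities whose content contains the text,
    // ignoring case, in insertion order.
    public List<String> search(String text) {
        String needle = text.toLowerCase(Locale.ROOT);
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, String> entry : contentById.entrySet()) {
            if (entry.getValue().contains(needle)) {
                ids.add(entry.getKey());
            }
        }
        return ids;
    }
}
```

For Entity1 this would be called as add(someId, e.field1(), e.field2()). Each field is lowercased once when the map is built rather than on every query, but each search is still a linear scan over all content.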

Do you have any other suggestions?

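Clarifying details:

Set<Entity> entities = ... is ~10000 items.
Set<Entity> entities = ... is not read from a database, so I can't just add a where ... condition. The data source is far from trivial, so I can't solve the problem on that side.
Entities should be thought of as short articles, so some fields may be up to 10KB, while others may be ~10 bytes.
I need to perform this search quite often, but both the query string and the original set are different every time, so it looks like I can't just build the index once (because the set of entities is different every time).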
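
Given these constraints (a different set and a different query each time), one baseline to measure any index against is a single linear pass that avoids allocating lowercase copies of large fields. The sketch below is only that, a baseline under stated assumptions: the Searchable interface is a hypothetical stand-in for Entity that exposes its field values as a list, and String.regionMatches with ignoreCase compares character by character, which can differ from toLowerCase() for some non-ASCII text.

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

public class LinearScanSketch {
    // Hypothetical stand-in for the question's Entity: just its field values.
    public interface Searchable {
        List<String> fields();
    }

    // Case-insensitive substring test that does not create lowercase copies.
    public static boolean containsIgnoreCase(String haystack, String needle) {
        int lastStart = haystack.length() - needle.length();
        for (int i = 0; i <= lastStart; i++) {
            if (haystack.regionMatches(true, i, needle, 0, needle.length())) {
                return true;
            }
        }
        return false;
    }

    // Keeps the entities that have at least one field containing the text.
    public static <T extends Searchable> List<T> search(Collection<T> entities, String text) {
        return entities.stream()
                .filter(e -> e.fields().stream().anyMatch(f -> containsIgnoreCase(f, text)))
                .collect(Collectors.toList());
    }
}
```

Whether this is fast enough for ~10000 short articles would need to be measured; the point is only that it requires no up-front build step when the set changes on every call.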

Answer 1

I would strongly consider using Lucene together with SOLR. http://lucene.apache.org/java/docs/index.html

Answer 2

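For such a complex object domain, you could use a Lucene wrapper tool like Compass, which lets you quickly map an object graph to a Lucene index using the same approach as an ORM (such as Hibernate).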

Full-text search solution for Java?
