ModeShape full-text search only works for binary files

Asked: 2017-04-08 19:46:10

Tags: text full-text-search apache-tika modeshape

I'm trying to run a full-text search against my ModeShape 5.3.0.Final repository. The query is very simple:

Query query = queryManager.createQuery("SELECT * FROM [nt:resource] AS data WHERE ISDESCENDANTNODE('/somenode') AND CONTAINS(data.*, '*" + text + "*')", Query.JCR_SQL2);
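
For reference, the statement above can be assembled by a small helper so that a search term containing a single quote does not break the string literal. This is only a sketch: the class and method names (`FullTextQueryBuilder`, `escapeLiteral`, `buildStatement`) are made up for illustration, and it assumes JCR-SQL2 string literals escape a single quote by doubling it. It does not escape characters that are special to the full-text search expression itself.

```java
public class FullTextQueryBuilder {

    // Doubles single quotes so the input cannot terminate the JCR-SQL2 string literal early.
    // Assumption: JCR-SQL2 escapes ' inside a literal as ''.
    public static String escapeLiteral(String value) {
        return value.replace("'", "''");
    }

    // Builds the same statement as in the question.
    // 'ancestorPath' and 'text' are illustrative parameter names.
    public static String buildStatement(String ancestorPath, String text) {
        return "SELECT * FROM [nt:resource] AS data"
             + " WHERE ISDESCENDANTNODE('" + escapeLiteral(ancestorPath) + "')"
             + " AND CONTAINS(data.*, '*" + escapeLiteral(text) + "*')";
    }

    public static void main(String[] args) {
        // Prints the statement that would be passed to the query manager.
        System.out.println(buildStatement("/somenode", "it's"));
    }
}
```

The resulting string would then be handed to `queryManager.createQuery(statement, Query.JCR_SQL2)`.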

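It seems to work for files stored in binary formats (i.e. pdf, doc, docx, etc.), but it does not match txt files or any other plain-text format.

Here is my repository configuration: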

{
  "name": "Persisted-Repository",
  "textExtraction": {
    "extractors": {
      "tikaExtractor": {
        "name": "General content-based extractor",
        "classname": "tika"
      }
    }
  },
  "workspaces": {
    "predefined": [
      "otherWorkspace"
    ],
    "default": "default",
    "allowCreation": true
  },
  "security": {
    "anonymous": {
      "roles": [
        "readonly",
        "readwrite",
        "admin"
      ],
      "useOnFailedLogin": false
    }
  },
  "storage": {
    "persistence": {
      "type": "file",
      "path": "/var/content/storage"
    },
    "binaryStorage": {
      "type": "file",
      "directory": "/var/content/binaries",
      "minimumBinarySizeInBytes": 999,
      "mimeTypeDetection": "content"
    }
  },
  "indexProviders": {
    "lucene": {
      "classname": "lucene",
      "directory": "/var/content/indexes"
    }
  },
  "indexes": {
    "textFromFiles": {
      "kind": "text",
      "provider": "lucene",
      "nodeType": "nt:resource",
      "columns": "jcr:data(BINARY)"
    }
  }
}

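For now I work around this by running a second search over files with a configured set of text file extensions, extracting their text manually with Tika (which is probably unnecessary, since the content is already text...) and searching it for occurrences.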
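
A minimal sketch of that kind of fallback, written without Tika on the assumption that the files really are plain UTF-8 text. Everything here is hypothetical: the class name `PlainTextFallbackSearch`, the extension list, and the `Map` of file name to bytes that stands in for the content read from the repository nodes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class PlainTextFallbackSearch {

    // Hypothetical list of extensions treated as plain text.
    static final Set<String> TEXT_EXTENSIONS = Set.of("txt", "csv", "xml", "json");

    // Returns true if the file name ends in one of the configured text extensions.
    public static boolean hasTextExtension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false;
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return TEXT_EXTENSIONS.contains(ext);
    }

    // Returns the names of text files whose content contains 'text', ignoring case.
    // 'files' maps a file name to its raw bytes (assumed to be UTF-8).
    public static List<String> search(Map<String, byte[]> files, String text) {
        String needle = text.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, byte[]> entry : files.entrySet()) {
            if (!hasTextExtension(entry.getKey())) {
                continue; // binary formats are left to the full-text index
            }
            String content = new String(entry.getValue(), StandardCharsets.UTF_8);
            if (content.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, byte[]> files = Map.of(
            "notes.txt", "Hello ModeShape".getBytes(StandardCharsets.UTF_8),
            "report.pdf", "modeshape".getBytes(StandardCharsets.UTF_8));
        System.out.println(search(files, "modeshape"));
    }
}
```

The hits from this pass would be merged with the results of the JCR query, which still covers the binary formats.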

Does anyone know whether this is the expected behavior, or am I doing something wrong?

Cheers!

0 answers:

No answers