How to match "xxx-xx-xxxx" in order to find documents containing Social Security numbers with Elasticsearch?

Asked: 2015-01-13 12:19:26

Tags: java regex elasticsearch lucene

I am looking for a way to match patterns such as "xxx-xx-xxxx" so that I can use Elasticsearch to find documents that contain Social Security numbers.

Suppose that, among the indexed documents, I want to find all those that contain a Social Security number matching the "xxx-xx-xxxx" pattern.
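To make the target pattern concrete, here is a minimal plain-Java sketch of the "xxx-xx-xxxx" shape using `java.util.regex`. It runs on an in-memory string, not inside Elasticsearch, and the class and method names (`SsnPatternSketch`, `findSsnLike`) are hypothetical. Unlike the expression used in the search code, it assumes the hyphens are mandatory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of the original question. */
final class SsnPatternSketch {
    // Three digits, two digits, four digits, separated by hyphens.
    // The lookarounds stop a match from starting or ending inside a longer digit run.
    private static final Pattern SSN =
            Pattern.compile("(?<!\\d)\\d{3}-\\d{2}-\\d{4}(?!\\d)");

    /** Returns every SSN-shaped substring found in the text, in order of appearance. */
    static List<String> findSsnLike(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = SSN.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }
}
```

For example, `SsnPatternSketch.findSsnLike("SSN: 123-45-6789, phone 555-1234")` returns a list holding only `123-45-6789`.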

Sample code for indexing a document:

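import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.ContentHandler;

// Extract the document body as plain text with Apache Tika.
ContentHandler contenthandler = new BodyContentHandler();
Metadata metadata = new Metadata();
Parser parser = new AutoDetectParser();
// try-with-resources closes the stream even if parsing fails.
try (InputStream is = new FileInputStream("/home/admin/Downloads/20121221.doc")) {
    parser.parse(is, contenthandler, metadata, new ParseContext());
} catch (Exception e) {
    e.printStackTrace();
}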

Sample search code:

// Match every document, then filter with a regular expression on the _all field.
QueryBuilder queryBuilderFullText = QueryBuilders.filteredQuery(
        QueryBuilders.matchAllQuery(),
        FilterBuilders.regexpFilter("_all", "^(\\d{3}-?\\d{2}-?\\d{4}|XXX-XX-XXXX)$"));
SearchRequestBuilder requestBuilder = client.prepareSearch()
        .setIndices(getDomainIndexId(project))
        .setTypes(getProjectTypeId(project))
        .setQuery(queryBuilderFullText);
SearchResponse response = requestBuilder.execute().actionGet(ES_TIMEOUT_MS);
SearchHits hits = response.getHits();
if (hits.getTotalHits() > 0) {
    System.out.println(hits.getTotalHits());
} else {
    return 0L;
}

I always get zero hits, even though many documents match the pattern.
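One possible line of investigation, stated as an assumption because the question does not show the index mapping: if `_all` is analyzed with the default standard analyzer, a value such as 123-45-6789 would be split at the hyphens into separate terms. A regexp filter is evaluated against individual terms, and the expression has to match a whole term, so under that assumption no single term has the xxx-xx-xxxx shape. The plain-Java sketch below only imitates that behaviour. `TermLevelRegexSketch`, `roughTokens` and `anyTermMatches` are hypothetical names, the tokenizer is a crude stand-in for the real analyzer, and `java.util.regex` syntax is not the same as Lucene's regexp syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical simulation of term-level matching; not Elasticsearch code. */
final class TermLevelRegexSketch {
    // Crude approximation of an analyzer: lowercase the text,
    // then split on every run of characters that are not letters or digits.
    static List<String> roughTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Imitates a term-level regexp: the expression must match one entire token,
    // so matches() is used instead of find().
    static boolean anyTermMatches(String text, String regex) {
        Pattern p = Pattern.compile(regex);
        for (String token : roughTokens(text)) {
            if (p.matcher(token).matches()) {
                return true;
            }
        }
        return false;
    }
}
```

With this sketch, `anyTermMatches("SSN 123-45-6789", "\\d{3}-\\d{2}-\\d{4}")` is false because the tokens are `ssn`, `123`, `45` and `6789`, while `anyTermMatches("SSN 123-45-6789", "\\d{4}")` is true. Two further points may be worth checking against the Elasticsearch regexp documentation: Lucene expressions are always anchored to the whole term, so `^` and `$` are not anchor operators there, and the shorthand classes available depend on the Lucene version in use. If the analyzer is indeed the cause, keeping the number in a separate field mapped as `not_analyzed` would preserve it as a single term.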

0 answers:

No answers yet