Filtering index hits by node ID in Neo4j

Time: 2014-06-03 09:14:55

Tags: filter indexing neo4j dataset

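I have a set of node IDs (a Set<Long>) and want to restrict or filter the results of a query to the nodes in this set. Is there an efficient way to do this?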

 Set<Node> query(final GraphDatabaseService graphDb, final Set<Long> nodeSet) {
    final Index<Node> searchIndex = graphDb.index().forNodes("search");
    final IndexHits<Node> hits = searchIndex.query(new QueryContext("value*"));
    // What now? How do I return only the index hits whose node ID is in the given set?
 }

2 answers:

Answer 0: (score: 1)

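Wouldn't it be faster the other way around? That is, fetch the nodes in the set by ID and compare their property with the value you are looking for: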

for (Iterator<Long> it = nodeSet.iterator(); it.hasNext();) {
   Node n = graphDb.getNodeById(it.next());
   // Remove every ID whose node does not carry the wanted value.
   if (!n.getProperty("value", "").equals("foo")) it.remove();
}

Or, along the lines of your own suggestion:

 Set<Node> query(final GraphDatabaseService graphDb, final Set<Long> nodeSet) {
    final Index<Node> searchIndex = graphDb.index().forNodes("search");
    final IndexHits<Node> hits = searchIndex.query(new QueryContext("value*"));
    Set<Node> result=new HashSet<>();
    for (Node n : hits) {
       if (nodeSet.contains(n.getId())) result.add(n);
    }
    return result;
 }
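
The second loop boils down to a membership test per hit. As a minimal sketch of just that step, the helper below uses only java.util types so that it runs without Neo4j; the class name HitFilter and the method retainAllowed are made up for illustration and are not part of any Neo4j API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.ToLongFunction;

class HitFilter {
    /**
     * Returns the hits whose ID is contained in allowedIds, in the order
     * in which the hits were delivered.
     */
    static <T> List<T> retainAllowed(Iterable<T> hits,
                                     ToLongFunction<T> idOf,
                                     Set<Long> allowedIds) {
        List<T> result = new ArrayList<>();
        for (T hit : hits) {
            // The long ID is boxed to Long for the set lookup.
            if (allowedIds.contains(idOf.applyAsLong(hit))) {
                result.add(hit);
            }
        }
        return result;
    }
}
```

On Java 8 or later this could presumably be called as retainAllowed(hits, Node::getId, nodeSet). Which of the two loops is cheaper likely depends on the sizes involved: the first does one lookup per ID in the set, the second one membership test per index hit.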

Answer 1: (score: 0)

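The fastest solution I found was to use Lucene's IndexSearcher directly on the index created by Neo4j, together with a custom Filter that restricts the search to specific nodes.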

Open the Neo4j index folder "{neo4j-database-folder}/index/lucene/node/{index-name}" with a Lucene IndexReader. Make sure the Lucene dependency you add to your project is the same version that Neo4j uses, which is currently Lucene 3.6.2!

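Here is my Lucene Filter implementation, which filters all query results by a given set of document IDs. (A Lucene document ID (an Integer) is not the same as a Neo4j node ID (a Long)!)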

import java.io.IOException;
import java.util.PriorityQueue;
import java.util.Set;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.Filter;

public class DocIdFilter extends Filter {

    public class FilteredDocIdSetIterator extends DocIdSetIterator {

        private final PriorityQueue<Integer> filterQueue;

        // The DocIdSetIterator contract expects -1 before the first nextDoc() call.
        private int docId = -1;

        public FilteredDocIdSetIterator(final Set<Integer> filterSet) {
            this(new PriorityQueue<Integer>(filterSet));
        }

        public FilteredDocIdSetIterator(final PriorityQueue<Integer> filterQueue) {
            this.filterQueue = filterQueue;
        }
        @Override
        public int docID() {
            return this.docId;
        }

        @Override
        public int nextDoc() throws IOException {
            if (this.filterQueue.isEmpty()) {
                this.docId = NO_MORE_DOCS;
            } else {
                this.docId = this.filterQueue.poll();
            }
            return this.docId;
        }

        @Override
        public int advance(final int target) throws IOException {
            while ((this.docId = this.nextDoc()) < target)
                ;
            return this.docId;
        }

    }

    private final PriorityQueue<Integer> filterQueue;

    public DocIdFilter(final Set<Integer> filterSet) {
        super();
        this.filterQueue = new PriorityQueue<Integer>(filterSet);
    }

    private static final long serialVersionUID = -865683019349988312L;

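    // Caveat: according to the Lucene 3.x Javadoc this method is called once per
    // index segment and should return segment-relative doc IDs, so top-level
    // IDs are probably only correct for a single-segment index.
    @Override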
    public DocIdSet getDocIdSet(final IndexReader reader) throws IOException {
        return new DocIdSet() {
            @Override
            public DocIdSetIterator iterator() throws IOException {
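                // Give each iterator its own copy; poll() would otherwise drain
                // the shared queue and leave the filter empty after one use.
                return new FilteredDocIdSetIterator(
                    new PriorityQueue<Integer>(DocIdFilter.this.filterQueue));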
            }
        };
    }

}
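
The iterator above drains a PriorityQueue as it moves forward. One alternative is to keep the IDs in a sorted array and move a cursor over it, which leaves the ID set untouched and lets advance() use a binary search. The sketch below is plain Java and deliberately does not extend Lucene's DocIdSetIterator, so it runs without Lucene on the classpath; the class name SortedIdCursor is hypothetical, and the method names only mirror the nextDoc/advance/docID contract.

```java
import java.util.Arrays;
import java.util.Set;

class SortedIdCursor {
    // Same sentinel value that Lucene uses for exhausted iterators.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] sortedIds;
    private int pos = -1; // -1 means "not positioned yet"

    SortedIdCursor(Set<Integer> ids) {
        this.sortedIds = ids.stream().mapToInt(Integer::intValue).sorted().toArray();
    }

    /** Current ID: -1 before the first move, NO_MORE_DOCS once exhausted. */
    int docID() {
        if (pos < 0) {
            return -1;
        }
        return pos < sortedIds.length ? sortedIds[pos] : NO_MORE_DOCS;
    }

    /** Moves to the next ID in ascending order. */
    int nextDoc() {
        if (pos < sortedIds.length) {
            pos++;
        }
        return docID();
    }

    /** Moves to the first ID beyond the current one that is >= target. */
    int advance(int target) {
        int from = Math.min(pos + 1, sortedIds.length);
        int idx = Arrays.binarySearch(sortedIds, from, sortedIds.length, target);
        // A negative result encodes the insertion point as -(point) - 1.
        pos = idx >= 0 ? idx : -idx - 1;
        return docID();
    }
}
```

Because the array is never modified, a fresh cursor can be created for every search without copying the underlying set of IDs more than once.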

To map the set of Neo4j node IDs (the ones the query result should be restricted to) to the correct Lucene document IDs, I build an in-memory bidirectional map:

import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.FieldSelector;
import org.apache.lucene.document.FieldSelectorResult;
import org.apache.lucene.index.IndexReader;

import com.google.common.collect.HashBiMap;

// LuceneIndexException is not shown in this post; it appears to be the author's own exception type.
public static HashBiMap<Integer, Long> generateDocIdToNodeIdMap(final IndexReader indexReader)
    throws LuceneIndexException {
    final HashBiMap<Integer, Long> result = HashBiMap.create(indexReader.numDocs());
    for (int i = 0; i < indexReader.maxDoc(); i++) {
        // Deleted documents still occupy a doc ID slot; skip them.
        if (indexReader.isDeleted(i)) {
            continue;
        }
        final Document doc;
        try {
            // Load only the "_id_" field, which holds the Neo4j node ID.
            doc = indexReader.document(i, new FieldSelector() {
                private static final long serialVersionUID = 5853247619312916012L;

                @Override
                public FieldSelectorResult accept(final String fieldName) {
                    if ("_id_".equals(fieldName)) {
                        return FieldSelectorResult.LOAD_AND_BREAK;
                    } else {
                        return FieldSelectorResult.NO_LOAD;
                    }
                }
            });
        } catch (final IOException e) {
            throw new LuceneIndexException(indexReader.directory(),
                "could not read document with ID: '" + i + "' from index.", e);
        }
        final Long nodeId;
        try {
            nodeId = Long.valueOf(doc.get("_id_"));
        } catch (final NumberFormatException e) {
            throw new LuceneIndexException(indexReader.directory(),
                "could not parse node ID value from document ID: '" + i + "'", e);
        }
        result.put(i, nodeId);
    }
    return result;
}

I am using the Google Guava library, which provides the bidirectional map and also lets collections be initialized with a specific size.
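
The post does not show how the map is consulted afterwards. One plausible way is to look up each wanted node ID in the reverse direction (with Guava, via inverse() on the bidirectional map) and collect the resulting document IDs for the filter. The sketch below uses plain java.util maps as a stand-in so that it runs without Guava; NodeIdTranslator, invert and toDocIds are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class NodeIdTranslator {
    /** Builds the reverse (node ID to doc ID) direction of a doc ID to node ID map. */
    static Map<Long, Integer> invert(Map<Integer, Long> docIdToNodeId) {
        Map<Long, Integer> nodeIdToDocId = new HashMap<>();
        for (Map.Entry<Integer, Long> e : docIdToNodeId.entrySet()) {
            Integer previous = nodeIdToDocId.put(e.getValue(), e.getKey());
            if (previous != null) {
                // A bidirectional map requires unique values as well as unique keys.
                throw new IllegalArgumentException(
                    "node ID " + e.getValue() + " maps to more than one document");
            }
        }
        return nodeIdToDocId;
    }

    /** Translates node IDs to doc IDs; node IDs without a document are skipped. */
    static Set<Integer> toDocIds(Set<Long> nodeIds, Map<Long, Integer> nodeIdToDocId) {
        Set<Integer> docIds = new HashSet<>();
        for (Long nodeId : nodeIds) {
            Integer docId = nodeIdToDocId.get(nodeId);
            if (docId != null) {
                docIds.add(docId);
            }
        }
        return docIds;
    }
}
```

The resulting Set<Integer> has the shape that the DocIdFilter constructor takes. Lucene document IDs can change when segments are merged, so such a map should be treated as valid only for the IndexReader it was built from.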