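I have a set of node IDs (Set<Long>) and want to restrict or filter the results of an index query to the nodes in this set. Is there an efficient way to do this?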
Set<Node> query(final GraphDatabaseService graphDb, final Set<Long> nodeSet) {
final Index<Node> searchIndex = graphDb.index().forNodes("search");
final IndexHits<Node> hits = searchIndex.query(new QueryContext("value*"));
// What now? How do I return only the index hits whose node IDs are in nodeSet?
}
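Answer 0 (score: 1)
Wouldn't it be faster the other way round? That is, fetch the nodes in the set by ID and compare their property against the value you are looking for?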
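// Walk the candidate IDs and drop every node whose "value" property does not match.
for (Iterator<Long> it = nodeSet.iterator(); it.hasNext();) {
Node n = graphDb.getNodeById(it.next());
if (!n.getProperty("value", "").equals("foo")) it.remove();
}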
Or, following your suggestion:
Set<Node> query(final GraphDatabaseService graphDb, final Set<Long> nodeSet) {
final Index<Node> searchIndex = graphDb.index().forNodes("search");
final IndexHits<Node> hits = searchIndex.query(new QueryContext("value*"));
Set<Node> result=new HashSet<>();
for (Node n : hits) {
if (nodeSet.contains(n.getId())) result.add(n);
}
return result;
}
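Answer 1 (score: 0)
The fastest solution I found was to use Lucene's IndexSearcher directly on the index created by Neo4j, with a custom Filter that restricts the search to specific nodes.
Open the Neo4j index folder "{neo4j-database-folder}/index/lucene/node/{index-name}" with a Lucene IndexReader. Make sure the Lucene dependency you add to your project is the same version that Neo4j uses, which is currently Lucene 3.6.2!
Here is my Lucene Filter implementation, which filters all query results by a given set of document IDs. (A Lucene document ID (Integer) is NOT the same as a Neo4j node ID (Long)!)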
import java.io.IOException;
import java.util.PriorityQueue;
import java.util.Set;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.Filter;
public class DocIdFilter extends Filter {
public class FilteredDocIdSetIterator extends DocIdSetIterator {
private final PriorityQueue<Integer> filterQueue;
// -1 signals that nextDoc() has not been called yet.
private int docId = -1;
public FilteredDocIdSetIterator(final Set<Integer> filterSet) {
this(new PriorityQueue<Integer>(filterSet));
}
public FilteredDocIdSetIterator(final PriorityQueue<Integer> filterQueue) {
this.filterQueue = filterQueue;
}
@Override
public int docID() {
return this.docId;
}
@Override
public int nextDoc() throws IOException {
if (this.filterQueue.isEmpty()) {
this.docId = NO_MORE_DOCS;
} else {
this.docId = this.filterQueue.poll();
}
return this.docId;
}
@Override
public int advance(final int target) throws IOException {
while ((this.docId = this.nextDoc()) < target)
;
return this.docId;
}
}
private final PriorityQueue<Integer> filterQueue;
public DocIdFilter(final Set<Integer> filterSet) {
super();
this.filterQueue = new PriorityQueue<Integer>(filterSet);
}
private static final long serialVersionUID = -865683019349988312L;
@Override
public DocIdSet getDocIdSet(final IndexReader reader) throws IOException {
return new DocIdSet() {
@Override
public DocIdSetIterator iterator() throws IOException {
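// Hand each iterator its own copy: nextDoc() polls (and thereby empties) the queue,
// so sharing the filter's queue would leave nothing for a second iteration.
return new FilteredDocIdSetIterator(new PriorityQueue<Integer>(DocIdFilter.this.filterQueue));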
}
};
}
}
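To make the iteration behaviour of the filter above easier to follow, here is a minimal stand-alone model of it in plain Java. It uses no Lucene classes; the class name `SortedIdCursor` and its methods are hypothetical stand-ins for `nextDoc()` and `advance()`. It only illustrates that polling a PriorityQueue yields the IDs in ascending order regardless of the order in the Set, and that `advance` skips to the first ID that is not smaller than the target.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.PriorityQueue;
import java.util.Set;

/** Stand-alone model of the iterator logic; hypothetical, not part of Lucene or Neo4j. */
public class SortedIdCursor {
    /** Same sentinel value as Lucene's DocIdSetIterator.NO_MORE_DOCS. */
    public static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final PriorityQueue<Integer> queue;
    private int current = -1; // nothing consumed yet

    public SortedIdCursor(final Set<Integer> ids) {
        // Private copy, so consuming this cursor never affects the caller's set.
        this.queue = new PriorityQueue<Integer>(ids);
    }

    /** Returns the next ID in ascending order, or NO_MORE_DOCS when exhausted. */
    public int next() {
        final Integer head = this.queue.poll();
        this.current = (head == null) ? NO_MORE_DOCS : head;
        return this.current;
    }

    /** Skips forward to the first remaining ID that is >= target. */
    public int advance(final int target) {
        while (next() < target) {
            // keep polling; the loop ends at the latest when NO_MORE_DOCS is returned
        }
        return this.current;
    }

    public static void main(final String[] args) {
        final Set<Integer> ids = new HashSet<Integer>(Arrays.asList(42, 7, 19));
        final SortedIdCursor cursor = new SortedIdCursor(ids);
        System.out.println(cursor.next());      // 7
        System.out.println(cursor.advance(10)); // 19
        System.out.println(cursor.next());      // 42
        System.out.println(cursor.next() == NO_MORE_DOCS); // true
    }
}
```

Because the cursor consumes its queue while iterating, every new iteration needs a fresh copy of the ID set.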
To map the set of Neo4j node IDs (the ones the query results should be restricted to) to the correct Lucene document IDs, I created an in-memory bidirectional map:
public static HashBiMap<Integer, Long> generateDocIdToNodeIdMap(final IndexReader indexReader)
throws LuceneIndexException {
final HashBiMap<Integer, Long> result = HashBiMap.create(indexReader.numDocs());
for (int i = 0; i < indexReader.maxDoc(); i++) {
if (indexReader.isDeleted(i)) {
continue;
}
final Document doc;
try {
doc = indexReader.document(i, new FieldSelector() {
private static final long serialVersionUID = 5853247619312916012L;
@Override
public FieldSelectorResult accept(final String fieldName) {
if ("_id_".equals(fieldName)) {
return FieldSelectorResult.LOAD_AND_BREAK;
} else {
return FieldSelectorResult.NO_LOAD;
}
}
});
} catch (final IOException e) {
throw new LuceneIndexException(indexReader.directory(), "could not read document with ID: '" + i
+ "' from index.", e);
}
final Long nodeId;
try {
nodeId = Long.valueOf(doc.get("_id_"));
} catch (final NumberFormatException e) {
throw new LuceneIndexException(indexReader.directory(),
"could not parse node ID value from document ID: '" + i + "'", e);
}
result.put(i, nodeId);
}
return result;
}
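I am using the Google Guava library, which provides the bidirectional map as well as the initialization of collections with a specific expected size.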
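The step that connects the two pieces, translating the given Neo4j node IDs into the Lucene document IDs that the filter expects, might look roughly like the sketch below. It is written against plain java.util maps so that it runs without Guava; with a Guava BiMap the reverse lookup is available directly through `inverse()`. The class name `NodeIdTranslator` and both method names are hypothetical, and skipping node IDs that have no document in the index is an assumption about the desired behaviour.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: turns Neo4j node IDs into Lucene document IDs. */
public class NodeIdTranslator {

    /** Builds the reverse (node ID to document ID) lookup from the forward map. */
    public static Map<Long, Integer> invert(final Map<Integer, Long> docIdToNodeId) {
        final Map<Long, Integer> nodeIdToDocId = new HashMap<Long, Integer>();
        for (final Map.Entry<Integer, Long> entry : docIdToNodeId.entrySet()) {
            nodeIdToDocId.put(entry.getValue(), entry.getKey());
        }
        return nodeIdToDocId;
    }

    /** Collects the document IDs of all given node IDs that are present in the index. */
    public static Set<Integer> toDocIds(final Map<Long, Integer> nodeIdToDocId, final Set<Long> nodeIds) {
        final Set<Integer> docIds = new HashSet<Integer>();
        for (final Long nodeId : nodeIds) {
            final Integer docId = nodeIdToDocId.get(nodeId);
            if (docId != null) { // assumption: node IDs without a document are skipped
                docIds.add(docId);
            }
        }
        return docIds;
    }

    public static void main(final String[] args) {
        // Made-up sample data: document ID -> node ID.
        final Map<Integer, Long> docIdToNodeId = new HashMap<Integer, Long>();
        docIdToNodeId.put(0, 100L);
        docIdToNodeId.put(1, 205L);
        docIdToNodeId.put(2, 317L);

        final Set<Long> wanted = new HashSet<Long>(Arrays.asList(205L, 317L, 999L));
        final Set<Integer> docIds = toDocIds(invert(docIdToNodeId), wanted);
        System.out.println(docIds.contains(1) && docIds.contains(2) && docIds.size() == 2); // true
    }
}
```

The resulting Set<Integer> is what would be passed to the DocIdFilter constructor shown earlier.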