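I am just getting started with Neo4j, so I am not very experienced with it yet. In the figure below I have a 2-mode (bipartite) graph in which the green nodes represent "documents" and the red nodes represent "terms" that occur in a given document. (The real graph is huge: about 20,000,000 documents and 25,000 terms.)
I would like to know how to count co-occurring pairs of terms in Neo4j (in Cypher or in Java). The desired output of the query is: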
# Example: Pair (term-1, term-2) occurs in doc-1 and in doc-3
# Frequency for pair (term-1, term-2) should be 2
# termA | termB | frequency
term-1 | term-2 | 2
term-1 | term-3 | 1
term-2 | term-3 | 2
The graph is available at http://console.neo4j.org/r/7fmo7c
Code to reproduce the test graph in Neo4j (shell commands):
set name root
mkrel -t ROOT -c -v
cd 1
set name doc-1
set type document
mkrel -t HAVE -cv
cd 2
set name term-1
set type term
cd ..
mkrel -t HAVE -cv
cd 3
set name term-2
set type term
cd ..
mkrel -t HAVE -cv
cd 4
set name term-3
set type term
mkrel -t HAVE -d INCOMING -c
cd 5
set name doc-2
set type document
mkrel -t HAVE -d OUTGOING -n 3
cd 3
mkrel -t HAVE -d INCOMING -c
cd 6
set name doc-3
set type document
mkrel -t HAVE -d OUTGOING -n 2
Code to reproduce the test graph in Java:
import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.neo4j.graphdb.factory.GraphDatabaseSettings;
public class CountPairs {
    private static final String DB_PATH = "test.db";
    private static GraphDatabaseService graphDb;
    public static void main(String[] args) {
        // Open an embedded database with auto-indexing on "name" and "type"
        graphDb = new GraphDatabaseFactory().
            newEmbeddedDatabaseBuilder(DB_PATH).
            setConfig(GraphDatabaseSettings.node_keys_indexable, "name, type").
            setConfig(GraphDatabaseSettings.node_auto_indexing, "true").
            newGraphDatabase();
        Transaction tx = graphDb.beginTx();
        Node doc1, doc2, doc3 = null;
        Node term1, term2, term3 = null;
        Relationship rel1, rel2, rel3, rel4, rel5, rel6, rel7 = null;
        try
        {
            // Create nodes
            doc1 = graphDb.createNode();
            doc2 = graphDb.createNode();
            doc3 = graphDb.createNode();
            term1 = graphDb.createNode();
            term2 = graphDb.createNode();
            term3 = graphDb.createNode();
            // Set document properties
            doc1.setProperty("name", "doc1");
            doc1.setProperty("type", "document");
            doc2.setProperty("name", "doc2");
            doc2.setProperty("type", "document");
            doc3.setProperty("name", "doc3");
            doc3.setProperty("type", "document");
            // Set term properties
            term1.setProperty("name", "term1");
            term1.setProperty("type", "term");
            term2.setProperty("name", "term2");
            term2.setProperty("type", "term");
            term3.setProperty("name", "term3");
            term3.setProperty("type", "term");
            // Create relations (document -[:HAVE]-> term)
            rel1 = doc1.createRelationshipTo(term1, DynamicRelationshipType.withName("HAVE"));
            rel2 = doc1.createRelationshipTo(term2, DynamicRelationshipType.withName("HAVE"));
            rel3 = doc1.createRelationshipTo(term3, DynamicRelationshipType.withName("HAVE"));
            rel4 = doc2.createRelationshipTo(term2, DynamicRelationshipType.withName("HAVE"));
            rel5 = doc2.createRelationshipTo(term3, DynamicRelationshipType.withName("HAVE"));
            rel6 = doc3.createRelationshipTo(term1, DynamicRelationshipType.withName("HAVE"));
            rel7 = doc3.createRelationshipTo(term2, DynamicRelationshipType.withName("HAVE"));
            tx.success();
        }
        catch(Exception e)
        {
            tx.failure();
        }
        finally
        {
            tx.finish();
        }
        graphDb.shutdown();
    }
}
Answer 0 (score: 2)
start t1=node(*), t2=node(*)
where has(t1.type) and has(t2.type) and t1.type='term' and t2.type='term' and id(t1) < id(t2)
with t1, t2
match t1<-[:HAVE]-doc-[:HAVE]->t2
where doc.type='document'
return t1, t2, count(doc)
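You can try it here: http://console.neo4j.org/r/pshvqx
I hope this is what you are looking for. Also, for better performance, I suggest putting an index on the nodes of type "term" and using that index in the start clause to look up t1 and t2.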
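To make explicit what the query above counts, here is a minimal plain-Java sketch of the same logic, run on the three test documents from the question. It does not use the Neo4j API at all. The class name CooccurrenceSketch, the methods countPairs and sampleDocs, and the "termA | termB" key format are all made up for illustration, and it assumes Java 8 or later. Treat it as a way to check expected results on a small sample, not as a solution for the full 20,000,000-document graph.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper class, not part of Neo4j.
public class CooccurrenceSketch {

    // For every unordered pair of terms, count the documents that contain both.
    static Map<String, Integer> countPairs(Map<String, List<String>> termsByDoc) {
        Map<String, Integer> frequency = new TreeMap<>();
        for (List<String> terms : termsByDoc.values()) {
            // Sort and de-duplicate so that each pair is counted once per document
            // and is always written as (smaller, larger); this plays the same role
            // as the id(t1) < id(t2) condition in the Cypher query.
            List<String> sorted = new ArrayList<>(new TreeSet<>(terms));
            for (int i = 0; i < sorted.size(); i++) {
                for (int j = i + 1; j < sorted.size(); j++) {
                    String pair = sorted.get(i) + " | " + sorted.get(j);
                    frequency.merge(pair, 1, Integer::sum);
                }
            }
        }
        return frequency;
    }

    // The test graph from the question, written as document -> terms.
    static Map<String, List<String>> sampleDocs() {
        Map<String, List<String>> docs = new LinkedHashMap<>();
        docs.put("doc-1", Arrays.asList("term-1", "term-2", "term-3"));
        docs.put("doc-2", Arrays.asList("term-2", "term-3"));
        docs.put("doc-3", Arrays.asList("term-1", "term-2"));
        return docs;
    }

    public static void main(String[] args) {
        // Prints one "termA | termB | frequency" line per pair.
        countPairs(sampleDocs())
            .forEach((pair, n) -> System.out.println(pair + " | " + n));
    }
}
```

On the sample data this gives 2 for (term-1, term-2), 1 for (term-1, term-3) and 2 for (term-2, term-3), which matches the table in the question. The inner double loop is quadratic in the number of terms per document, so the cost is driven by how many terms each document has rather than by the total number of terms.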