Question

In SQL there is an option (NOT EXISTS) that lets us select a row only if another SELECT returns zero results.

For example:

SELECT 
    c.CustomerKey
FROM
    Customer c, Sales s1, Date d
WHERE
    s1.CustomerKey = c.CustomerKey AND
    s1.OrderDateKey = d.DateKey AND
    s1.ShipDate > s1.DueDate AND
    NOT EXISTS (SELECT *
                FROM Sales s2
                WHERE s2.OrderDateKey = s1.OrderDateKey AND
                      s2.CustomerKey <> s1.CustomerKey)
GROUP BY 
    c.CustomerKey

I tried the following in Cypher, but the query never finishes, so I assume I'm doing it the wrong way. What am I missing?

MATCH (d1:Date)<-[:ORDERDATE]-(s1:Sales)-[:CUSTOMER]->(c1:Customer)
WHERE s1.ShipDate > s1.DueDate
WITH d1,s1,c1
MATCH (s2:Sales)-[:CUSTOMER]->(c2:Customer)
WHERE NOT(s2.OrderDateKey=s1.OrderDateKey AND c2.CustomerKey<>c1.CustomerKey)
RETURN c2.CustomerKey

Answer 1

The query below should do what you want.

First, you should create an index on :Sales(OrderDateKey) so that the OPTIONAL MATCH in the query below can quickly find the desired Sales nodes (instead of scanning all of them):

CREATE INDEX ON :Sales(OrderDateKey);
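Note that the CREATE INDEX ON form above is the older syntax. As far as I know it was deprecated in Neo4j 4.0 and removed in 5.0, so on a newer server something along these lines should be the equivalent. The index name sales_order_date_key is just a name I made up; the label and property come from the question.

```cypher
// Newer index syntax (Neo4j 4.0+). The index name is hypothetical;
// :Sales and OrderDateKey are taken from the question.
CREATE INDEX sales_order_date_key
FOR (s:Sales) ON (s.OrderDateKey);
```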

When the OPTIONAL MATCH clause fails to find a match, it sets its unbound identifiers to NULL. The following query takes advantage of that fact:

MATCH (:Date)<-[:ORDERDATE]-(s1:Sales)-[:CUSTOMER]->(c1:Customer)
WHERE s1.ShipDate > s1.DueDate
WITH s1.OrderDateKey AS odk, c1.CustomerKey AS customerKey
OPTIONAL MATCH (s2:Sales)-[:CUSTOMER]->(c2:Customer)
WHERE s2.OrderDateKey=odk AND c2.CustomerKey<>customerKey
WITH customerKey, c2
WHERE c2 IS NULL
RETURN DISTINCT customerKey;
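
If you want to sanity-check the logic, a tiny made-up graph like the one below can help. Everything in it is invented for illustration, and the dates are stored as yyyymmdd integers only to keep the example short (your real data may well use a different type). With this data I would expect the NOT EXISTS condition from the question to keep only customer 1, because customer 2's late sale shares its order date with a sale made by customer 1.

```cypher
// Hypothetical sample data, not taken from the question.
CREATE (d1:Date {DateKey: 20240101}),
       (d2:Date {DateKey: 20240102}),
       (a:Customer {CustomerKey: 1}),
       (b:Customer {CustomerKey: 2}),
       // customer 1: shipped late, and the only sale ordered on 20240101
       (sa1:Sales {OrderDateKey: 20240101, DueDate: 20240105, ShipDate: 20240107}),
       // customer 2: shipped late, ordered on 20240102
       (sb1:Sales {OrderDateKey: 20240102, DueDate: 20240106, ShipDate: 20240108}),
       // customer 1 again: shipped on time, also ordered on 20240102
       (sa2:Sales {OrderDateKey: 20240102, DueDate: 20240106, ShipDate: 20240104}),
       (sa1)-[:ORDERDATE]->(d1), (sa1)-[:CUSTOMER]->(a),
       (sb1)-[:ORDERDATE]->(d2), (sb1)-[:CUSTOMER]->(b),
       (sa2)-[:ORDERDATE]->(d2), (sa2)-[:CUSTOMER]->(a);
```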

Answer 2

The tricky part of translating SQL to Cypher is figuring out when we should still do joins and predicates based on keys, vs when we should be translating those operations into usages of nodes and relationships.

Let's first translate what the SQL means, as best as I can tell:

We want to match a Sale with a Customer and an order Date, where the sale's ship date is past the due date, and there isn't already a Sale with the same order Date for a different Customer.

It looks like Sales.OrderDateKey is a foreign key referencing the primary key Date.DateKey, and Sales.CustomerKey is a foreign key referencing the primary key Customer.CustomerKey.

If that assumption is true, then we don't need to work with these keys at all. Where SQL joins on foreign and primary keys, Neo4j uses relationships between nodes instead, so the only place this query needs the key fields is in the returned values.

MATCH (orderDate:Date)<-[:ORDERDATE]-(s1:Sales)-[:CUSTOMER]->(c1:Customer)
WHERE s1.ShipDate > s1.DueDate
WITH orderDate, c1
// match to exclude is a sale with the same orderDate but different customer
OPTIONAL MATCH (orderDate)<-[:ORDERDATE]-(:Sales)-[:CUSTOMER]->(c2:Customer)
WHERE c1 <> c2
WITH c1, c2
WHERE c2 IS NULL
RETURN DISTINCT c1.CustomerKey;
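
If you are on Neo4j 4.0 or later, an existential subquery gives you something much closer to SQL's NOT EXISTS, without the OPTIONAL MATCH and NULL check. This is only a sketch under the same assumption as above, namely that the :ORDERDATE and :CUSTOMER relationships mirror the foreign keys; I haven't run it against your data.

```cypher
MATCH (orderDate:Date)<-[:ORDERDATE]-(s1:Sales)-[:CUSTOMER]->(c1:Customer)
WHERE s1.ShipDate > s1.DueDate
  AND NOT EXISTS {
    // a sale with the same order date that belongs to a different customer
    MATCH (orderDate)<-[:ORDERDATE]-(:Sales)-[:CUSTOMER]->(c2:Customer)
    WHERE c2 <> c1
  }
RETURN DISTINCT c1.CustomerKey;
```

The subquery can see orderDate and c1 from the outer MATCH, which plays the same role as the correlated references to s1 in the SQL version.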

Neo4j Cypher MATCH if Not exists