MERGE query performance - unique constraint or index?

Time: 2016-09-20 09:24:39

Tags: indexing neo4j constraints cypher

I am pushing 2K+ nodes and 8K+ edges to a graph, which takes about 7000 ms. Later I will be working with 100K+ nodes and relationships. My query uses MERGE like this:

MERGE (a:User {user: 'username'})
MERGE (b:Hobby {hobby: 'hobby'})
MERGE (a)-[r:Hobby]->(b)
  

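Note: username and hobby are strings in the query.

Now I am trying to improve the performance of this query. After some googling, I found two approaches: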

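  1. Index the node properties username and hobby, so that the MERGE operations perform better.
  2. Create a constraint on the node properties username and hobby. Many people recommend this approach.

My questions are:

  1. What is the difference between indexing a property and creating a constraint on a property? How does the graph handle these operations internally?
  2. Which approach will improve performance?

Edit:

My code: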

      session = driver.session()
      session.run('CREATE CONSTRAINT ON (u:user) ASSERT u.user IS UNIQUE')
      session.run('CREATE CONSTRAINT ON (h:hobby) ASSERT h.hobby IS UNIQUE')
      
      session.close()
      
      def writeBatch(b):
          print("writing batch of " + str(len(b)))
          session = driver.session()
          session.run('UNWIND {batch} AS elt '
                      'MERGE (u:user {user: elt.user}) '
                      'MERGE (h:hobby {hobby: elt.hobby}) '
                      'MERGE (u)-[r:hobby]->(h)',
                      {'batch': b})
          session.close()
      

Error:

          Traceback (most recent call last):
        File "/Users/adaggula/Documents/workspace2/Facebook/FbNeo.py", line 145, in <module>
          userhobby.foreach(write2neo)
        File "/usr/local/spark/python/pyspark/rdd.py", line 747, in foreach
          self.mapPartitions(processPartition).count()  # Force evaluation
        File "/usr/local/spark/python/pyspark/rdd.py", line 1004, in count
          return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
        File "/usr/local/spark/python/pyspark/rdd.py", line 995, in sum
          return self.mapPartitions(lambda x: [sum(x)]).fold(0, operator.add)
        File "/usr/local/spark/python/pyspark/rdd.py", line 869, in fold
          vals = self.mapPartitions(func).collect()
        File "/usr/local/spark/python/pyspark/rdd.py", line 771, in collect
          port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
        File "/usr/local/spark/python/pyspark/rdd.py", line 2379, in _jrdd
          pickled_cmd, bvars, env, includes = _prepare_for_python_RDD(self.ctx, command, self)
        File "/usr/local/spark/python/pyspark/rdd.py", line 2299, in _prepare_for_python_RDD
          pickled_command = ser.dumps(command)
        File "/usr/local/spark/python/pyspark/serializers.py", line 428, in dumps
          return cloudpickle.dumps(obj, 2)
        File "/usr/local/spark/python/pyspark/cloudpickle.py", line 646, in dumps
          cp.dump(obj)
        File "/usr/local/spark/python/pyspark/cloudpickle.py", line 107, in dump
          return Pickler.dump(self, obj)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 224, in dump
          self.save(obj)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
          f(self, obj) # Call unbound method with explicit self
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 562, in save_tuple
          save(element)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
          f(self, obj) # Call unbound method with explicit self
        File "/usr/local/spark/python/pyspark/cloudpickle.py", line 199, in save_function
          self.save_function_tuple(obj)
        File "/usr/local/spark/python/pyspark/cloudpickle.py", line 236, in save_function_tuple
          save((code, closure, base_globals))
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
          f(self, obj) # Call unbound method with explicit self
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 548, in save_tuple
          save(element)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
          f(self, obj) # Call unbound method with explicit self
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 600, in save_list
          self._batch_appends(iter(obj))
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 633, in _batch_appends
          save(x)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
          f(self, obj) # Call unbound method with explicit self
        File "/usr/local/spark/python/pyspark/cloudpickle.py", line 199, in save_function
          self.save_function_tuple(obj)
        File "/usr/local/spark/python/pyspark/cloudpickle.py", line 236, in save_function_tuple
          save((code, closure, base_globals))
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
          f(self, obj) # Call unbound method with explicit self
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 548, in save_tuple
          save(element)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
          f(self, obj) # Call unbound method with explicit self
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 600, in save_list
          self._batch_appends(iter(obj))
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 633, in _batch_appends
          save(x)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
          f(self, obj) # Call unbound method with explicit self
        File "/usr/local/spark/python/pyspark/cloudpickle.py", line 199, in save_function
          self.save_function_tuple(obj)
        File "/usr/local/spark/python/pyspark/cloudpickle.py", line 236, in save_function_tuple
          save((code, closure, base_globals))
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
          f(self, obj) # Call unbound method with explicit self
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 548, in save_tuple
          save(element)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
          f(self, obj) # Call unbound method with explicit self
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 600, in save_list
          self._batch_appends(iter(obj))
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 633, in _batch_appends
          save(x)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
          f(self, obj) # Call unbound method with explicit self
        File "/usr/local/spark/python/pyspark/cloudpickle.py", line 199, in save_function
          self.save_function_tuple(obj)
        File "/usr/local/spark/python/pyspark/cloudpickle.py", line 236, in save_function_tuple
          save((code, closure, base_globals))
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
          f(self, obj) # Call unbound method with explicit self
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 548, in save_tuple
          save(element)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
          f(self, obj) # Call unbound method with explicit self
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 600, in save_list
          self._batch_appends(iter(obj))
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 633, in _batch_appends
          save(x)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
          f(self, obj) # Call unbound method with explicit self
        File "/usr/local/spark/python/pyspark/cloudpickle.py", line 199, in save_function
          self.save_function_tuple(obj)
        File "/usr/local/spark/python/pyspark/cloudpickle.py", line 236, in save_function_tuple
          save((code, closure, base_globals))
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
          f(self, obj) # Call unbound method with explicit self
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 548, in save_tuple
          save(element)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
          f(self, obj) # Call unbound method with explicit self
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 600, in save_list
          self._batch_appends(iter(obj))
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 636, in _batch_appends
          save(tmp[0])
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
          f(self, obj) # Call unbound method with explicit self
        File "/usr/local/spark/python/pyspark/cloudpickle.py", line 199, in save_function
          self.save_function_tuple(obj)
        File "/usr/local/spark/python/pyspark/cloudpickle.py", line 236, in save_function_tuple
          save((code, closure, base_globals))
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
          f(self, obj) # Call unbound method with explicit self
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 548, in save_tuple
          save(element)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
          f(self, obj) # Call unbound method with explicit self
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 600, in save_list
          self._batch_appends(iter(obj))
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 636, in _batch_appends
          save(tmp[0])
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
          f(self, obj) # Call unbound method with explicit self
        File "/usr/local/spark/python/pyspark/cloudpickle.py", line 193, in save_function
          self.save_function_tuple(obj)
        File "/usr/local/spark/python/pyspark/cloudpickle.py", line 241, in save_function_tuple
          save(f_globals)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
          f(self, obj) # Call unbound method with explicit self
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 649, in save_dict
          self._batch_setitems(obj.iteritems())
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 681, in _batch_setitems
          save(v)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
          f(self, obj) # Call unbound method with explicit self
        File "/usr/local/spark/python/pyspark/cloudpickle.py", line 193, in save_function
          self.save_function_tuple(obj)
        File "/usr/local/spark/python/pyspark/cloudpickle.py", line 241, in save_function_tuple
          save(f_globals)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
          f(self, obj) # Call unbound method with explicit self
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 649, in save_dict
          self._batch_setitems(obj.iteritems())
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 686, in _batch_setitems
          save(v)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 331, in save
          self.save_reduce(obj=obj, *rv)
        File "/usr/local/spark/python/pyspark/cloudpickle.py", line 542, in save_reduce
          save(state)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
          f(self, obj) # Call unbound method with explicit self
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 649, in save_dict
          self._batch_setitems(obj.iteritems())
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 681, in _batch_setitems
          save(v)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 331, in save
          self.save_reduce(obj=obj, *rv)
        File "/usr/local/spark/python/pyspark/cloudpickle.py", line 524, in save_reduce
          save(args)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
          f(self, obj) # Call unbound method with explicit self
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 548, in save_tuple
          save(element)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
          f(self, obj) # Call unbound method with explicit self
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 600, in save_list
          self._batch_appends(iter(obj))
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 636, in _batch_appends
          save(tmp[0])
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 331, in save
          self.save_reduce(obj=obj, *rv)
        File "/usr/local/spark/python/pyspark/cloudpickle.py", line 542, in save_reduce
          save(state)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
          f(self, obj) # Call unbound method with explicit self
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 649, in save_dict
          self._batch_setitems(obj.iteritems())
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 681, in _batch_setitems
          save(v)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 331, in save
          self.save_reduce(obj=obj, *rv)
        File "/usr/local/spark/python/pyspark/cloudpickle.py", line 542, in save_reduce
          save(state)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
          f(self, obj) # Call unbound method with explicit self
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 649, in save_dict
          self._batch_setitems(obj.iteritems())
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 681, in _batch_setitems
          save(v)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 331, in save
          self.save_reduce(obj=obj, *rv)
        File "/usr/local/spark/python/pyspark/cloudpickle.py", line 542, in save_reduce
          save(state)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
          f(self, obj) # Call unbound method with explicit self
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 649, in save_dict
          self._batch_setitems(obj.iteritems())
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 686, in _batch_setitems
          save(v)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 331, in save
          self.save_reduce(obj=obj, *rv)
        File "/usr/local/spark/python/pyspark/cloudpickle.py", line 542, in save_reduce
          save(state)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
          f(self, obj) # Call unbound method with explicit self
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 649, in save_dict
          self._batch_setitems(obj.iteritems())
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 681, in _batch_setitems
          save(v)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 331, in save
          self.save_reduce(obj=obj, *rv)
        File "/usr/local/spark/python/pyspark/cloudpickle.py", line 542, in save_reduce
          save(state)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
          f(self, obj) # Call unbound method with explicit self
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 548, in save_tuple
          save(element)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
          f(self, obj) # Call unbound method with explicit self
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 649, in save_dict
          self._batch_setitems(obj.iteritems())
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 681, in _batch_setitems
          save(v)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
          f(self, obj) # Call unbound method with explicit self
        File "/usr/local/spark/python/pyspark/cloudpickle.py", line 315, in save_builtin_function
          return self.save_function(obj)
        File "/usr/local/spark/python/pyspark/cloudpickle.py", line 191, in save_function
          if islambda(obj) or obj.__code__.co_filename == '<stdin>' or themodule is None:
      AttributeError: 'builtin_function_or_method' object has no attribute '__code__'
      16/09/20 16:35:22 INFO SparkContext: Invoking stop() from shutdown hook
      

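1 Answer:

Answer 0 (score: 2)

Index vs. constraint

An index is a fast way to find the nodes whose indexed property has a specific value. It replaces a sequential scan of all nodes: instead of an O(n) algorithm, you typically get O(log(n)). Many nodes can have the same value for the indexed property.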
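To make the O(n) versus O(log(n)) difference concrete, here is a minimal Python sketch. It only illustrates the idea and is not a description of how Neo4j implements its indexes; the names `scan_lookup` and `SortedIndex` are made up for this example.

```python
import bisect


def scan_lookup(nodes, value):
    """Sequential scan: inspect every node, O(n)."""
    return [i for i, node in enumerate(nodes) if node["user"] == value]


class SortedIndex:
    """Toy non-unique index: sorted (value, node_id) pairs searched by bisection."""

    def __init__(self, nodes, key):
        self.entries = sorted((node[key], i) for i, node in enumerate(nodes))
        self.keys = [value for value, _ in self.entries]

    def lookup(self, value):
        # Two binary searches locate the run of equal keys in O(log n).
        lo = bisect.bisect_left(self.keys, value)
        hi = bisect.bisect_right(self.keys, value)
        return [node_id for _, node_id in self.entries[lo:hi]]


# Several nodes may share the same value; the index returns all of them.
nodes = [{"user": "alice"}, {"user": "bob"}, {"user": "alice"}, {"user": "carol"}]
index = SortedIndex(nodes, "user")
```

Both `scan_lookup(nodes, "alice")` and `index.lookup("alice")` return the same node ids, but the indexed lookup does not have to visit every node.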

A constraint is a way to enforce a schema on the data. Neo4j has two types of constraints on nodes:

  1. Property uniqueness:

    CREATE CONSTRAINT ON (n:Node) ASSERT n.uuid IS UNIQUE;
    
  2. Property existence:

    CREATE CONSTRAINT ON (n:Node) ASSERT exists(n.name);
    
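In practice, a uniqueness constraint uses an index to quickly check whether another node already has the same value.

So a label with a uniqueness constraint on a property also has an index on that property, but a label with an index on a property does not require the values to be unique.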
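The relationship between the two can be sketched in Python as follows. This is an assumption-laden toy model, not Neo4j's implementation: `UniqueIndex` and `merge_node` are hypothetical names, and a plain dict stands in for the index.

```python
class UniqueIndex:
    """Toy unique index: at most one node id per property value."""

    def __init__(self):
        self._by_value = {}

    def find(self, value):
        # Returns the node id for this value, or None if it is absent.
        return self._by_value.get(value)

    def insert(self, value, node_id):
        existing = self._by_value.get(value)
        if existing is not None and existing != node_id:
            # This is the "constraint" part: a second node with the
            # same value is rejected instead of being indexed.
            raise ValueError("value %r already used by node %d" % (value, existing))
        self._by_value[value] = node_id


def merge_node(index, store, value):
    """Find-or-create, similar in spirit to MERGE on a uniquely constrained property."""
    node_id = index.find(value)
    if node_id is None:
        node_id = len(store)
        store.append({"user": value})
        index.insert(value, node_id)
    return node_id
```

The same lookup that enforces uniqueness is what lets a find-or-create operation decide quickly whether the node already exists.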

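Which one should I use?

Since you are using MERGE to find or create the User and Hobby nodes, those properties are obviously meant to be unique. You should definitely use uniqueness constraints to enforce the schema, rather than plain indexes.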

    CREATE CONSTRAINT ON (n:User) ASSERT n.user IS UNIQUE;
    CREATE CONSTRAINT ON (n:Hobby) ASSERT n.hobby IS UNIQUE;
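As a rough sketch of how the constraints and the batched MERGE from the question could fit together in Python: the function names `create_constraints`, `chunked` and `write_batches` and the default batch size are assumptions for illustration, and `session` is assumed to be a Neo4j driver session exposing `run(query, parameters)` as in the question's code. The query keeps the `{batch}` parameter form used in the question; newer Neo4j versions use `$batch` instead.

```python
CONSTRAINTS = [
    "CREATE CONSTRAINT ON (n:User) ASSERT n.user IS UNIQUE",
    "CREATE CONSTRAINT ON (n:Hobby) ASSERT n.hobby IS UNIQUE",
]

MERGE_BATCH = (
    "UNWIND {batch} AS elt "
    "MERGE (u:User {user: elt.user}) "
    "MERGE (h:Hobby {hobby: elt.hobby}) "
    "MERGE (u)-[:Hobby]->(h)"
)


def create_constraints(session):
    """Intended to be run once, before any data is loaded."""
    for statement in CONSTRAINTS:
        session.run(statement)


def chunked(rows, size):
    """Yield consecutive slices of `rows` with at most `size` items each."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def write_batches(session, rows, batch_size=1000):
    """Send rows like {'user': ..., 'hobby': ...} in batches; return the batch count."""
    count = 0
    for batch in chunked(rows, batch_size):
        session.run(MERGE_BATCH, {"batch": batch})
        count += 1
    return count
```

Note that the labels in the query must match the labels in the constraints exactly (labels are case-sensitive), otherwise the MERGE lookups will not benefit from the constraint's index.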