我正在使用gremlin服务器,我有一个大数据集,并且正在执行gremlin分页。以下是查询示例:
query = """g.V().both().both().count()"""
data = execute_query(query)
for x in range(0,int(data[0]/10000)+1):
print(x*10000, " - ",(x+1)*10000)
query = """g.V().both().both().range({0}*10000, {1}*10000)""".format(x,x+1)
data = execute_query(query)
def execute_query(query):
"""query execution"""
以上查询工作正常,对于分页,我必须知道停止执行查询的范围。为了获得范围,我必须首先获取查询的计数并传递给for循环。还有其他可以使用gremlin的分页吗?
-需要分页,因为在单个ex中获取100k数据时分页失败。 g.V().both().both().count()
如果我们不使用分页,则会出现以下错误:
ERROR:tornado.application:Uncaught exception, closing connection.
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tornado/iostream.py", line 554, in wrapper
return callback(*args)
File "/usr/local/lib/python3.5/dist-packages/tornado/stack_context.py", line 343, in wrapped
raise_exc_info(exc)
File "<string>", line 3, in raise_exc_info
File "/usr/local/lib/python3.5/dist-packages/tornado/stack_context.py", line 314, in wrapped
ret = fn(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tornado/websocket.py", line 807, in _on_frame_data
self._receive_frame()
File "/usr/local/lib/python3.5/dist-packages/tornado/websocket.py", line 697, in _receive_frame
self.stream.read_bytes(2, self._on_frame_start)
File "/usr/local/lib/python3.5/dist-packages/tornado/iostream.py", line 312, in read_bytes
assert isinstance(num_bytes, numbers.Integral)
File "/usr/lib/python3.5/abc.py", line 182, in __instancecheck__
if subclass in cls._abc_cache:
File "/usr/lib/python3.5/_weakrefset.py", line 75, in __contains__
return wr in self.data
RecursionError: maximum recursion depth exceeded in comparison
ERROR:tornado.application:Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f3e1c409ae8>)
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tornado/ioloop.py", line 604, in _run_callback
ret = callback()
File "/usr/local/lib/python3.5/dist-packages/tornado/stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tornado/iostream.py", line 554, in wrapper
return callback(*args)
File "/usr/local/lib/python3.5/dist-packages/tornado/stack_context.py", line 343, in wrapped
raise_exc_info(exc)
File "<string>", line 3, in raise_exc_info
File "/usr/local/lib/python3.5/dist-packages/tornado/stack_context.py", line 314, in wrapped
ret = fn(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tornado/websocket.py", line 807, in _on_frame_data
self._receive_frame()
File "/usr/local/lib/python3.5/dist-packages/tornado/websocket.py", line 697, in _receive_frame
self.stream.read_bytes(2, self._on_frame_start)
File "/usr/local/lib/python3.5/dist-packages/tornado/iostream.py", line 312, in read_bytes
assert isinstance(num_bytes, numbers.Integral)
File "/usr/lib/python3.5/abc.py", line 182, in __instancecheck__
if subclass in cls._abc_cache:
File "/usr/lib/python3.5/_weakrefset.py", line 75, in __contains__
return wr in self.data
RecursionError: maximum recursion depth exceeded in comparison
Traceback (most recent call last):
File "/home/rgupta/Documents/BitBucket/ecodrone/ecodrone/test2.py", line 59, in <module>
data = execute_query(query)
File "/home/rgupta/Documents/BitBucket/ecodrone/ecodrone/test2.py", line 53, in execute_query
results = future_results.result()
File "/usr/lib/python3.5/concurrent/futures/_base.py", line 405, in result
return self.__get_result()
File "/usr/lib/python3.5/concurrent/futures/_base.py", line 357, in __get_result
raise self._exception
File "/usr/local/lib/python3.5/dist-packages/gremlin_python/driver/resultset.py", line 81, in cb
f.result()
File "/usr/lib/python3.5/concurrent/futures/_base.py", line 398, in result
return self.__get_result()
File "/usr/lib/python3.5/concurrent/futures/_base.py", line 357, in __get_result
raise self._exception
File "/usr/lib/python3.5/concurrent/futures/thread.py", line 55, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.5/dist-packages/gremlin_python/driver/connection.py", line 77, in _receive
self._protocol.data_received(data, self._results)
File "/usr/local/lib/python3.5/dist-packages/gremlin_python/driver/protocol.py", line 100, in data_received
self.data_received(data, results_dict)
File "/usr/local/lib/python3.5/dist-packages/gremlin_python/driver/protocol.py", line 100, in data_received
self.data_received(data, results_dict)
File "/usr/local/lib/python3.5/dist-packages/gremlin_python/driver/protocol.py", line 100, in data_received
self.data_received(data, results_dict)
File "/usr/local/lib/python3.5/dist-packages/gremlin_python/driver/protocol.py", line 100, in data_received
此行重复100次File "/usr/local/lib/python3.5/dist-packages/gremlin_python/driver/protocol.py", line 100, in data_received
答案 0 :(得分:1)
此问题在很大程度上得到了here的回答,但我将添加更多评论。
您的分页方法确实非常昂贵,因为我不知道会优化该特定遍历的任何图形,并且您基本上要遍历所有这些数据。您对ptq = pq1 + pq2;
printf ("Your grades from prelims is %.2f", ptq);
执行一次,然后对第一个10000进行迭代,然后对第二个10000,对第一个10000进行迭代,然后对第二个10000进行迭代,然后对第三个10000进行迭代,对第一个20000进行迭代,然后进行迭代第三个10000,依此类推...
我不确定您的逻辑是否还有其他内容,但是您所拥有的看起来像是“分批”的一种形式,可以获取更小的结果。不需要那样做,因为Gremlin Server已经在内部为您完成此操作。您只是发送count()
,如果使用g.V().both().both()
配置选项,Gremlin Server将对结果进行批处理。
无论如何,除了我提到的另一个问题所解释的内容之外,没有一种更好的方法可以使分页工作成为我所知道的。