我有一个简单的Luigi Elasticsearch indexing task,它使用Requests进行GET并将响应推送到本地ElasticSearch。另外,我做了第二次调用前几次的任务,如下所示:
import luigi
import requests
from luigi.contrib.esindex import CopyToIndex
class RequestTask(CopyToIndex):
TEST_URL = 'http://www.this-page-intentionally-left-blank.org'
index = 'example_index'
iteration = luigi.IntParameter()
def docs(self):
res = requests.get(self.TEST_URL).content.decode('utf-8')
return [{'response': res, 'iteration': self.iteration}]
class ManyRequests(luigi.Task):
def requires(self):
return [RequestTask(iteration) for iteration in range(0, 4)]
if __name__ == '__main__':
luigi.run()
如果我在单线程中运行ManyRequests任务,可以正常运行。但是,如果我指定了一些工作人员(例如 - 工作人员4),则进程将从Elasticsearch引发 TransportError(index_already_exists_exception),并且未能正确完成 。完成的进程数是随机的,所以我认为这是由于在Elasticsearch数据库中写入了一些冲突。我是否必须以不同的方式实现ManyRequests?
任何帮助都将非常感激:)
当我执行ManyRequests --workers 4:
时,这是我的控制台DEBUG: Checking if RequestTask(iteration=0) is complete
GET http://localhost:9200/update_log/entry/f55cf781cd5b4ff6be1454bc7fc624f874dea7ee [status:404 request:0.082s]
DEBUG: Marker document not found.
DEBUG: Checking if RequestTask(iteration=1) is complete
GET http://localhost:9200/update_log/entry/91af5a96a3e588ae318e996fd64add17465352b3 [status:404 request:0.020s]
DEBUG: Marker document not found.
DEBUG: Checking if RequestTask(iteration=2) is complete
GET http://localhost:9200/update_log/entry/41bb5cbca30df86d0815ec090b4d2fb20f2700d2 [status:404 request:0.051s]
DEBUG: Marker document not found.
DEBUG: Checking if RequestTask(iteration=3) is complete
GET http://localhost:9200/update_log/entry/d2dbeeca292ec62688a993c3b147272af2ba6a92 [status:404 request:0.061s]
DEBUG: Marker document not found.
INFO: Informed scheduler that task ManyRequests__99914b932b has status PENDING
INFO: Informed scheduler that task RequestTask_3_8a58dae6a3 has status PENDING
INFO: Informed scheduler that task RequestTask_2_eee8bd7963 has status PENDING
INFO: Informed scheduler that task RequestTask_1_020ce0ec4d has status PENDING
INFO: Informed scheduler that task RequestTask_0_630962ba24 has status PENDING
INFO: Done scheduling tasks
INFO: Running Worker with 4 processes
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 5
DEBUG: Asking scheduler for work...
INFO: [pid 3211] Worker Worker(salt=582258671, workers=4, host=jgc.local, username=jgc, pid=3210) running RequestTask(iteration=3)
DEBUG: Pending tasks: 4
DEBUG: Asking scheduler for work...
INFO: [pid 3212] Worker Worker(salt=582258671, workers=4, host=jgc.local, username=jgc, pid=3210) running RequestTask(iteration=2)
DEBUG: Pending tasks: 3
DEBUG: Asking scheduler for work...
INFO: [pid 3213] Worker Worker(salt=582258671, workers=4, host=jgc.local, username=jgc, pid=3210) running RequestTask(iteration=0)
DEBUG: Pending tasks: 2
DEBUG: 4 running tasks, waiting for next task to finish
INFO: [pid 3214] Worker Worker(salt=582258671, workers=4, host=jgc.local, username=jgc, pid=3210) running RequestTask(iteration=1)
DEBUG: 4 running tasks, waiting for next task to finish
PUT http://localhost:9200/example_index [status:400 request:0.514s]
PUT http://localhost:9200/example_index [status:400 request:0.517s]
PUT http://localhost:9200/example_index [status:400 request:0.520s]
ERROR: [pid 3213] Worker Worker(salt=582258671, workers=4, host=jgc.local, username=jgc, pid=3210) failed RequestTask(iteration=0)
Traceback (most recent call last):
File "/Users/jgc/dev/env/lib/python3.5/site-packages/luigi/worker.py", line 192, in run
new_deps = self._run_get_new_deps()
File "/Users/jgc/dev/env/lib/python3.5/site-packages/luigi/worker.py", line 130, in _run_get_new_deps
task_gen = self.task.run()
File "/Users/jgc/dev/env/lib/python3.5/site-packages/luigi/contrib/esindex.py", line 448, in run
self.create_index()
File "/Users/jgc/dev/env/lib/python3.5/site-packages/luigi/contrib/esindex.py", line 399, in create_index
es.indices.create(index=self.index, body=self.settings)
File "/Users/jgc/dev/env/lib/python3.5/site-packages/elasticsearch/client/utils.py", line 71, in _wrapped
return func(*args, params=params, **kwargs)
File "/Users/jgc/dev/env/lib/python3.5/site-packages/elasticsearch/client/indices.py", line 107, in create
params=params, body=body)
File "/Users/jgc/dev/env/lib/python3.5/site-packages/elasticsearch/transport.py", line 318, in perform_request
status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
File "/Users/jgc/dev/env/lib/python3.5/site-packages/elasticsearch/connection/http_urllib3.py", line 127, in perform_request
self._raise_error(response.status, raw_data)
File "/Users/jgc/dev/env/lib/python3.5/site-packages/elasticsearch/connection/base.py", line 122, in _raise_error
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: TransportError(400, 'index_already_exists_exception', 'index [example_index/PpySzpJ-QiSLNupQrmdVjg] already exists')ERROR: [pid 3212] Worker Worker(salt=582258671, workers=4, host=jgc.local, username=jgc, pid=3210) failed RequestTask(iteration=2)
Traceback (most recent call last):
File "/Users/jgc/dev/env/lib/python3.5/site-packages/luigi/worker.py", line 192, in run
new_deps = self._run_get_new_deps()
File "/Users/jgc/dev/env/lib/python3.5/site-packages/luigi/worker.py", line 130, in _run_get_new_deps
task_gen = self.task.run()
File "/Users/jgc/dev/env/lib/python3.5/site-packages/luigi/contrib/esindex.py", line 448, in run
self.create_index()
File "/Users/jgc/dev/env/lib/python3.5/site-packages/luigi/contrib/esindex.py", line 399, in create_index
es.indices.create(index=self.index, body=self.settings)
File "/Users/jgc/dev/env/lib/python3.5/site-packages/elasticsearch/client/utils.py", line 71, in _wrapped
return func(*args, params=params, **kwargs)
File "/Users/jgc/dev/env/lib/python3.5/site-packages/elasticsearch/client/indices.py", line 107, in create
params=params, body=body)
File "/Users/jgc/dev/env/lib/python3.5/site-packages/elasticsearch/transport.py", line 318, in perform_request
status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
File "/Users/jgc/dev/env/lib/python3.5/site-packages/elasticsearch/connection/http_urllib3.py", line 127, in perform_request
self._raise_error(response.status, raw_data)
File "/Users/jgc/dev/env/lib/python3.5/site-packages/elasticsearch/connection/base.py", line 122, in _raise_error
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: TransportError(400, 'index_already_exists_exception', 'index [example_index/PpySzpJ-QiSLNupQrmdVjg] already exists')
ERROR: [pid 3214] Worker Worker(salt=582258671, workers=4, host=jgc.local, username=jgc, pid=3210) failed RequestTask(iteration=1)
Traceback (most recent call last):
File "/Users/jgc/dev/env/lib/python3.5/site-packages/luigi/worker.py", line 192, in run
new_deps = self._run_get_new_deps()
File "/Users/jgc/dev/upm/tfg/TFG-JorgeGarciaCastano/env/lib/python3.5/site-packages/luigi/worker.py", line 130, in _run_get_new_deps
task_gen = self.task.run()
File "/Users/jgc/dev/env/lib/python3.5/site-packages/luigi/contrib/esindex.py", line 448, in run
self.create_index()
File "/Users/jgc/dev/env/lib/python3.5/site-packages/luigi/contrib/esindex.py", line 399, in create_index
es.indices.create(index=self.index, body=self.settings)
File "/Users/jgc/dev/env/lib/python3.5/site-packages/elasticsearch/client/utils.py", line 71, in _wrapped
return func(*args, params=params, **kwargs)
File "/Users/jgc/dev/env/lib/python3.5/site-packages/elasticsearch/client/indices.py", line 107, in create
params=params, body=body)
File "/Users/jgc/dev/env/lib/python3.5/site-packages/elasticsearch/transport.py", line 318, in perform_request
status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
File "/Users/jgc/dev/env/lib/python3.5/site-packages/elasticsearch/connection/http_urllib3.py", line 127, in perform_request
self._raise_error(response.status, raw_data)
File "/Users/jgc/dev/env/lib/python3.5/site-packages/elasticsearch/connection/base.py", line 122, in _raise_error
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: TransportError(400, 'index_already_exists_exception', 'index [example_index/PpySzpJ-QiSLNupQrmdVjg] already exists')
INFO: Informed scheduler that task RequestTask_2_eee8bd7963 has status FAILED
DEBUG: Asking scheduler for work...
DEBUG: Done
DEBUG: There are no more tasks to run at this time
DEBUG: RequestTask_3_8a58dae6a3 is currently run by worker Worker(salt=582258671, workers=4, host=jgc.local, username=jgc, pid=3210)
DEBUG: RequestTask_1_020ce0ec4d is currently run by worker Worker(salt=582258671, workers=4, host=jgc.local, username=jgc, pid=3210)
DEBUG: RequestTask_0_630962ba24 is currently run by worker Worker(salt=582258671, workers=4, host=jgc.local, username=jgc, pid=3210)
INFO: Informed scheduler that task RequestTask_1_020ce0ec4d has status FAILED
DEBUG: Asking scheduler for work...
DEBUG: Done
DEBUG: There are no more tasks to run at this time
DEBUG: RequestTask_3_8a58dae6a3 is currently run by worker Worker(salt=582258671, workers=4, host=jgc.local, username=jgc, pid=3210)
DEBUG: RequestTask_0_630962ba24 is currently run by worker Worker(salt=582258671, workers=4, host=jgc.local, username=jgc, pid=3210)
INFO: Informed scheduler that task RequestTask_0_630962ba24 has status FAILED
DEBUG: Asking scheduler for work...
DEBUG: Done
DEBUG: There are no more tasks to run at this time
DEBUG: RequestTask_3_8a58dae6a3 is currently run by worker Worker(salt=582258671, workers=4, host=jgc.local, username=jgc, pid=3210)
DEBUG: Asking scheduler for work...
DEBUG: Done
DEBUG: There are no more tasks to run at this time
DEBUG: RequestTask_3_8a58dae6a3 is currently run by worker Worker(salt=582258671, workers=4, host=jgc.local, username=jgc, pid=3210)
DEBUG: Asking scheduler for work...
DEBUG: Done
DEBUG: There are no more tasks to run at this time
DEBUG: RequestTask_3_8a58dae6a3 is currently run by worker Worker(salt=582258671, workers=4, host=jgc.local, username=jgc, pid=3210)
INFO: [pid 3211] Worker Worker(salt=582258671, workers=4, host=jgc.local, username=jgc, pid=3210) done RequestTask(iteration=3)
INFO: Informed scheduler that task RequestTask_3_8a58dae6a3 has status DONE
DEBUG: Asking scheduler for work...
DEBUG: Done
DEBUG: There are no more tasks to run at this time
DEBUG: There are 4 pending tasks possibly being run by other workers
DEBUG: There are 4 pending tasks unique to this worker
DEBUG: There are 4 pending tasks last scheduled by this worker
INFO: Worker Worker(salt=582258671, workers=4, host=jgc.local, username=jgc, pid=3210) was stopped. Shutting down Keep-Alive thread
INFO:
===== Luigi Execution Summary =====
Scheduled 5 tasks of which:
* 1 ran successfully:
- 1 RequestTask(iteration=3)
* 3 failed:
- 3 RequestTask(iteration=0,1,2)
* 1 were left pending, among these:
* 1 had failed dependencies:
- 1 ManyRequests()
This progress looks :( because there were failed tasks
===== Luigi Execution Summary =====