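I am using CKAN 2.2.1 with the DataStore and DataPusher plugins. Almost all data is updated by an import script that periodically checks an FTP directory and matches the files against resources in CKAN.
Each time a new file is found, we update the corresponding resource, matched by resource ID. After the update_resource API request is sent, the DataPusher plugin automatically adds the contents to the DataStore.
I am logging the cURL requests used to send these update_resource requests, to confirm that we are sending the requests as expected and that everything is working.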
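When we update a resource in a dataset that has multiple resources, what seems to happen is that DataPusher adds a job for the resource we just updated, but 2-3 seconds later adds another job for a random resource in the same dataset. This is usually another resource we are about to update, or even the one we just updated.
This leads to overlapping jobs updating the same resource in the DataStore, which causes errors because the same table is being updated simultaneously.
I have confirmed through my cURL logs that we are not sending multiple update requests by mistake, yet if I make three update_resource requests, DataPusher will update 5 or 6 resources in quick succession. It is always the resource we expect, followed by another random resource.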
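I can see in the DataPusher access log that it is triggered more than three times.
In this particular example I am updating three resources: one daily, one hourly, and one weekly. There are three update_resource requests.
Here is the resulting access log: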
127.0.0.1 - - [17/Sep/2015:17:20:04 +0100] "POST /job HTTP/1.1" 200 505 "-" "python-requests/1.1.0 CPython/2.7.3 Linux/3.2.0-4-amd64" **0/65095** rx:514 tx:665
127.0.0.1 - - [17/Sep/2015:17:20:09 +0100] "POST /job HTTP/1.1" 200 505 "-" "python-requests/1.1.0 CPython/2.7.3 Linux/3.2.0-4-amd64" **0/182644** rx:514 tx:665
127.0.0.1 - - [17/Sep/2015:17:20:12 +0100] "POST /job HTTP/1.1" 200 505 "-" "python-requests/1.1.0 CPython/2.7.3 Linux/3.2.0-4-amd64" **0/138258** rx:514 tx:665
127.0.0.1 - - [17/Sep/2015:17:20:16 +0100] "POST /job HTTP/1.1" 200 505 "-" "python-requests/1.1.0 CPython/2.7.3 Linux/3.2.0-4-amd64" **0/137663** rx:514 tx:665
127.0.0.1 - - [17/Sep/2015:17:20:19 +0100] "POST /job HTTP/1.1" 200 505 "-" "python-requests/1.1.0 CPython/2.7.3 Linux/3.2.0-4-amd64" **0/118033** rx:514 tx:665
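To make the mismatch easier to see, here is a minimal sketch that counts the `POST /job` entries in an access log and prints the gap between them. The function names and the shortened sample lines are my own, for illustration only; the timestamps are the ones from the log above.

```python
import re
from datetime import datetime

# Matches the timestamp and the request line of an Apache-style access log entry.
LINE_RE = re.compile(r'\[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)')


def job_posts(lines):
    """Return the timestamp of every POST /job entry, in log order."""
    stamps = []
    for line in lines:
        m = LINE_RE.search(line)
        if m and m.group("method") == "POST" and m.group("path") == "/job":
            # Drop the UTC offset; all entries here share the same one.
            ts = m.group("ts").split(" ")[0]
            stamps.append(datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S"))
    return stamps


def gaps_in_seconds(stamps):
    """Seconds elapsed between each pair of consecutive jobs."""
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


# Shortened copies of the five access log lines (hypothetical trimming).
SAMPLE = [
    '127.0.0.1 - - [17/Sep/2015:17:20:04 +0100] "POST /job HTTP/1.1" 200 505',
    '127.0.0.1 - - [17/Sep/2015:17:20:09 +0100] "POST /job HTTP/1.1" 200 505',
    '127.0.0.1 - - [17/Sep/2015:17:20:12 +0100] "POST /job HTTP/1.1" 200 505',
    '127.0.0.1 - - [17/Sep/2015:17:20:16 +0100] "POST /job HTTP/1.1" 200 505',
    '127.0.0.1 - - [17/Sep/2015:17:20:19 +0100] "POST /job HTTP/1.1" 200 505',
]

if __name__ == "__main__":
    stamps = job_posts(SAMPLE)
    print(len(stamps), "jobs submitted")          # 5 jobs submitted
    print("gaps (s):", gaps_in_seconds(stamps))   # gaps (s): [5, 3, 4, 3]
```

Five jobs were submitted for three update requests, each a few seconds after the previous one.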
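Any suggestions as to why these extra DataPusher jobs are being triggered?
Here is the resulting error log, showing the random updates that follow immediately afterwards and the errors caused by overlapping updates to the same file: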
[Thu Sep 17 17:20:04 2015] [error] Fetching from: http://hidden-domain.com/dataset/6449d8f5-76e7-4aff-bfb1-46d46542a56c/resource/b9cde552-ff7a-41c3-b6e6-1f16a80a09eb/download/footfalldaily.csv
[Thu Sep 17 17:20:05 2015] [error] Deleting "b9cde552-ff7a-41c3-b6e6-1f16a80a09eb" from datastore.
[Thu Sep 17 17:20:05 2015] [error] Determined headers and types: [{'type': u'timestamp', 'id': u'Date'}, {'type': u'text', 'id': u'SiteName'}, {'type': u'text', 'id': u'LocationName'}, {'type': u'text', 'id': u'LocationGroup'}, {'type': u'text', 'id': u'WeekDay'}, {'type': u'numeric', 'id': u'BRCYear'}, {'type': u'text', 'id': u'BRCQuarter'}, {'type': u'text', 'id': u'BRCMonth'}, {'type': u'numeric', 'id': u'BRCWeek'}, {'type': u'numeric', 'id': u'InCount'}, {'type': u'numeric', 'id': u'OutCount'}, {'type': u'numeric', 'id': u'TotalCount'}, {'type': u'text', 'id': u'BusinessInCount'}, {'type': u'text', 'id': u'BusinessOutCount'}, {'type': u'text', 'id': u'BusinessTotalCount'}, {'type': u'text', 'id': u'FactoredInCount'}, {'type': u'text', 'id': u'FactoredOutCount'}, {'type': u'text', 'id': u'FactoredTotalCount'}]
[Thu Sep 17 17:20:06 2015] [error] Saving chunk 0
[Thu Sep 17 17:20:06 2015] [error] Saving chunk 1
[Thu Sep 17 17:20:07 2015] [error] Saving chunk 2
[Thu Sep 17 17:20:07 2015] [error] Saving chunk 3
[Thu Sep 17 17:20:08 2015] [error] Saving chunk 4
[Thu Sep 17 17:20:08 2015] [error] Saving chunk 5
[Thu Sep 17 17:20:09 2015] [error] Saving chunk 6
[Thu Sep 17 17:20:09 2015] [error] Saving chunk 7
[Thu Sep 17 17:20:09 2015] [error] Fetching from: http://hidden-domain.com/dataset/6449d8f5-76e7-4aff-bfb1-46d46542a56c/resource/b9cde552-ff7a-41c3-b6e6-1f16a80a09eb/download/footfalldaily.csv
[Thu Sep 17 17:20:10 2015] [error] Saving chunk 8
[Thu Sep 17 17:20:10 2015] [error] Deleting "b9cde552-ff7a-41c3-b6e6-1f16a80a09eb" from datastore.
[Thu Sep 17 17:20:11 2015] [error] Determined headers and types: [{'type': u'timestamp', 'id': u'Date'}, {'type': u'text', 'id': u'SiteName'}, {'type': u'text', 'id': u'LocationName'}, {'type': u'text', 'id': u'LocationGroup'}, {'type': u'text', 'id': u'WeekDay'}, {'type': u'numeric', 'id': u'BRCYear'}, {'type': u'text', 'id': u'BRCQuarter'}, {'type': u'text', 'id': u'BRCMonth'}, {'type': u'numeric', 'id': u'BRCWeek'}, {'type': u'numeric', 'id': u'InCount'}, {'type': u'numeric', 'id': u'OutCount'}, {'type': u'numeric', 'id': u'TotalCount'}, {'type': u'text', 'id': u'BusinessInCount'}, {'type': u'text', 'id': u'BusinessOutCount'}, {'type': u'text', 'id': u'BusinessTotalCount'}, {'type': u'text', 'id': u'FactoredInCount'}, {'type': u'text', 'id': u'FactoredOutCount'}, {'type': u'text', 'id': u'FactoredTotalCount'}]
[Thu Sep 17 17:20:11 2015] [error] Saving chunk 0
[Thu Sep 17 17:20:11 2015] [error] Job "push_to_datastore (trigger: RunTriggerNow, run = True, next run at: None)" raised an exception
[Thu Sep 17 17:20:11 2015] [error] Traceback (most recent call last):
[Thu Sep 17 17:20:11 2015] [error] File "/usr/lib/ckan/datapusher/lib/python2.7/site-packages/apscheduler/scheduler.py", line 512, in _run_job
[Thu Sep 17 17:20:11 2015] [error] retval = job.func(*job.args, **job.kwargs)
[Thu Sep 17 17:20:11 2015] [error] File "/opt/ckan/ckan2.2.1/lib/datapusher/src/datapusher/datapusher/jobs.py", line 321, in push_to_datastore
[Thu Sep 17 17:20:11 2015] [error] records, api_key, ckan_url)
[Thu Sep 17 17:20:11 2015] [error] File "/opt/ckan/ckan2.2.1/lib/datapusher/src/datapusher/datapusher/jobs.py", line 150, in send_resource_to_datastore
[Thu Sep 17 17:20:11 2015] [error] check_response(r, url, 'CKAN DataStore')
[Thu Sep 17 17:20:11 2015] [error] File "/opt/ckan/ckan2.2.1/lib/datapusher/src/datapusher/datapusher/jobs.py", line 91, in check_response
[Thu Sep 17 17:20:11 2015] [error] resp=response.text[:200]))
[Thu Sep 17 17:20:11 2015] [error] JobError: CKAN DataStore bad response. Status code: 500 Internal Server Error. At: http://hidden-domain.com/api/3/action/datastore_create. Response:
[Thu Sep 17 17:20:11 2015] [error] <html>
[Thu Sep 17 17:20:11 2015] [error] <head>
[Thu Sep 17 17:20:11 2015] [error] <title>Server Error</title>
[Thu Sep 17 17:20:11 2015] [error]
[Thu Sep 17 17:20:11 2015] [error] </head>
[Thu Sep 17 17:20:11 2015] [error] <body>
[Thu Sep 17 17:20:11 2015] [error] <h1>Server Error</h1>
[Thu Sep 17 17:20:11 2015] [error] An internal server error occurred
[Thu Sep 17 17:20:11 2015] [error]
[Thu Sep 17 17:20:11 2015] [error] </body>
[Thu Sep 17 17:20:11 2015] [error] </html>
[Thu Sep 17 17:20:11 2015] [error] Saving chunk 1
[Thu Sep 17 17:20:11 2015] [error] Saving chunk 2
[Thu Sep 17 17:20:12 2015] [error] Fetching from: http://hidden-domain.com/dataset/6449d8f5-76e7-4aff-bfb1-46d46542a56c/resource/8b71c43f-7e55-4926-ab06-39037026ce1a/download/footfallhourly.csv
[Thu Sep 17 17:20:12 2015] [error] Saving chunk 3
[Thu Sep 17 17:20:12 2015] [error] Saving chunk 4
[Thu Sep 17 17:20:14 2015] [error] Saving chunk 5
[Thu Sep 17 17:20:14 2015] [error] Deleting "8b71c43f-7e55-4926-ab06-39037026ce1a" from datastore.
[Thu Sep 17 17:20:14 2015] [error] Determined headers and types: [{'type': u'numeric', 'id': u'Id'}, {'type': u'timestamp', 'id': u'Date'}, {'type': u'text', 'id': u'SiteName'}, {'type': u'text', 'id': u'LocationName'}, {'type': u'text', 'id': u'LocationGroup'}, {'type': u'text', 'id': u'WeekDay'}, {'type': u'numeric', 'id': u'BRCYear'}, {'type': u'text', 'id': u'BRCQuarter'}, {'type': u'text', 'id': u'BRCMonth'}, {'type': u'numeric', 'id': u'BRCWeek'}, {'type': u'numeric', 'id': u'InCount'}, {'type': u'text', 'id': u'OutCount'}, {'type': u'numeric', 'id': u'TotalCount'}, {'type': u'text', 'id': u'BusinessInCount'}, {'type': u'text', 'id': u'BusinessOutCount'}, {'type': u'text', 'id': u'BusinessTotalCount'}, {'type': u'text', 'id': u'FactoredInCount'}, {'type': u'text', 'id': u'FactoredOutCount'}, {'type': u'text', 'id': u'FactoredTotalCount'}]
[Thu Sep 17 17:20:14 2015] [error] Saving chunk 0
[Thu Sep 17 17:20:14 2015] [error] Saving chunk 6
[Thu Sep 17 17:20:14 2015] [error] Saving chunk 1
[Thu Sep 17 17:20:15 2015] [error] Saving chunk 7
[Thu Sep 17 17:20:15 2015] [error] Saving chunk 2
[Thu Sep 17 17:20:16 2015] [error] Saving chunk 8
[Thu Sep 17 17:20:16 2015] [error] Saving chunk 3
[Thu Sep 17 17:20:16 2015] [error] Saving chunk 9
[Thu Sep 17 17:20:16 2015] [error] Fetching from: http://hidden-domain.com/dataset/6449d8f5-76e7-4aff-bfb1-46d46542a56c/resource/8ae0a0b0-eee6-4062-8ef8-13f3177519ff/download/footfallweekly.csv
[Thu Sep 17 17:20:17 2015] [error] Saving chunk 10
[Thu Sep 17 17:20:17 2015] [error] Saving chunk 4
[Thu Sep 17 17:20:17 2015] [error] Deleting "8ae0a0b0-eee6-4062-8ef8-13f3177519ff" from datastore.
[Thu Sep 17 17:20:17 2015] [error] Determined headers and types: [{'type': u'text', 'id': u'SiteName'}, {'type': u'text', 'id': u'LocationName'}, {'type': u'text', 'id': u'LocationGroup'}, {'type': u'numeric', 'id': u'BRCYear'}, {'type': u'text', 'id': u'BRCQuarter'}, {'type': u'text', 'id': u'BRCMonth'}, {'type': u'numeric', 'id': u'BRCWeek'}, {'type': u'numeric', 'id': u'InCount'}, {'type': u'numeric', 'id': u'OutCount'}, {'type': u'numeric', 'id': u'TotalCount'}, {'type': u'text', 'id': u'BusinessInCount'}, {'type': u'text', 'id': u'BusinessOutCount'}, {'type': u'text', 'id': u'BusinessTotalCount'}, {'type': u'text', 'id': u'FactoredInCount'}, {'type': u'text', 'id': u'FactoredOutCount'}, {'type': u'text', 'id': u'FactoredTotalCount'}]
[Thu Sep 17 17:20:17 2015] [error] Saving chunk 0
[Thu Sep 17 17:20:18 2015] [error] Saving chunk 5
[Thu Sep 17 17:20:18 2015] [error] Saving chunk 1
[Thu Sep 17 17:20:18 2015] [error] Saving chunk 11
[Thu Sep 17 17:20:18 2015] [error] Saving chunk 2
[Thu Sep 17 17:20:19 2015] [error] Saving chunk 6
[Thu Sep 17 17:20:19 2015] [error] Saving chunk 12
[Thu Sep 17 17:20:19 2015] [error] Saving chunk 3
[Thu Sep 17 17:20:19 2015] [error] Successfully pushed 906 entries to "8ae0a0b0-eee6-4062-8ef8-13f3177519ff".
[Thu Sep 17 17:20:19 2015] [error] Saving chunk 7
[Thu Sep 17 17:20:19 2015] [error] Saving chunk 13
[Thu Sep 17 17:20:19 2015] [error] Fetching from: http://hidden-domain.com/dataset/6449d8f5-76e7-4aff-bfb1-46d46542a56c/resource/8b71c43f-7e55-4926-ab06-39037026ce1a/download/footfallhourly.csv
[Thu Sep 17 17:20:21 2015] [error] Saving chunk 14
[Thu Sep 17 17:20:21 2015] [error] Saving chunk 8
[Thu Sep 17 17:20:21 2015] [error] Deleting "8b71c43f-7e55-4926-ab06-39037026ce1a" from datastore.
[Thu Sep 17 17:20:21 2015] [error] Saving chunk 9
[Thu Sep 17 17:20:21 2015] [error] Determined headers and types: [{'type': u'numeric', 'id': u'Id'}, {'type': u'timestamp', 'id': u'Date'}, {'type': u'text', 'id': u'SiteName'}, {'type': u'text', 'id': u'LocationName'}, {'type': u'text', 'id': u'LocationGroup'}, {'type': u'text', 'id': u'WeekDay'}, {'type': u'numeric', 'id': u'BRCYear'}, {'type': u'text', 'id': u'BRCQuarter'}, {'type': u'text', 'id': u'BRCMonth'}, {'type': u'numeric', 'id': u'BRCWeek'}, {'type': u'numeric', 'id': u'InCount'}, {'type': u'text', 'id': u'OutCount'}, {'type': u'numeric', 'id': u'TotalCount'}, {'type': u'text', 'id': u'BusinessInCount'}, {'type': u'text', 'id': u'BusinessOutCount'}, {'type': u'text', 'id': u'BusinessTotalCount'}, {'type': u'text', 'id': u'FactoredInCount'}, {'type': u'text', 'id': u'FactoredOutCount'}, {'type': u'text', 'id': u'FactoredTotalCount'}]
[Thu Sep 17 17:20:21 2015] [error] Saving chunk 15
[Thu Sep 17 17:20:21 2015] [error] Saving chunk 0
[Thu Sep 17 17:20:22 2015] [error] Saving chunk 16
[Thu Sep 17 17:20:22 2015] [error] Saving chunk 10
[Thu Sep 17 17:20:22 2015] [error] Job "push_to_datastore (trigger: RunTriggerNow, run = True, next run at: None)" raised an exception
[Thu Sep 17 17:20:22 2015] [error] Traceback (most recent call last):
[Thu Sep 17 17:20:22 2015] [error] File "/usr/lib/ckan/datapusher/lib/python2.7/site-packages/apscheduler/scheduler.py", line 512, in _run_job
[Thu Sep 17 17:20:22 2015] [error] retval = job.func(*job.args, **job.kwargs)
[Thu Sep 17 17:20:22 2015] [error] File "/opt/ckan/ckan2.2.1/lib/datapusher/src/datapusher/datapusher/jobs.py", line 321, in push_to_datastore
[Thu Sep 17 17:20:22 2015] [error] records, api_key, ckan_url)
[Thu Sep 17 17:20:22 2015] [error] File "/opt/ckan/ckan2.2.1/lib/datapusher/src/datapusher/datapusher/jobs.py", line 150, in send_resource_to_datastore
[Thu Sep 17 17:20:22 2015] [error] check_response(r, url, 'CKAN DataStore')
[Thu Sep 17 17:20:22 2015] [error] File "/opt/ckan/ckan2.2.1/lib/datapusher/src/datapusher/datapusher/jobs.py", line 84, in check_response
[Thu Sep 17 17:20:22 2015] [error] resp=pprint.pformat(json_response)))
[Thu Sep 17 17:20:22 2015] [error] JobError: CKAN DataStore bad response. Status code: 409 Conflict. At: http://hidden-domain.com/api/3/action/datastore_create. Response: {u'error': {u'__type': u'Validation Error',
[Thu Sep 17 17:20:22 2015] [error] u'constraints': [u'Cannot insert records or create index because of uniqueness constraint'],
[Thu Sep 17 17:20:22 2015] [error] u'info': {u'orig': u'duplicate key value violates unique constraint "pg_type_typname_nsp_index"\\nDETAIL: Key (typname, typnamespace)=(8b71c43f-7e55-4926-ab06-39037026ce1a__id_seq, 2200) already exists.\\n',
[Thu Sep 17 17:20:22 2015] [error] u'pgcode': u'23505'}},
[Thu Sep 17 17:20:22 2015] [error] u'help': u'Adds a new table to the DataStore.\\n\\n The datastore_create action allows you to post JSON data to be\\n stored against a resource. This endpoint also supports altering tables,\\n aliases and indexes and bulk insertion. This endpoint can be called multiple\\n times to initially insert more data, add fields, change the aliases or indexes\\n as well as the primary keys.\\n\\n To create an empty datastore resource and a CKAN resource at the same time,\\n provide ``resource`` with a valid ``package_id`` and omit the ``resource_id``.\\n\\n If you want to create a datastore resource from the content of a file,\\n provide ``resource`` with a valid ``url``.\\n\\n See :ref:`fields` and :ref:`records` for details on how to lay out records.\\n\\n :param resource_id: resource id that the data is going to be stored against.\\n :type resource_id: string\\n :param force: set to True to edit a read-only resource\\n :type force: bool (optional, default: False)\\n :param resource: resource dictionary that is passed to\\n :meth:`~ckan.logic.action.create.resource_create`.\\n Use instead of ``resource_id`` (optional)\\n :type resource: dictionary\\n :param aliases: names for read only aliases of the resource. (optional)\\n :type aliases: list or comma separated string\\n :param fields: fields/columns and their extra metadata. (optional)\\n :type fields: list of dictionaries\\n :param records: the data, eg: [{"dob": "2005", "some_stuff": ["a", "b"]}] (optional)\\n :type records: list of dictionaries\\n :param primary_key: fields that represent a unique key (optional)\\n :type primary_key: list or comma separated string\\n :param indexes: indexes on table (optional)\\n :type indexes: list or comma separated string\\n\\n Please note that setting the ``aliases``, ``indexes`` or ``primary_key`` replaces the exising\\n aliases or constraints. 
Setting ``records`` appends the provided records to the resource.\\n\\n **Results:**\\n\\n :returns: The newly created data object.\\n :rtype: dictionary\\n\\n See :ref:`fields` and :ref:`records` for details on how to lay out records.\\n\\n ',
[Thu Sep 17 17:20:22 2015] [error] u'success': False}
[Thu Sep 17 17:20:22 2015] [error] Saving chunk 17
[Thu Sep 17 17:20:22 2015] [error] Saving chunk 11
[Thu Sep 17 17:20:23 2015] [error] Saving chunk 18
[Thu Sep 17 17:20:23 2015] [error] Saving chunk 12
[Thu Sep 17 17:20:23 2015] [error] Saving chunk 19
[Thu Sep 17 17:20:23 2015] [error] Saving chunk 13
[Thu Sep 17 17:20:24 2015] [error] Saving chunk 14
[Thu Sep 17 17:20:24 2015] [error] Saving chunk 20
[Thu Sep 17 17:20:24 2015] [error] Saving chunk 15
[Thu Sep 17 17:20:24 2015] [error] Saving chunk 21
[Thu Sep 17 17:20:25 2015] [error] Saving chunk 16
[Thu Sep 17 17:20:25 2015] [error] Saving chunk 22
[Thu Sep 17 17:20:26 2015] [error] Saving chunk 17
[Thu Sep 17 17:20:26 2015] [error] Saving chunk 23
[Thu Sep 17 17:20:26 2015] [error] Saving chunk 18
[Thu Sep 17 17:20:26 2015] [error] Saving chunk 24
[Thu Sep 17 17:20:26 2015] [error] Saving chunk 19
[Thu Sep 17 17:20:26 2015] [error] Saving chunk 25
[Thu Sep 17 17:20:27 2015] [error] Successfully pushed 6329 entries to "b9cde552-ff7a-41c3-b6e6-1f16a80a09eb".
[Thu Sep 17 17:20:27 2015] [error] Saving chunk 20
.... continues
[Thu Sep 17 17:24:29 2015] [error] Saving chunk 607
[Thu Sep 17 17:24:30 2015] [error] Successfully pushed 151896 entries to "8b71c43f-7e55-4926-ab06-39037026ce1a".
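To show which resources were picked up more than once, a rough sketch like the one below could tally the `Fetching from:` and `Successfully pushed` lines per resource ID. The regular expressions and the `summarize` helper are assumptions of mine, written against the log format shown above and not taken from DataPusher itself. The sample lines are trimmed copies of the relevant entries.

```python
import re
from collections import Counter

# Resource IDs are 36-character UUIDs embedded in the download URL.
FETCH_RE = re.compile(r"Fetching from: \S+/resource/([0-9a-f-]{36})/")
DONE_RE = re.compile(r'Successfully pushed (\d+) entries to "([0-9a-f-]{36})"')


def summarize(lines):
    """Count job starts per resource and collect the pushed row counts."""
    fetches = Counter()
    pushed = {}
    for line in lines:
        m = FETCH_RE.search(line)
        if m:
            fetches[m.group(1)] += 1
            continue
        m = DONE_RE.search(line)
        if m:
            pushed.setdefault(m.group(2), []).append(int(m.group(1)))
    return fetches, pushed


DAILY = "b9cde552-ff7a-41c3-b6e6-1f16a80a09eb"
HOURLY = "8b71c43f-7e55-4926-ab06-39037026ce1a"
WEEKLY = "8ae0a0b0-eee6-4062-8ef8-13f3177519ff"
BASE = "http://hidden-domain.com/dataset/6449d8f5-76e7-4aff-bfb1-46d46542a56c/resource/"

# Trimmed copies of the job start and completion lines from the error log.
SAMPLE = [
    "[error] Fetching from: " + BASE + DAILY + "/download/footfalldaily.csv",
    "[error] Fetching from: " + BASE + DAILY + "/download/footfalldaily.csv",
    "[error] Fetching from: " + BASE + HOURLY + "/download/footfallhourly.csv",
    "[error] Fetching from: " + BASE + WEEKLY + "/download/footfallweekly.csv",
    '[error] Successfully pushed 906 entries to "' + WEEKLY + '".',
    "[error] Fetching from: " + BASE + HOURLY + "/download/footfallhourly.csv",
    '[error] Successfully pushed 6329 entries to "' + DAILY + '".',
    '[error] Successfully pushed 151896 entries to "' + HOURLY + '".',
]

if __name__ == "__main__":
    fetches, pushed = summarize(SAMPLE)
    for resource_id, count in fetches.items():
        # A count above 1 means two jobs ran against the same DataStore table.
        print(resource_id, "started", count, "time(s); pushed", pushed.get(resource_id))
```

On this log the daily and hourly resources are each fetched twice and the weekly resource once, which accounts for the five `POST /job` entries in the access log.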