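I'm looking for the simplest way to update the CKAN DataStore whenever a linked resource is updated. In this case, all resources are linked (not uploaded). The resources are CSV files that are updated regularly. When a CSV file is updated, CKAN's DataStore does not seem to pick up the changes automatically. I tried using ckanapi, but the update_resource function appears to update only the metadata. I have not been able to get it to update the DataStore consistently (so the Data Explorer view contains outdated information).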
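Unless there is a simpler way, my preference is to programmatically trigger the "Upload to DataStore" button found on the "DataStore" tab of a given resource. I have searched fairly extensively but have not found a way to do this. Any suggestions are appreciated.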
The CKAN version is currently 2.8.1, with the DataStore and DataPusher extensions enabled.
Answer 0 (score: 0)
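You should be able to do this with a script that calls the CKAN API, specifically the datapusher_submit action (see below).
Here is a sample Python script I have used in the past.
There is also an open PR that would document this better, but it has not been merged yet.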
#!/usr/bin/env python
# Written for Python 2 (urllib2 and the print statement).
import urllib2
import urllib
import json
import pprint

# We'll use the resource_search action to get all of the resources.
# NOTE: there may be a limit on this in the future, in which case multiple calls would be needed to collect
# all of the resources. Datasets have a hard limit of 1000 but default to 10. So this works for now, but it may become an issue.
resources_request = urllib2.Request(
    'http://ckan-site.com/api/3/action/resource_search?query=name:')

# Make the HTTP request.
resources_response = urllib2.urlopen(resources_request)

# Make sure it worked.
assert resources_response.code == 200

# Use the json module to load CKAN's response into a dictionary.
resources_response_dict = json.loads(resources_response.read())
assert resources_response_dict['success'] is True
results = resources_response_dict['result']['results']

for result in results:
    # Loop over the resources and submit each one to the DataStore.
    try:
        request = urllib2.Request('http://ckan-site.com/api/3/action/datapusher_submit')
        data_dict = {
            "resource_id": result['id']
        }
        data_string = urllib.quote(json.dumps(data_dict))
        request.add_header('Authorization', 'your-token-here')
        response = urllib2.urlopen(request, data_string)
        assert json.loads(response.read())['success'] is True
    except Exception as e:
        # Catch and print any issues and keep going.
        print "Failed to submit resource_id: " + result['id'] + " (" + str(e) + ")"
        continue

print "Complete. Datastore is now up to date."
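
For anyone on Python 3, here is a minimal sketch of the same datapusher_submit call using only the standard library. The helper names (build_submit_request, parse_action_reply, resubmit_resource) are made up for illustration and are not part of CKAN or ckanapi, and the site URL and token are placeholders just like in the script above. It assumes the action API accepts a JSON request body with the token sent in the Authorization header, so treat it as a starting point rather than a tested integration.

```python
import json
import urllib.request


def build_submit_request(site_url, resource_id, api_token):
    """Build (but do not send) a POST request for the datapusher_submit action.

    site_url and api_token are placeholders supplied by the caller.
    """
    url = "{}/api/3/action/datapusher_submit".format(site_url.rstrip("/"))
    body = json.dumps({"resource_id": resource_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": api_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_action_reply(raw_bytes):
    """Decode a CKAN action reply and return its 'result' value.

    Raises RuntimeError when the reply does not report success.
    """
    reply = json.loads(raw_bytes.decode("utf-8"))
    if not reply.get("success"):
        raise RuntimeError("CKAN action failed: {!r}".format(reply.get("error")))
    return reply["result"]


def resubmit_resource(site_url, resource_id, api_token):
    """Ask the DataPusher to reload one resource into the DataStore."""
    request = build_submit_request(site_url, resource_id, api_token)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request) as response:
        return parse_action_reply(response.read())


# Example call (placeholders, not real values):
# resubmit_resource("http://ckan-site.com", "<resource-id>", "your-token-here")
```

You could run something like this from cron on the same schedule the CSV files are refreshed, looping over the resource ids you care about. If you would rather stay with ckanapi, its RemoteCKAN client exposes actions by name, so the same action should be reachable as `.action.datapusher_submit(resource_id=...)`, though I have not verified that against 2.8.1.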