Creating a CKAN package/dataset with resources using ckanapi and Python

Date: 2018-01-01 23:16:17

Tags: python python-3.x ckan

CKAN provides the ckanapi package for accessing the CKAN API from Python or from the command line.

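I can use it to download metadata, create resources, and so on. But I cannot create a package and upload resources to it in a single API call. (A package is also called a dataset.)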

Internally, ckanapi scans all keys, moving any file-like parameters into a separate dict, which it passes to the requests.session.post(files=..) parameter.
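To make that behaviour concrete, here is a minimal sketch of such a split. It is not ckanapi's actual source: the names is_file_like and split_file_params are hypothetical, and the detection rule (an object with a read method, or a requests-style tuple whose second item has one) is an assumption chosen for illustration.

```python
import io


def is_file_like(value):
    """Return True for open files or requests-style (filename, fileobj, ...) tuples."""
    if hasattr(value, 'read'):
        return True
    return (isinstance(value, tuple)
            and len(value) >= 2
            and hasattr(value[1], 'read'))


def split_file_params(params):
    """Split top-level parameters into (plain data, file uploads).

    Only top-level values are inspected, so a file nested inside a dict
    or list is NOT detected and stays in the plain data.
    """
    data, files = {}, {}
    for key, value in params.items():
        if is_file_like(value):
            files[key] = value
        else:
            data[key] = value
    return data, files


upload = io.BytesIO(b'example bytes')
data, files = split_file_params({
    'package_id': 'joe_data',
    'upload': upload,                      # top-level file: detected
    'resources': {'report.xls': upload},   # nested file: not detected
})
print(sorted(data))
print(sorted(files))
```

Under this assumption, only a file passed as its own top-level keyword ends up in the multipart upload; anything nested inside resources is treated as ordinary data.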

This is the closest I can get, but CKAN returns an HTTP 500 error (the tuple format is copied from this guide to requests):

with ckanapi.RemoteCKAN('http://myckan.example.com', apikey='real-key', user_agent=ua, username='joe', password='pwd') as ckan:
    ckan.action.package_create(name='joe_data',
                               resources=('report.xls',
                                          open('/path/to/file.xlsx', 'rb'),
                                          'application/vnd.ms-excel',
                                          {'Expires': '0'}))

I have also tried resources=open('path/file'), files=open('file'), and shorter or longer tuples, but I get the same 500 error.

The requests documentation says:

:param files: (optional) Dictionary of ``'filename': file-like-objects``
    for multipart encoding upload.

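I cannot pass resources={'filename': open('file')} through ckanapi, because ckanapi does not detect the file, tries to pass it to requests as a normal parameter, and fails ("BufferedReader is not JSON serializable", because it tries to make the file a POST parameter). I get the same error if I pass a list of files. Yet the API is able to create a package and add many resources to it in a single call.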
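The serialization failure can be reproduced without CKAN or ckanapi at all. The sketch below assumes only that the parameters are eventually passed to json.dumps; the in-memory BufferedReader stands in for open('file', 'rb'), and the dataset name is the one from the attempt above.

```python
import io
import json

# Stand-in for open('/path/to/file.xlsx', 'rb'), which also returns a BufferedReader.
fh = io.BufferedReader(io.BytesIO(b'fake spreadsheet bytes'))

# The file is nested inside the resources dict, so it is treated as plain data.
params = {'name': 'joe_data', 'resources': {'report.xls': fh}}

try:
    json.dumps(params)
    error_message = None
except TypeError as exc:
    # json only knows how to encode dicts, lists, strings, numbers, booleans and None.
    error_message = str(exc)

print(error_message)
```

Any parameter that reaches the JSON encoder has to be made of plain JSON types, so a file object can only travel as a separate multipart field.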

So how can I create a package and multiple resources with a single ckanapi call?

1 answer:

Answer 0: (score: 0)

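I was curious about this and thought I would run some tests. Unfortunately, I have not used the CLI you mention. But I hope this helps you and anyone else who stumbles on this.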

I am not positive, but I suspect your resources value is not formatted correctly. resources needs to be a list of dictionaries.
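Applied to ckanapi, that suggestion might look like the sketch below. I have not run it against a live CKAN site. The helper names build_dataset and create_dataset are hypothetical, and ckan is assumed to be a ckanapi.RemoteCKAN instance (anything exposing action.package_create(**kwargs) would do). It only covers resources that are linked by URL, not file uploads.

```python
def build_dataset(name, resource_urls):
    """Return package_create parameters with resources as a list of dicts."""
    return {
        'name': name,
        # One dict per resource; each dict holds that resource's fields.
        'resources': [{'url': url} for url in resource_urls],
    }


def create_dataset(ckan, name, resource_urls):
    """Create a dataset and its linked resources in one package_create call.

    `ckan` is assumed to be a ckanapi.RemoteCKAN instance.
    """
    return ckan.action.package_create(**build_dataset(name, resource_urls))
```

With a real client this would be called as create_dataset(ckan, 'joe_data', ['http://example.com/a.csv', 'http://example.com/b.csv']) inside a with ckanapi.RemoteCKAN('http://myckan.example.com', apikey='real-key') as ckan: block. For actual file uploads, the ckanapi README shows a separate resource_create call with an upload= file argument, which would mean one extra call per file.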

Here is a Ruby script (currently my language of choice) that performs the insert in a single API call:

# Ruby script to create a package and resource in one api call. 
# You can run this in https://repl.it/languages/ruby
# Don't forget to update URLs and API key.
require 'csv'
require 'json'
require 'net/http'

hash_to_json = {
                  "title" => 'test1',
                  "name" => 'test1',
                  "owner_org" => 'bbb9682e-b58c-4826-bf4b-b161581056be',
                  "resources" => [ 
                    {
                      "url" => 'http://www.resource_domain.com/doc.kml'
                    }
                  ]
                }.to_json

uri = URI('http://ckan_app_domain.com:5000/api/3/action/package_create')

Net::HTTP.start(uri.host, uri.port) do |http|
  request = Net::HTTP::Post.new uri

  request['Authorization'] = 'user-api-key'
  request.body = hash_to_json

  response = http.request request
  puts response.body
end

Here is a simple Python script that does the same thing (thanks to the CKAN documentation for the template I modified):

#!/usr/bin/env python
import urllib2
import urllib
import json
import pprint

# Put the details of the dataset we're going to create into a dict.
dataset_dict = {
    'name': 'my_dataset_name',
    'notes': 'A long description of my dataset',
    'owner_org': 'bbb9682e-b58c-4826-bf4b-b161581056be',
    'resources': [
      {
        'url': 'example.com'
      }
    ]
}

# Use the json module to dump the dictionary to a string for posting.
data_string = urllib.quote(json.dumps(dataset_dict))

# We'll use the package_create function to create a new dataset.
request = urllib2.Request(
    'http://ckan_app_domain.com:5000/api/3/action/package_create')

# Creating a dataset requires an authorization header.
# Replace 'user-api-key' with your API key, from your user account on the CKAN site
# that you're creating the dataset on.
request.add_header('Authorization', 'user-api-key')

# Make the HTTP request.
response = urllib2.urlopen(request, data_string)
assert response.code == 200

# Use the json module to load CKAN's response into a dictionary.
response_dict = json.loads(response.read())
assert response_dict['success'] is True

# package_create returns the created package as its result.
created_package = response_dict['result']
pprint.pprint(created_package)
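The script above is Python 2 (it uses urllib2), while the question is tagged python-3.x. A rough Python 3 equivalent using only the standard library could look like the sketch below. The function names are hypothetical, the host and key are the same placeholders as above, and sending the body as application/json rather than as a URL-quoted string is my assumption about what CKAN accepts; I have not tested it against a live site.

```python
import json
import urllib.request


def build_package_create_request(base_url, api_key, dataset_dict):
    """Build (but do not send) a POST request for CKAN's package_create action."""
    body = json.dumps(dataset_dict).encode('utf-8')
    return urllib.request.Request(
        base_url.rstrip('/') + '/api/3/action/package_create',
        data=body,
        headers={
            'Authorization': api_key,
            'Content-Type': 'application/json',
        },
        method='POST',
    )


def create_package(base_url, api_key, dataset_dict):
    """Send the request and return the created package from CKAN's response."""
    request = build_package_create_request(base_url, api_key, dataset_dict)
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(request) as response:
        response_dict = json.loads(response.read().decode('utf-8'))
    if not response_dict.get('success'):
        raise RuntimeError('package_create failed: %r' % (response_dict,))
    return response_dict['result']


dataset_dict = {
    'name': 'my_dataset_name',
    'notes': 'A long description of my dataset',
    'owner_org': 'bbb9682e-b58c-4826-bf4b-b161581056be',
    'resources': [
        {'url': 'example.com'},
    ],
}
request = build_package_create_request(
    'http://ckan_app_domain.com:5000', 'user-api-key', dataset_dict)
print(request.get_method(), request.full_url)
```

Calling create_package('http://ckan_app_domain.com:5000', 'user-api-key', dataset_dict) would then perform the actual request; building the request separately makes it easy to inspect the URL, headers, and body before anything is sent.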