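How do I fix a failed upload of a PDF file as a resource to a CKAN dataset that reports "{file} is not JSON serializable"?
My simple Python script creates a dataset and then adds a PDF file to that dataset as a resource. The resource step fails with "{file} is not json serializable".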
# coding=utf-8
# import base64
import ckanapi
import requests
import csv
import json
import pprint
import socket
import netifaces as ni
# UPDATE THESE AND ONLY THESE.
api_token = '***'
the_hostname = socket.gethostname()
the_ipaddress = ni.ifaddresses('eth0')[ni.AF_INET][0]['addr']
site_url = 'http://' + the_ipaddress + ':5000'
endpoint_p = '{}/api/3/action/package_create'.format(site_url)
endpoint_r = '{}/api/3/action/resource_create'.format(site_url)
headers = {'Authorization': api_token}
payload_p = {
"name": "test01","private": "true","state": "active","owner_org": "b15a6f45-e2ed-4587-8c5e-a92dbc9f157d","maintainer" : "Forms Management","maintainer_email" : "forms.management@province.ca","author" : "Test Author","author_email" : "hughj@province.ca"
}
payload_r = {
"package_id": "null","name": "English - test01 - Test Description","url": "upload","upload": open('/var/www/upload/2nd/unzipped/002-33-5098E/33-5098E.pdf','r'),"description": "This is a test resource attached to dataset test01","notes": "This is a longer block of text that is for the resource test01e which is attached to the dataset test01"
}
filepaths = {
"thepath": "/var/www/upload/2nd/unzipped/002-33-5098E/33-5098E.pdf"
}
req_p = requests.post(endpoint_p,json=payload_p,headers=headers)
theLastResponse = req_p.json()
theLastPackageCreated = theLastResponse['result']['id']
payload_r["package_id"] = theLastPackageCreated
req_r = requests.post(endpoint_r,json = payload_r,headers = headers) # resource_create()
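This raises the error "{file} is not JSON serializable". The file is a PDF, which is a binary file, but I'm not sure whether it needs some kind of encoding (note the commented-out base64 module... I didn't want to go down that route without first asking whether it is the right approach).
The CKAN API documentation here: https://docs.ckan.org/en/2.9/api/#ckan.logic.action.create.resource_create
says that "upload" should be "(FieldStorage (optional) needs multipart/form-data) – (optional)", but every example script I have seen for uploading files to CKAN does exactly what I'm doing here, with little or no preprocessing of the uploaded file, so I'm not sure what the problem is... Please help if you can!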
Solution
I copied your code and ran a modified version against a local development copy of CKAN, and got it working after making the changes below.
Most notably:
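- payload_r -> none of the extra fields are required, but you can include other resource metadata such as description, name, etc. if you want to.
- req_r -> 1) pass the payload as data rather than json, so that it is sent as multipart/form-data. 2) Use the files parameter to send the file.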
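Documentation: https://docs.ckan.org/en/2.9/maintaining/filestore.html#filestore-api
IMO this is less a CKAN issue and more a matter of understanding the library you chose (i.e. requests). There are many ways to do this with different tools.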
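To make the library point concrete, here is a minimal sketch of why the json argument fails. When a payload is passed as json, requests has to serialize it with a JSON encoder, and an open file handle has no JSON representation. The sketch uses the standard library's json module directly and an in-memory io.BytesIO object as a stand-in for the opened PDF; the placeholder bytes and the package_id value are hypothetical.

```python
import io
import json

# Stand-in for the open PDF handle, so no real file is needed on disk.
fake_pdf = io.BytesIO(b"%PDF-1.4 placeholder bytes")

# Same shape as the failing payload: metadata plus a file-like object.
payload = {"package_id": "hypothetical-dataset-id", "upload": fake_pdf}

try:
    json.dumps(payload)
    error_message = None
except TypeError as exc:
    # The encoder only handles dicts, lists, strings, numbers, booleans and None.
    error_message = str(exc)

print(error_message)
```

Base64-encoding the file would get past the encoder, but the documentation quoted in the question says the upload field needs multipart/form-data, so changing how the request is sent is the more direct fix.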
I also had to update the payload to match my schema, but assuming yours is correct for your instance, this should work.
# coding=utf-8
# import base64
import ckanapi
import requests
import csv
import json
import pprint
import socket
import netifaces as ni
# UPDATE THESE AND ONLY THESE.
api_token = '***'
the_hostname = socket.gethostname()
the_ipaddress = ni.ifaddresses('eth0')[ni.AF_INET][0]['addr']
site_url = 'http://' + the_ipaddress + ':5000'
endpoint_p = '{}/api/3/action/package_create'.format(site_url)
endpoint_r = '{}/api/3/action/resource_create'.format(site_url)
headers = {'Authorization': api_token}
payload_p = {
"name": "test01","private": "true","state": "active","owner_org": "b15a6f45-e2ed-4587-8c5e-a92dbc9f157d","maintainer" : "Forms Management","maintainer_email" : "forms.management@province.ca","author" : "Test Author","author_email" : "hughj@province.ca"
}
payload_r = {
"package_id": "null"
}
filepaths = {
"thepath": "/var/www/upload/2nd/unzipped/002-33-5098E/33-5098E.pdf"
}
req_p = requests.post(endpoint_p,json=payload_p,headers=headers)
theLastResponse = req_p.json()
theLastPackageCreated = theLastResponse['result']['id']
payload_r["package_id"] = theLastPackageCreated
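# file() exists only in Python 2; open(..., 'rb') works on Python 2 and 3 and reads the PDF as bytes.
req_r = requests.post(endpoint_r,data=payload_r,headers=headers,files=[('upload',open('/var/www/upload/2nd/unzipped/002-33-5098E/33-5098E.pdf','rb'))]) # resource_create()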
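If you want to check what requests will send before pointing it at a real CKAN instance, the sketch below prepares the resource_create request without sending it. It is an illustration under assumptions, not part of the script above: build_resource_request is a hypothetical helper, and the site URL, dataset id and in-memory placeholder bytes stand in for real values.

```python
import io

import requests


def build_resource_request(site_url, api_token, package_id, file_obj, filename):
    """Prepare (but do not send) a multipart resource_create request."""
    request = requests.Request(
        "POST",
        "{}/api/3/action/resource_create".format(site_url),
        headers={"Authorization": api_token},
        # Plain form fields go in data, not json.
        data={"package_id": package_id},
        # The file goes in files; this makes requests build multipart/form-data.
        files={"upload": (filename, file_obj, "application/pdf")},
    )
    return request.prepare()


prepared = build_resource_request(
    "http://localhost:5000",   # assumed local CKAN URL
    "***",                     # API token placeholder
    "hypothetical-dataset-id",
    io.BytesIO(b"%PDF-1.4 placeholder bytes"),
    "33-5098E.pdf",
)
content_type = prepared.headers["Content-Type"]
print(content_type)
```

The printed header should start with multipart/form-data, and prepared.body should contain both the package_id field and the upload part. To actually send it, pass the prepared request to requests.Session().send(prepared), ideally with a real file opened in 'rb' mode inside a with block so the handle is closed afterwards.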