我最近写了一个Python脚本,它将本地的,换行符分隔的JSON文件上传到BigQuery表。它与官方文档here中提供的示例非常相似。我遇到的问题是我正在尝试上传的文件中的非ASCII字符正在进行我的POST请求barf。
这是脚本的相关部分......
def upload(dataFilePath, loadJob, recipeJSON, logger):
body = '--xxx\n'
body += 'Content-Type: application/json; charset=UTF-8\n\n'
body += loadJob
body += '\n--xxx\n'
body += 'Content-Type: application/octet-stream\n\n'
dataFile = io.open(dataFilePath, 'r', encoding = 'utf-8')
body += dataFile.read()
dataFile.close()
body += '\n--xxx--\n'
credentials = buildCredentials(recipeJSON['keyPath'], recipeJSON['accountEmail'])
http = httplib2.Http()
http = credentials.authorize(http)
service = build('bigquery', 'v2', http=http)
projectId = recipeJSON['projectId']
url = BIGQUERY_URL_BASE + projectId + "/jobs"
headers = {'Content-Type': 'multipart/related; boundary=xxx'}
response, content = http.request(url, method="POST", body=body, headers=headers)
...这是我运行时得到的堆栈跟踪...
Traceback (most recent call last):
File "/usr/local/uploader/upload_data.py", line 179, in <module>
main(sys.argv)
File "/usr/local/uploader/upload_data.py", line 170, in main
if (upload(unprocessedFile, loadJob, recipeJSON, logger)):
File "/usr/local/uploader/upload_data.py", line 100, in upload
response, content = http.request(url, method="POST", body=body, headers=headers)
File "/usr/local/lib/python2.7/site-packages/oauth2client/util.py", line 128, in positional_wrapper
return wrapped(*args, **kwargs)
File "/usr/local/lib/python2.7/site-packages/oauth2client/client.py", line 490, in new_request
redirections, connection_type)
File "/usr/local/lib/python2.7/site-packages/httplib2/__init__.py", line 1570, in request
(response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
File "/usr/local/lib/python2.7/site-packages/httplib2/__init__.py", line 1317, in _request
(response, content) = self._conn_request(conn, request_uri, method, body, headers)
File "/usr/local/lib/python2.7/site-packages/httplib2/__init__.py", line 1253, in _conn_request
conn.request(method, request_uri, body, headers)
File "/usr/local/lib/python2.7/httplib.py", line 973, in request
self._send_request(method, url, body, headers)
File "/usr/local/lib/python2.7/httplib.py", line 1007, in _send_request
self.endheaders(body)
File "/usr/local/lib/python2.7/httplib.py", line 969, in endheaders
self._send_output(message_body)
File "/usr/local/lib/python2.7/httplib.py", line 833, in _send_output
self.send(message_body)
File "/usr/local/lib/python2.7/httplib.py", line 805, in send
self.sock.sendall(data)
File "/usr/local/lib/python2.7/ssl.py", line 229, in sendall
v = self.send(data[count:])
File "/usr/local/lib/python2.7/ssl.py", line 198, in send
v = self._sslobj.write(data)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 4586-4611: ordinal not in range(128)
我正在使用Python 2.7和以下库: 分发(0.6.36) google-api-python-client(1.1) httplib2(0.8) oauth2client(1.1) pyOpenSSL(0.13) python-gflags(2.0) wsgiref(0.1.2)
还有其他人有这个问题吗?
似乎httplib2的请求方法将“body”作为字符串,这意味着它稍后需要在通过线路发送之前进行编码。我一直在寻找一种方法来将编码重写为UTF-8,但到目前为止还没有运气。
提前致谢!
编辑:
我能够通过做两件事来解决这个问题: 1.)在没有解码的情况下读取我的文件原始内容。 (我本来可以在上面的第一次尝试中编码“身体”......) 2.)将URL和头文件编码为字节。
代码最终看起来像这样:
def upload(dataFilePath, loadJob, recipeJSON, logger):
part_one = '--xxx\n'
part_one += 'Content-Type: application/json; charset=UTF-8\n\n'
part_one += loadJob
part_one += '\n--xxx\n'
part_one += 'Content-Type: application/octet-stream\n\n'
dataFile = io.open(dataFilePath, 'rb')
part_two = dataFile.read()
dataFile.close()
part_three = '\n--xxx--\n'
body = part_one.encode('utf-8')
body += part_two
body += part_three.encode('utf-8')
credentials = buildCredentials(recipeJSON['keyPath'], recipeJSON['accountEmail'])
http = httplib2.Http()
http = credentials.authorize(http)
service = build('bigquery', 'v2', http=http)
projectId = recipeJSON['projectId']
url = BIGQUERY_URL_BASE + projectId + "/jobs"
headers = {'Content-Type'.encode('utf-8'): 'multipart/related; boundary=xxx'.encode('utf-8')}
response, content = http.request(url.encode('utf-8'), method="POST", body=body, headers=headers)
答案 0 :(得分:1)
io.open()
会将文件作为 unicode 文本打开。使用普通open()
或使用二进制模式:
dataFile = io.open(dataFilePath, 'rb')
您正在通过网络直接发送文件内容,因此您需要发送字节,而不是unicode,并且正如您所知,混合Unicode和字节会导致痛苦的错误,因为python尝试使用以下内容自动编码回字节连接两种不同类型时的ASCII编解码器。此处无需解码为的所有 。