Traceback below.
Relevant Python snippet:
bucket = _get_bucket(location['bucket'])
blob = bucket.blob(location['path'])
blob.upload_from_filename(source_path)
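This eventually triggers (from the ssl library):
OverflowError: string longer than 2147483647 bytes
I assume there is some special configuration option I'm missing?
It may be related to this ~1.5-year-old, apparently still unresolved issue: https://github.com/googledatalab/datalab/issues/784.
Help appreciated!
Full trace: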
  File "/usr/src/app/gcloud/download_data.py", line 109, in *******
    blob.upload_from_filename(source_path)
  File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 992, in upload_from_filename
    size=total_bytes)
  File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 946, in upload_from_file
    client, file_obj, content_type, size, num_retries)
  File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 867, in _do_upload
    client, stream, content_type, size, num_retries)
  File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 700, in _do_multipart_upload
    transport, data, object_metadata, content_type)
  File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/requests/upload.py", line 97, in transmit
    retry_strategy=self._retry_strategy)
  File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/requests/_helpers.py", line 101, in http_request
    func, RequestsMixin._get_status_code, retry_strategy)
  File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/_helpers.py", line 146, in wait_and_retry
    response = func()
  File "/usr/local/lib/python3.5/dist-packages/google/auth/transport/requests.py", line 186, in request
    method, url, data=data, headers=request_headers, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/adapters.py", line 440, in send
    timeout=timeout
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 357, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.5/http/client.py", line 1106, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python3.5/http/client.py", line 1151, in _send_request
    self.endheaders(body)
  File "/usr/lib/python3.5/http/client.py", line 1102, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python3.5/http/client.py", line 936, in _send_output
    self.send(message_body)
  File "/usr/lib/python3.5/http/client.py", line 908, in send
    self.sock.sendall(data)
  File "/usr/lib/python3.5/ssl.py", line 891, in sendall
    v = self.send(data[count:])
  File "/usr/lib/python3.5/ssl.py", line 861, in send
    return self._sslobj.write(data)
  File "/usr/lib/python3.5/ssl.py", line 586, in write
    return self._sslobj.write(data)
OverflowError: string longer than 2147483647 bytes
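Answer 0 (score: 5)
The problem is that it is trying to read the entire file into memory. Following the call chain from upload_from_filename shows that it stats the file and then passes the result in as the upload size, so the whole file is sent as a single upload part.
Instead, specifying a chunk_size when creating the object triggers it to upload in multiple parts: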
# Must be a multiple of 256KB per docstring
CHUNK_SIZE = 10485760 # 10MB
blob = bucket.blob(location['path'], chunk_size=CHUNK_SIZE)
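To make the two numbers in this thread concrete, here is a minimal sketch, not the library's own logic. The limit in the error message, 2147483647, is 2**31 - 1, and the chunk size has to be a multiple of 256 KB. The helper names (exceeds_single_write, round_chunk_size, upload_in_chunks) are hypothetical and only for illustration. The bucket argument is assumed to behave like the bucket object in the question, exposing blob(name, chunk_size=...) and a blob with upload_from_filename(path).

```python
# Limit reported in the OverflowError above: 2**31 - 1 bytes.
MAX_SINGLE_WRITE = 2**31 - 1

# Chunk sizes must be a multiple of 256 KB (per the docstring cited in the answer).
CHUNK_MULTIPLE = 256 * 1024


def exceeds_single_write(size_bytes):
    """Return True if a payload of this size is larger than the limit in the error."""
    return size_bytes > MAX_SINGLE_WRITE


def round_chunk_size(desired_bytes):
    """Round desired_bytes up to the nearest positive multiple of 256 KB."""
    if desired_bytes <= 0:
        raise ValueError("desired_bytes must be positive")
    multiples = -(-desired_bytes // CHUNK_MULTIPLE)  # ceiling division
    return multiples * CHUNK_MULTIPLE


def upload_in_chunks(bucket, blob_path, source_path, desired_chunk=10 * 1024 * 1024):
    """Hypothetical helper: create the blob with a valid chunk_size, then upload.

    Assumes `bucket` offers blob(name, chunk_size=...) as in the answer above.
    """
    blob = bucket.blob(blob_path, chunk_size=round_chunk_size(desired_chunk))
    blob.upload_from_filename(source_path)
    return blob
```

With the names from the question, a call might look like upload_in_chunks(_get_bucket(location['bucket']), location['path'], source_path). Rounding up means an arbitrary desired size never violates the 256 KB rule.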
Happy hacking!