gsutil int error when using rsync on large files

Time: 2015-03-04 22:36:43

Tags: google-cloud-storage gsutil

Environment:

  • Windows Server 2012 R2 [server]
  • Python: 2.7.9
  • gsutil: 4.9
  • Running as SYSTEM in an elevated command prompt (full access to all files)
  • The bucket is also named [server]

Background: I am trying to back up ~5 TB of data to GCS using gsutil.

Execution: I started with the following command:

python d:\gsutil\gsutil -m rsync -R d:\data\ gs://[server]

Most of the data was copied, except for 482 large files. I then tried:

python d:\gsutil\gsutil rsync -R d:\data\ gs://[server]

...and the sync failed on the first file that had previously failed to copy. I followed up with:

python d:\gsutil\gsutil -d rsync -R d:\data\ gs://[server]

and received the following:

Copying file://d:\data\CDC-Exp-Mar-1-2015\CDC-148_170_sample_6\CDC-148_170_sample_6_trimQ20_filter50.blast [Content-Type=application/octet-stream]...
==> NOTE: You are uploading one or more large file(s), which would run
significantly faster if you enable parallel composite uploads. This
feature can be enabled by editing the
"parallel_composite_upload_threshold" value in your .boto
configuration file. However, note that if you do this you and any
users that download such composite files will need to have a compiled
crcmod installed (see "gsutil help crcmod").

DEBUG 0304 10:01:53.209000 oauth2_client.py] GetAccessToken: checking cache for key [key]
DEBUG 0304 10:01:53.209000 oauth2_client.py] FileSystemTokenCache.GetToken: key=[key] present (cache_file=c:\windows\temp\oauth2_client-tokencache._.[key])
DEBUG 0304 10:01:53.209000 oauth2_client.py] GetAccessToken: token from cache: AccessToken(token=[token], expiry=2015-03-04 18:00:44.617000Z)
INFO 0304 10:01:53.224000 base_api.py] Calling method storage.objects.insert with StorageObjectsInsertRequest: <StorageObjectsInsertRequest
 bucket: u'[server]'
 object: <Object
 acl: []
 bucket: u'[server]'
 contentLanguage: 'en'
 contentType: 'application/octet-stream'
 name: u'CDC-Exp-Mar-1-2015/CDC-148_170_sample_6/CDC-148_170_sample_6_trimQ20_filter50.blast'>>
INFO 0304 10:01:53.224000 base_api.py] Making http POST to https://www.googleapis.com/resumable/upload/storage/v1/b/[server]/o?fields=generation%2Ccrc32c%2Cmd5Hash%2Csize&alt=json&prettyPrint=True&uploadType=resumable
INFO 0304 10:01:53.240000 base_api.py] Headers: {'X-Upload-Content-Length': '144853423157',
 'X-Upload-Content-Type': 'application/octet-stream',
 'accept': 'application/json',
 'accept-encoding': 'gzip, deflate',
 'content-length': '189',
 'content-type': 'application/json',
 'user-agent': 'apitools gsutil/4.9 (win32)'}
INFO 0304 10:01:53.240000 base_api.py] Body:
{"bucket": "[server]", "contentType": "application/octet-stream", "name": "CDC-Exp-Mar-1-2015/CDC-148_170_sample_6/CDC-148_170_sample_6_trimQ20_filter50.blast", "contentLanguage": "en"}
connect: (www.googleapis.com, 443)
send: 'POST /resumable/upload/storage/v1/b/[server]/o?fields=generation%2Ccrc32c%2Cmd5Hash%2Csize&alt=json&prettyPrint=True&uploadType=resumable HTTP/1.1\r\nHost: www.googleapis.com\r\ncontent-length: 189\r\naccept-encoding: gzip, deflate\r\naccept: application/json\r\nuser-agent: apitools gsutil/4.9 (win32)\r\nx-upload-content-length: 144853423157\r\nx-upload-content-type: application/octet-stream\r\ncontent-type: application/json\r\nauthorization: Bearer [token]\r\n\r\n{"bucket": "[server]", "contentType": "application/octet-stream", "name": "CDC-Exp-Mar-1-2015/CDC-148_170_sample_6/CDC-148_170_sample_6_trimQ20_filter50.blast", "contentLanguage": "en"}'
reply: 'HTTP/1.1 200 OK\r\n'
header: Location: https://www.googleapis.com/resumable/upload/storage/v1/b/[server]/o?fields=generation%2Ccrc32c%2Cmd5Hash%2Csize&alt=json&prettyPrint=True&uploadType=resumable&upload_id=AEnB2UqXHkYq0s8RJk87LK8Bx-sHU60uRvytO8NBnV-dFQAEo1uBPm-bDlGnnGqpx4hMyaa5qgQtMMq0kXWL_ezfo6G1jMyGKw
header: Vary: Origin
header: Vary: X-Origin
header: Cache-Control: no-cache, no-store, max-age=0, must-revalidate
header: Pragma: no-cache
header: Expires: Fri, 01 Jan 1990 00:00:00 GMT
header: Date: Wed, 04 Mar 2015 17:01:53 GMT
header: Content-Length: 0
header: Server: UploadServer ("Built on Feb 18 2015 18:10:26 (1424311826)")
header: Content-Type: text/html; charset=UTF-8
header: Alternate-Protocol: 443:quic,p=0.08
connect: (www.googleapis.com, 443)
send: 'POST /resumable/upload/storage/v1/b/[server]/o?fields=generation%2Ccrc32c%2Cmd5Hash%2Csize&alt=json&prettyPrint=True&uploadType=resumable HTTP/1.1\r\nHost: www.googleapis.com\r\ncontent-length: 189\r\naccept-encoding: gzip, deflate\r\naccept: application/json\r\nuser-agent: apitools gsutil/4.9 (win32)\r\nx-upload-content-length: 144853423157\r\nx-upload-content-type: application/octet-stream\r\ncontent-type: application/json\r\nauthorization: Bearer [token]\r\n\r\n{"bucket": "[server]", "contentType": "application/octet-stream", "name": "CDC-Exp-Mar-1-2015/CDC-148_170_sample_6/CDC-148_170_sample_6_trimQ20_filter50.blast", "contentLanguage": "en"}'
reply: 'HTTP/1.1 200 OK\r\n'
header: Location: https://www.googleapis.com/resumable/upload/storage/v1/b/[server]/o?fields=generation%2Ccrc32c%2Cmd5Hash%2Csize&alt=json&prettyPrint=True&uploadType=resumable&upload_id=AEnB2Urlx0WvbB5z9k9uvC9Qv4DeW4cCFLfn559_20nZKChCqSukmPYZcmZm7a_kwCrqubbRqF2an1HOv_lrMcPkdfpDinluQg
header: Vary: Origin
header: Vary: X-Origin
header: Cache-Control: no-cache, no-store, max-age=0, must-revalidate
header: Pragma: no-cache
header: Expires: Fri, 01 Jan 1990 00:00:00 GMT
header: Date: Wed, 04 Mar 2015 17:01:53 GMT
header: Content-Length: 0
header: Server: UploadServer ("Built on Feb 18 2015 18:10:26 (1424311826)")
header: Content-Type: text/html; charset=UTF-8
header: Alternate-Protocol: 443:quic,p=0.08
INFO 0304 10:01:53.631000 base_api.py] Response of type Object: <Object
 acl: []>
DEBUG: Exception stack trace:
    Traceback (most recent call last):
      File "d:\gsutil\gslib\__main__.py", line 524, in _RunNamedCommandAndHandleExceptions
        debug_level, parallel_operations)
      File "d:\gsutil\gslib\command_runner.py", line 272, in RunNamedCommand
        return_code = command_inst.RunCommand()
      File "d:\gsutil\gslib\commands\rsync.py", line 967, in RunCommand
        fail_on_error=True)
      File "d:\gsutil\gslib\command.py", line 1148, in Apply
        arg_checker, should_return_results, fail_on_error)
      File "d:\gsutil\gslib\command.py", line 1219, in _SequentialApply
        worker_thread.PerformTask(task, self)
      File "d:\gsutil\gslib\command.py", line 1654, in PerformTask
        results = task.func(cls, task.args, thread_state=self.thread_gsutil_api)
      File "d:\gsutil\gslib\commands\rsync.py", line 866, in _RsyncFunc
        headers=cls.headers)
      File "d:\gsutil\gslib\copy_helper.py", line 2360, in PerformCopy
        allow_splitting=allow_splitting)
      File "d:\gsutil\gslib\copy_helper.py", line 1695, in _UploadFileToObject
        dst_obj_metadata, preconditions, gsutil_api, logger)
      File "d:\gsutil\gslib\copy_helper.py", line 1539, in _UploadFileToObjectResumable
        progress_callback=progress_callback)
      File "d:\gsutil\gslib\cloud_api_delegator.py", line 248, in UploadObjectResumable
        tracker_callback=tracker_callback, progress_callback=progress_callback)
      File "d:\gsutil\gslib\gcs_json_api.py", line 956, in UploadObjectResumable
        apitools_strategy=apitools_transfer.RESUMABLE_UPLOAD)
      File "d:\gsutil\gslib\gcs_json_api.py", line 804, in _UploadObject
        additional_headers, progress_callback)
      File "d:\gsutil\gslib\gcs_json_api.py", line 861, in _PerformResumableUpload
        additional_headers=addl_headers)
      File "d:\gsutil\gslib\third_party\storage_apitools\transfer.py", line 790, in StreamMedia
        additional_headers=additional_headers, use_chunks=False)
      File "d:\gsutil\gslib\third_party\storage_apitools\transfer.py", line 749, in __StreamMedia
        additional_headers=additional_headers)
      File "d:\gsutil\gslib\third_party\storage_apitools\transfer.py", line 826, in __SendMediaBody
        body=body_stream)
      File "d:\gsutil\gslib\third_party\storage_apitools\http_wrapper.py", line 103, in __init__
        self.body = body
      File "d:\gsutil\gslib\third_party\storage_apitools\http_wrapper.py", line 124, in body
        self.headers['content-length'] = str(len(self.__body))
    OverflowError: long int too large to convert to int

I tried changing our .boto file to set rsync_buffer_lines = 64000, but that had no effect.

Any help is appreciated.

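1 answer:

Answer 0: (score: 0)

Upgrading to version 4.11, combined with moving gsutil to the C:\ drive (as described here: gsutil doesn't work, when executed from drive D, on Windows #238), cleared up all the problems we were seeing. Thanks!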
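For context on the traceback in the question: the failing line is `str(len(self.__body))`, and the X-Upload-Content-Length header shows a file of 144853423157 bytes. In CPython, `len()` can only return a value that fits in the interpreter's index-sized integer, so a body object that reports a larger size through `__len__` raises OverflowError. The sketch below is only an illustration of that behavior, not gsutil's actual code. `FakeBodyStream` and `safe_content_length` are hypothetical names, and whether the Python build in the question had a 32-bit limit is an assumption inferred from the error message.

```python
import sys

# Size taken from the X-Upload-Content-Length header in the debug log.
FILE_SIZE = 144853423157

# Largest value a signed 32-bit integer can hold.
INT32_MAX = 2**31 - 1


class FakeBodyStream(object):
    """Hypothetical stand-in for an upload body that reports its size via len()."""

    def __init__(self, size):
        self._size = size

    def __len__(self):
        return self._size


def safe_content_length(body):
    """Return len(body) as a string, or None if len() overflows."""
    try:
        return str(len(body))
    except OverflowError:
        # len() cannot represent sizes above the platform's index limit.
        return None


# The file from the log is far larger than a signed 32-bit integer allows.
print(FILE_SIZE > INT32_MAX)

# A small body works; one above sys.maxsize makes len() raise OverflowError.
print(safe_content_length(FakeBodyStream(1024)))
print(safe_content_length(FakeBodyStream(sys.maxsize + 1)))
```

On an interpreter whose limit is 2**31 - 1, the 144853423157-byte file would hit the overflow path in the same way, which is consistent with only the large files failing.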