Question

I am new to Google Colaboratory (Colab) and PyDrive. I am trying to load the data in 'CAS_num_strings', which I wrote from Colab into a pickle file in a specific directory on Google Drive as follows:

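import pickle

# Serialize locally in binary mode; the with-block closes (and flushes) the file before upload
with open('CAS_num_strings.p', 'wb') as f:
    pickle.dump(CAS_num_strings, f)
dump_meta = {'title': 'CAS.pkl', 'parents': [{'id': '1UEqIADV_tHic1Le0zlT25iYB7T6dBpBj'}]}
pkl_dump = drive.CreateFile(dump_meta)  # `drive` is an authenticated pydrive GoogleDrive instance
pkl_dump.SetContentFile('CAS_num_strings.p')
pkl_dump.Upload()
print(pkl_dump.get('id'))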

Here the parent id '1UEqIADV_tHic1Le0zlT25iYB7T6dBpBj' places the file in a specific parent folder that has this ID. The last print statement outputs:

'1ZgZfEaKgqGnuBD40CY8zg0MCiqKmi1vH'

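So I was able to create and dump the pickle file, whose id is '1ZgZfEaKgqGnuBD40CY8zg0MCiqKmi1vH'. Now I want to load this pickle file in another Colab notebook for a different purpose. To load it, I use these commands: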

cas_strings = drive.CreateFile({'id':'1ZgZfEaKgqGnuBD40CY8zg0MCiqKmi1vH'})
print('title: %s, mimeType: %s' % (cas_strings['title'], cas_strings['mimeType']))
print('Downloaded content "{}"'.format(cas_strings.GetContentString()))

This gives me the output:

title: CAS.pkl, mimeType: text/x-pascal

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-9-a80d9de0fecf> in <module>()
     30 cas_strings = drive.CreateFile({'id':'1ZgZfEaKgqGnuBD40CY8zg0MCiqKmi1vH'})
     31 print('title: %s, mimeType: %s' % (cas_strings['title'], cas_strings['mimeType']))
---> 32 print('Downloaded content "{}"'.format(cas_strings.GetContentString()))
     33 
     34 

/usr/local/lib/python3.6/dist-packages/pydrive/files.py in GetContentString(self, mimetype, encoding, remove_bom)
    192                     self.has_bom == remove_bom:
    193       self.FetchContent(mimetype, remove_bom)
--> 194     return self.content.getvalue().decode(encoding)
    195 
    196   def GetContentFile(self, filename, mimetype=None, remove_bom=False):

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

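As you can see, it finds the file CAS.pkl but cannot decode the data. I would like to resolve this error. As far as I know, a normal pickle dump and load with the 'wb' and 'rb' modes runs smoothly without encoding problems. In the current case, however, after dumping I cannot load the pickle file back from the Google Drive location created in the previous step. The error is raised at "return self.content.getvalue().decode(encoding)", and I cannot find a way to specify how the data should be decoded there. I also could not find a keyword or metadata tag to change in the Drive API reference (https://developers.google.com/drive/v2/reference/files#resource-representations). Any help is appreciated. Thanks.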

Answer 1

The problem is that GetContentString only works when the content is a valid UTF-8 string (docs), and your pickle is not.
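To see why, here is a small standard-library sketch (the list is a hypothetical stand-in for CAS_num_strings). A pickle written with protocol 2 or higher starts with the byte 0x80, the PROTO opcode, and 0x80 can never be the first byte of a UTF-8 sequence, so decoding the downloaded bytes as text fails at position 0, which matches the error above. Protocol 3 was the default in Python 3.6, the version shown in the traceback.

```python
import pickle

# Hypothetical stand-in for CAS_num_strings; any picklable object behaves the same way.
data = ['50-00-0', '64-17-5']

raw = pickle.dumps(data, protocol=3)  # protocol 3 was the default in Python 3.6
print(raw[:2])  # prints b'\x80\x03': the PROTO opcode (0x80) followed by the protocol number

try:
    # Roughly what GetContentString() does with the downloaded bytes: decode them as UTF-8 text.
    raw.decode('utf-8')
except UnicodeDecodeError as exc:
    print(exc)  # prints: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

# Treated as bytes, the same data round-trips without any problem.
print(pickle.loads(raw) == data)  # prints True
```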

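Unfortunately, you have to do a little extra work, because there is no GetContentBytes: you have to save the content to a file and read it back. Here is a working example: https://colab.research.google.com/drive/1gmh21OrJL0Dv49z28soYq_YcqKEnaQ1X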
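A minimal sketch of that save-then-read-back workaround as a reusable helper. It assumes `drive` is an authenticated PyDrive `GoogleDrive` instance, as in the question; the helper name `load_pickle_from_drive` is hypothetical, and it relies only on `CreateFile` and `GetContentFile(filename)` (the latter is visible in the traceback above). Downloading to a temporary file avoids leaving a copy of the pickle in the working directory.

```python
import os
import pickle
import tempfile


def load_pickle_from_drive(drive, file_id):
    """Hypothetical helper: download a Drive file to a temp path and unpickle it.

    Assumes `drive` is an authenticated pydrive GoogleDrive instance.
    """
    drive_file = drive.CreateFile({'id': file_id})
    fd, local_path = tempfile.mkstemp(suffix='.pkl')
    os.close(fd)  # GetContentFile opens the path itself, so release our handle
    try:
        # Writes the raw downloaded bytes to disk; no text decoding is involved.
        drive_file.GetContentFile(local_path)
        with open(local_path, 'rb') as f:
            return pickle.load(f)
    finally:
        os.remove(local_path)  # clean up the temporary copy
```

Usage with the file id from the question: `cas_nums = load_pickle_from_drive(drive, '1ZgZfEaKgqGnuBD40CY8zg0MCiqKmi1vH')`. As with any pickle, only load files you trust.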

Answer 2

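Actually, I found an elegant answer with the help of a friend. Instead of GetContentString, I used GetContentFile, the counterpart of SetContentFile. This downloads the file into the current workspace, where I can read it like any other pickle file. In the end, the data loads nicely into cas_nums.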

import pickle

cas_strings = drive.CreateFile({'id': '1ZgZfEaKgqGnuBD40CY8zg0MCiqKmi1vH'})
print('title: %s, mimeType: %s' % (cas_strings['title'], cas_strings['mimeType']))
cas_strings.GetContentFile(cas_strings['title'])  # save the raw bytes locally as 'CAS.pkl'
with open(cas_strings['title'], 'rb') as f:
    cas_nums = pickle.load(f)

For more details, see the "Download file content" section of the PyDrive documentation: http://pythonhosted.org/PyDrive/filemanagement.html#download-file-content

"UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80" when loading a pickle file with PyDrive on Google Colaboratory
