Trying to upload compressed (unicode) data via the bulkuploader

Date: 2014-10-06 16:12:11

Tags: google-app-engine python-2.7 zlib bulkloader google-cloud-datastore

I ran into a problem where the data I was uploading to a db.Text property exceeded 1 MB, so I compressed it with zlib. The bulkloader does not support uploading unicode data by default, so I switched the source to use unicodecsv instead of Python's built-in csv module. The problem I'm now hitting is that Google App Engine's bulkloader can't handle the resulting characters (even though the db.Text entity is unicode).

[ERROR   ] [Thread-12] DataSourceThread:
Traceback (most recent call last):
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/bulkloader.py", line 1611, in run
    self.PerformWork()
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/bulkloader.py", line 1730, in PerformWork
    for item in content_gen.Batches():
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/bulkloader.py", line 542, in Batches
    self._ReadRows(key_start, key_end)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/bulkloader.py", line 452, in _ReadRows
    row = self.reader.next()
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/bulkload/csv_connector.py", line 219, in generate_import_record
    for input_dict in self.dict_generator:
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/unicodecsv/__init__.py", line 188, in next
    row = csv.DictReader.next(self)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 108, in next
    row = self.reader.next()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/unicodecsv/__init__.py", line 106, in next
    row = self.reader.next()
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/bulkload/csv_connector.py", line 55, in utf8_recoder
    for line in codecs.getreader(encoding)(stream):
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 612, in next
    line = self.readline()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 527, in readline
    data = self.read(readsize, firstline=True)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 474, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x9c in position 29: invalid start byte

I know that for my local testing I could modify the Python files to work around unicodecsv's behavior, but that wouldn't help when uploading to GAE's datastore in production. Does anyone know of an existing solution to this problem?
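The traceback above makes sense once you look at what zlib actually emits: its output is arbitrary binary, not valid UTF-8. A minimal sketch of the failure (the sample text is illustrative; with zlib's default compression level, the stream begins with the header bytes 0x78 0x9c, and 0x9c is the very byte the UnicodeDecodeError complains about):

```python
import zlib

# Compress some entity text; the result is raw binary data.
compressed = zlib.compress(u"some entity text".encode("utf-8"))

# The default zlib header is 0x78 0x9c -- 0x9c is not a valid
# UTF-8 start byte, so decoding the CSV field as UTF-8 blows up.
print(hex(compressed[1]))

try:
    compressed.decode("utf-8")
except UnicodeDecodeError as exc:
    print("cannot decode: %s" % exc)
```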

1 Answer:

Answer 0 (score: 0)

Worked this out the other week: you just need to base64-encode the result, and then the bulkloader has no problem with it. Base64 increases the size by 30-50%, but since zlib had already compressed my data to about 10% of its original size, that wasn't too bad.
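A minimal sketch of the round-trip this describes (the `pack`/`unpack` names are illustrative, not part of the bulkloader API): compress, then base64-encode so the stored value is plain ASCII and survives the bulkloader's UTF-8 CSV decoding.

```python
import base64
import zlib

def pack(text):
    # Compress the unicode text, then base64 the binary result so it
    # contains only ASCII characters the CSV/UTF-8 pipeline can handle.
    return base64.b64encode(zlib.compress(text.encode("utf-8")))

def unpack(blob):
    # Reverse the transformation: base64-decode, decompress, re-decode.
    return zlib.decompress(base64.b64decode(blob)).decode("utf-8")

original = u"long repetitive entity text " * 1000
packed = pack(original)
assert unpack(packed) == original
```

Base64 output is limited to `A-Za-z0-9+/=`, so the decoder in `csv_connector.py` never sees an invalid start byte; the 4/3 size overhead is paid on top of the already-compressed data.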