Python App Engine Search API: UnicodeEncodeError on snippeted results

Date: 2013-03-19 22:15:57

Tags: python google-app-engine google-search-api

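I am crawling pages and indexing them with the App Engine Search API (pages in Spanish and Catalan, with accented characters). I can run searches and build a results page.

The problem appears when I use a query object with snippeted_fields: it always raises a UnicodeEncodeError: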

  File "/home/otger/python/jobs-gae/src/apps/search/handlers/results.py", line 82, in find_documents
    return index.search(query_obj)
  File "/opt/google_appengine_1.7.6/google/appengine/api/search/search.py", line 2707, in search
    apiproxy_stub_map.MakeSyncCall('search', 'Search', request, response)
  File "/opt/google_appengine_1.7.6/google/appengine/api/apiproxy_stub_map.py", line 94, in MakeSyncCall
    return stubmap.MakeSyncCall(service, call, request, response)
  File "/opt/google_appengine_1.7.6/google/appengine/api/apiproxy_stub_map.py", line 320, in MakeSyncCall
    rpc.CheckSuccess()
  File "/opt/google_appengine_1.7.6/google/appengine/api/apiproxy_rpc.py", line 156, in _WaitImpl
    self.request, self.response)
  File "/opt/google_appengine_1.7.6/google/appengine/ext/remote_api/remote_api_stub.py", line 200, in MakeSyncCall
    self._MakeRealSyncCall(service, call, request, response)
  File "/opt/google_appengine_1.7.6/google/appengine/ext/remote_api/remote_api_stub.py", line 234, in _MakeRealSyncCall
    raise pickle.loads(response_pb.exception())
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 52: ordinal not in range(128)

I found a similar question on Stack Overflow, "GAE Full Text Search development console UnicodeEncodeError", but it says this was a bug fixed in 1.7.0. I get the same error with versions 1.7.5 and 1.7.6.

When indexing a page, I add two fields: description and description_ascii. If I generate snippets for description_ascii, it works perfectly.
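
For context, one way an ASCII-only companion field like description_ascii could be derived is by stripping accents before indexing. The question does not say how the field was actually built, so the helper name `to_ascii` and the NFKD approach below are assumptions, shown only as a sketch:

```python
# -*- coding: utf-8 -*-
import unicodedata


def to_ascii(text):
    """Fold accented characters to their closest ASCII form (hypothetical helper)."""
    # NFKD splits an accented letter into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Dropping everything outside ASCII removes the combining marks.
    return decomposed.encode("ascii", "ignore").decode("ascii")


description = u"Descripci\xf3n de la posici\xf3n"
description_ascii = to_ascii(description)
```

Characters with no ASCII decomposition are simply dropped, so this loses information and is only suitable as a fallback field.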

Is it possible to generate snippets of non-ASCII content on dev_appserver?

1 answer:

Answer 0 (score: 2)

I think this is a bug, so I reported a new defect: https://code.google.com/p/googleappengine/issues/detail?id=9335

Temporary fix for the development server: locate the google.appengine.api.search module (search.py) and patch the function _DecodeUTF8 by adding an inline type check, as follows:

def _DecodeUTF8(pb_value):
  """Decodes a UTF-8 encoded string into unicode."""
  if pb_value is not None:
    return pb_value.decode('utf-8') if not isinstance(pb_value, unicode) else pb_value
  return None
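
The reason the type check helps: on Python 2, calling .decode('utf-8') on a value that is already unicode first encodes it implicitly with the ASCII codec, and that hidden step is what fails on characters such as u'\xf3'. The sketch below makes the implicit step explicit so the error can be reproduced in isolation; `safe_decode_utf8` is an illustrative name, not part of the SDK:

```python
# -*- coding: utf-8 -*-

def safe_decode_utf8(value):
    """Decode UTF-8 bytes to text; pass through text that is already decoded."""
    if value is None:
        return None
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return value


already_text = u"descripci\xf3n"

# This is the step Python 2 performs implicitly before unicode.decode().
try:
    already_text.encode("ascii")
    message = None
except UnicodeEncodeError as exc:
    message = str(exc)

from_bytes = safe_decode_utf8(b"descripci\xc3\xb3n")
from_text = safe_decode_utf8(already_text)
```

Both calls to `safe_decode_utf8` return the same text, while the bare ASCII encode raises the same "ordinal not in range(128)" error seen in the traceback.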

Workaround: until the issue is resolved, implement the snippeting yourself. Assuming the field the snippet is based on is named snippet_base:

query = search.Query(query_string=query_string,
                 options=
                    search.QueryOptions(
                        ...
                        returned_fields= [... 'snippet_base' ...]
                        ))
results = search.Index(name="<index-name>").search(query)
if results:
    for res in results.results:
        res.snippet = some_snippeting_function(res.field("snippet_base"))
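
The answer leaves some_snippeting_function undefined. A minimal sketch of what such a function might look like follows; the name `make_snippet`, its parameters, the 160-character window, and the <b> highlighting are all assumptions for illustration, not the Search API's own snippet format:

```python
# -*- coding: utf-8 -*-
import re


def make_snippet(text, query_terms, max_chars=160):
    """Return a window of text around the first matching term, with matches in <b> tags."""
    if not text:
        return u""
    terms = [re.escape(t) for t in query_terms if t]
    pattern = None
    if terms:
        pattern = re.compile(u"|".join(terms), re.IGNORECASE | re.UNICODE)
    match = pattern.search(text) if pattern else None
    # Centre the window on the first hit, or start at the beginning if there is none.
    start = max(0, match.start() - max_chars // 2) if match else 0
    end = min(len(text), start + max_chars)
    window = text[start:end]
    if pattern:
        window = pattern.sub(lambda m: u"<b>" + m.group(0) + u"</b>", window)
    # Mark truncation on either side with an ellipsis.
    prefix = u"..." if start > 0 else u""
    suffix = u"..." if end < len(text) else u""
    return prefix + window + suffix


example = make_snippet(u"Programador de aplicaci\xf3n web en Barcelona", [u"aplicaci\xf3n"])
```

This sketch works on plain text and does not HTML-escape the field content, so escape it before rendering. It also expects the field's text rather than the field object, which in the loop above would presumably mean passing res.field("snippet_base").value.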