I am trying to parse an API response, but the response data contains characters that give Python trouble.
API response: electricity price | 19.52¢/kW·h (January 1, 2014) natural gas price | $11.05 per thousand cubic feet (January 15, 2014) heating oil price | $4.338/gal (March 17, 2014) propane price | $3.968/gal (March 17, 2014)
The error is raised at the "cents per kilowatt-hour" characters.
Full error: UnicodeEncodeError: 'ascii' codec can't encode character u'\xa2' in position 25: ordinal not in range(128)
Response as displayed in the terminal: electricity price | 19.52\xa2/kW\xb7h (January 1, 2014)\nnatural gas price | $11.05 per thousand cubic feet (January 15, 2014)\nheating oil price | $4.338/gal (March 17, 2014)\npropane price | $3.968/gal (March 17, 2014)
How can I parse the data around these problem characters? I don't need the full text, just the numeric values. Thanks for your help.
Edit:
Code that causes the error:
search('electricity price | {:d}', energy)
I also tried:
search('electricity price | {:f}', energy)
This gave a similar result. energy is a variable holding the full string shown above.
Edit 2:
Full code, including the API call:
import wolframalpha
from parse import search

# city and state_abbr are defined elsewhere (not shown)
client = wolframalpha.Client('apikey')
energy_query = 'utilities prices in ' + city + ' ' + state_abbr
res = client.query(energy_query)
energy = next(res.results).text
search('electricity price | {:d}', energy)
Full traceback:
File "/Users/aaronpardes/Dropbox/Python/nuLife2/newlifenv/lib/python2.7/site-packages/flask/app.py", line 1836, in __call__
return self.wsgi_app(environ, start_response)
File "/Users/aaronpardes/Dropbox/Python/nuLife2/newlifenv/lib/python2.7/site-packages/flask/app.py", line 1820, in wsgi_app
response = self.make_response(self.handle_exception(e))
File "/Users/aaronpardes/Dropbox/Python/nuLife2/newlifenv/lib/python2.7/site-packages/flask/app.py", line 1403, in handle_exception
reraise(exc_type, exc_value, tb)
File "/Users/aaronpardes/Dropbox/Python/nuLife2/newlifenv/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
response = self.full_dispatch_request()
File "/Users/aaronpardes/Dropbox/Python/nuLife2/newlifenv/lib/python2.7/site-packages/flask/app.py", line 1477, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/Users/aaronpardes/Dropbox/Python/nuLife2/newlifenv/lib/python2.7/site-packages/flask/app.py", line 1381, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/Users/aaronpardes/Dropbox/Python/nuLife2/newlifenv/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
rv = self.dispatch_request()
File "/Users/aaronpardes/Dropbox/Python/nuLife2/newlifenv/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/Users/aaronpardes/Dropbox/Python/nuLife2/nulife.py", line 120, in index
search('electricity price | {:d}', energy)
File "/Users/aaronpardes/Dropbox/Python/nuLife2/newlifenv/lib/python2.7/site-packages/parse.py", line 1041, in search
return Parser(format, extra_types=extra_types).search(string, pos, endpos)
File "/Users/aaronpardes/Dropbox/Python/nuLife2/newlifenv/lib/python2.7/site-packages/parse.py", line 678, in search
return self._generate_result(m)
File "/Users/aaronpardes/Dropbox/Python/nuLife2/newlifenv/lib/python2.7/site-packages/parse.py", line 699, in _generate_result
fixed_fields[n] = self._type_conversions[n](fixed_fields[n], m)
File "/Users/aaronpardes/Dropbox/Python/nuLife2/newlifenv/lib/python2.7/site-packages/parse.py", line 375, in f
if string[0] == '-':
TypeError: 'NoneType' object has no attribute '__getitem__'
Answer 0 (score: 4)
energy is already a Unicode object; calling .decode() on it first triggers an implicit encode (using ASCII, the default codec):
>>> energy = u'19.52¢/kW·h'
>>> energy.decode('windows-1252')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/mj/Development/venvs/stackoverflow-2.7/lib/python2.7/encodings/cp1252.py", line 15, in decode
return codecs.charmap_decode(input,errors,decoding_table)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa2' in position 5: ordinal not in range(128)
Note the exception; decoding a Unicode string raises a UnicodeEncodeError.
This is by design; the wolframalpha library uses ElementTree to parse the XML response, which always gives you Unicode objects.
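As a minimal sketch of how to avoid the implicit encode, you can decode only when the value really is a byte string and pass text through untouched. The helper name ensure_text and the UTF-8 default are assumptions for illustration; they are not part of wolframalpha or parse.

```python
def ensure_text(value, encoding='utf-8'):
    """Return text, decoding only when given raw bytes (hypothetical helper)."""
    if isinstance(value, bytes):
        # Raw bytes: decoding is the right operation here.
        return value.decode(encoding)
    # Already text: calling .decode() would force an implicit ASCII encode
    # on Python 2 (and does not exist at all on Python 3), so leave it alone.
    return value


energy = u'19.52\xa2/kW\xb7h'
same = ensure_text(energy)                      # text passes through unchanged
decoded = ensure_text(energy.encode('utf-8'))   # bytes are decoded to text
```

Since the API client already hands back text, the string can go straight to the parser without any decode step.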
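Update: I looked at the parse library source code and I'm afraid you found a bug in it; they don't escape regular expression metacharacters in the literal text of the format string you pass in. It works if you escape the | character yourself: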
>>> search('electricity price \\| {:f}', u'electricity price | 19.52¢/kW·h')
<Result (19.52,) {}>
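To see why the unescaped | causes the TypeError in the traceback, here is a rough illustration using the standard re module directly. The patterns below are hand-written stand-ins for what the library generates, not its actual output: the | turns the pattern into an alternation, the first alternative matches on its own, and the capture group for the number is left as None.

```python
import re

text = u'electricity price | 19.52\xa2/kW\xb7h (January 1, 2014)'

# Unescaped: "|" splits the pattern into two alternatives. The first one,
# "electricity price ", matches by itself, so the number group never takes part.
broken = re.search(r'electricity price | (\d+\.\d+)', text)
print(broken.group(1))  # None

# Escaping the literal part makes "|" match a real pipe character.
fixed = re.search(re.escape('electricity price | ') + r'(\d+\.\d+)', text)
print(fixed.group(1))  # 19.52
```

A None group is what the type conversion then tries to index with string[0], which matches the 'NoneType' object has no attribute '__getitem__' error.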
I've filed a bug report with the parse project.
Note that the library may be limited to parsing ASCII text; at the very least, don't try to match ¢/kW·h as word characters.
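Since only the numeric values are needed, one alternative is to skip the format-string library and pull the first number out of each line with the standard re module. This is a sketch under the assumption that every line has the shape "label | value ..." as in the response shown in the question; the function name extract_prices is hypothetical.

```python
import re

response = (
    u'electricity price | 19.52\xa2/kW\xb7h (January 1, 2014)\n'
    u'natural gas price | $11.05 per thousand cubic feet (January 15, 2014)\n'
    u'heating oil price | $4.338/gal (March 17, 2014)\n'
    u'propane price | $3.968/gal (March 17, 2014)'
)


def extract_prices(text):
    """Map each label to the first number that follows the ' | ' separator."""
    prices = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(u' | ')
        if not sep:
            continue  # line does not have the expected "label | value" shape
        match = re.search(r'\d+(?:\.\d+)?', rest)
        if match:
            prices[label.strip()] = float(match.group())
    return prices


prices = extract_prices(response)
print(prices[u'electricity price'])  # 19.52
```

The pattern only looks for digits, so the ¢ and · characters never have to be matched and the units and dates are simply ignored.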
Update: parse version 1.6.4 has been released, fixing this specific bug.