Here is my code (Python 3.5):
import os
import sys

log = os.path.join(sys.path[0], 'log')
f = open(log, 'r', encoding='utf-8')
s = f.read()
r = s.decode('utf-8')
At this point I get the following error message:
AttributeError: 'str' object has no attribute 'decode'
The log file may look like this:
\/div>\n\t<\/div>\n\t<\/div>\n <!-- <div class=\"search_feedback\">\n <p>\u6b22\u8fce\u63d0\u4ea4\u5fae\u535a\u641c\u7d22\u4f7f\u7528\u53cd\u9988\uff0c\u8bf7\u76f4\u63a5<a href=\"javascript:void(0);\" suda-data=\"key=tblog_search_v4.1&value=weibo_suggest\" node-type=\"suggest\">\u53d1\u8868\u610f\u89c1<\/a>\u6216\u60a8\u53ef\u4ee5\u5173\u6ce8\u840c\u5c0f\u641c<a href=\"http:\/\/weibo.com\/wbsearch\" suda-data=\"key=tblog_search_v4.1&value=weibo_xiaosou\" title=\"\u6b22\u8fce\u8c03\u620f\u6700\u840c\u5b98\u535a\u5c4c\u4e1d~~\">@\u5fae\u535a\u641c\u7d22<\/a>\u83b7\u53d6\u641c\u7d22\u6280\u5de7\u3002<\/p>\n <\/div> -->\n<\/div>"})</script>
<script>STK && STK.pageletM && STK.pageletM.view({"pid":"pl_common_searchHistory","js":["apps\/search_v6\/js\/pl\/common\/searchHistory.js?version=20160324190000"],"css":["appstyle\/searchV45\/css_v6\/pl\/pl_history.css?version=20160324190000"],"html":""})</script>
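In fact, it is a mix of HTML and escaped Unicode characters (\uXXXX sequences). When I use exec on it, I think the interpreter fails because the text contains lots of ' and " characters.
Is there another way to solve this?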
Answer 0 (score: 2)
Read the file as bytes (binary mode), then use bytes.decode('unicode_escape'):
>>> b'\\">\\n <p>\\u6b22\\u8fce\\u63d0\\u4ea4'.decode('unicode_escape')
'">\n <p>欢迎提交'
So you can do this:
log = os.path.join(sys.path[0], 'log')
with open(log, 'rb') as f:
    s = f.read()
print(s.decode('unicode_escape'))
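One caveat: 'unicode_escape' decodes any non-ASCII bytes as Latin-1, so if the log also holds real UTF-8 characters next to the \uXXXX escapes, those would come out garbled. A minimal sketch of a workaround, assuming the file is read as text first; decode_u_escapes is a hypothetical helper name, and it does not combine surrogate pairs:

```python
import re

# Matches a literal backslash, 'u', and exactly four hex digits.
_U_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})')

def decode_u_escapes(text):
    """Replace literal \\uXXXX sequences in text with the characters they name."""
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

# A real non-ASCII character followed by two escaped ones.
sample = 'caf\u00e9 \\u6b22\\u8fce'
print(decode_u_escapes(sample))
```

Other escapes such as \n or \/ are left untouched by this helper, so it only covers the \uXXXX part.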
Alternatively, if you have the complete Python repr of a string, say "\u8f6c\u53d1" (which is not the case for the string in the question), you can use ast.literal_eval():
>>> s = '"\\u8f6c\\u53d1"'
>>> print(s)
"\u8f6c\u53d1"
>>> import ast
>>> u = ast.literal_eval(s)
>>> print(u)
转发
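The sample in the question looks like JSON passed to STK.pageletM.view(...), so json.loads might be another option, since JSON understands \uXXXX, \/ and \" natively. This is only a sketch: the regular expression is my assumption about how the calls are laid out, extract_payloads is a made-up name, and the sample string is shortened test data rather than the real log.

```python
import json
import re

# Assumed layout: STK.pageletM.view({...})</script>, with the JSON object
# as the only argument. Adjust the pattern if the real page differs.
_VIEW_CALL = re.compile(r'STK\.pageletM\.view\((\{.*?\})\)</script>', re.S)

def extract_payloads(page):
    """Return the decoded JSON object from each view(...) call in page."""
    return [json.loads(m.group(1)) for m in _VIEW_CALL.finditer(page)]

# Raw string, so the backslashes stay literal as they would in the log file.
page = r'<script>STK && STK.pageletM && STK.pageletM.view({"pid":"pl_demo","html":"<p>\u6b22\u8fce<\/p>\n"})</script>'

payloads = extract_payloads(page)
print(payloads[0]['pid'])
print(payloads[0]['html'])
```

If the payload is not valid JSON, json.loads raises an error instead of silently producing wrong text, which I find easier to debug.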
Answer 1 (score: 0)
You may find the following useful.
In [25]: s='this sentence with some UTF-8 characters\u8f6c\u53d1'.encode('utf-8')
In [26]: s.decode('utf-8')
Out[26]: 'this sentence with some UTF-8 characters转发'
In [34]: type('this sentence with some UTF-8 characters\u8f6c\u53d1')
Out[34]: builtins.str
In [35]: type('this sentence with some UTF-8 characters\u8f6c\u53d1'.encode('utf-8'))
Out[35]: builtins.bytes
In [36]: type('this sentence with some UTF-8 characters\u8f6c\u53d1'.encode('utf-8').decode('utf-8'))
Out[36]: builtins.str
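My guess is that 'this sentence with some UTF-8 characters\u8f6c\u53d1' is a string made up of Unicode code points (the ASCII range is identical in Unicode).
I'm not sure whether Python internally holds 65 (the Unicode code point of A) for A, and so on.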
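A quick check along these lines, as a small sketch: ord() reports the code point of each character in a str, and only the bytes object has a decode method, which is why the code in the question raises AttributeError.

```python
text = 'A\u8f6c\u53d1'        # a str: a sequence of Unicode code points
data = text.encode('utf-8')   # bytes: the UTF-8 encoding of that str

# Code point of each character, shown in hex.
code_points = [hex(ord(ch)) for ch in text]
print(code_points)

# str has no decode() in Python 3; bytes does.
print(hasattr(text, 'decode'), hasattr(data, 'decode'))
print(data.decode('utf-8') == text)
```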
Answer 2 (score: 0)
Put '# coding: utf8' at the top of the program.
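Keep in mind that, as far as I know, the coding line only tells Python how to decode the source file itself. It does not change how open() reads a data file, and it does not turn literal \uXXXX text into characters. A small sketch to show this, using a temporary file as a stand-in for the log:

```python
# coding: utf8
import os
import tempfile

# Write literal backslash-u text to a throwaway file (stand-in for the log).
with tempfile.NamedTemporaryFile('w', encoding='utf-8', suffix='.log',
                                 delete=False) as tmp:
    tmp.write('\\u8f6c\\u53d1')
    path = tmp.name

# Read it back as text; the escapes are still plain characters.
with open(path, 'r', encoding='utf-8') as f:
    raw = f.read()
os.remove(path)

print(raw)
print(len(raw))
```

The text read back is still the escaped form, so one of the decoding steps from the other answers is needed either way.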