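I am getting this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 4: ordinal not in range(128)
I have tried setting many different codecs (in the header, such as # -*- coding: utf8 -*-), and even using u"string", but it still shows up.
How do I fix this?
Edit: I don't know the actual character causing this, but since this is a program that explores folders recursively, it must have found a file with a strange character in its name.
Code: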
# -*- coding: utf8 -*-
# by TerabyteST
###########################
# Explores given path recursively
# and finds file which size is bigger than the set treshold
import sys
import os

class Explore():
    def __init__(self):
        self._filelist = []

    def exploreRec(self, folder, treshold):
        print folder
        generator = os.walk(folder + "/")
        try:
            content = generator.next()
        except:
            return
        folders = content[1]
        files = content[2]
        for n in folders:
            if "$" in n:
                folders.remove(n)
        for f in folders:
            self.exploreRec(u"%s/%s"%(folder, f), treshold)
        for f in files:
            try:
                rawsize = os.path.getsize(u"%s/%s"%(folder, f))
            except:
                print "Error reading file %s"%u"%s/%s"%(folder, f)
                continue
            mbsize = rawsize / (1024 * 1024.0)
            if mbsize >= treshold:
                print "File %s is %d MBs!"%(u"%s/%s"%(folder, f), mbsize)
Error:
Traceback (most recent call last):
  File "<pyshell#19>", line 1, in <module>
    a.exploreRec("C:", 100)
  File "D:/Python/Explorator/shitfinder.py", line 35, in exploreRec
    print "Error reading file %s"%u"%s/%s"%(folder, f)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 4: ordinal not in range(128)
Here is the output when using print repr("Error reading file %s"%u"%s/%s"%(folder.decode('utf-8','ignore'), f.decode('utf-8','ignore'))):
>>> a = Explore()
>>> a.exploreRec("C:", 100)
File C:/Program Files/Ableton/Live 8.0.4/Resources/DefaultPackages/Live8Library_v8.2.alp is 258 MBs!
File C:/Program Files/Adobe/Reader 9.0/Setup Files/{AC76BA86-7AD7-1040-7B44-A90000000001}/Data1.cab is 114 MBs!
File C:/Program Files/Microsoft Games/Age of Empires III/art/Art1.bar is 393 MBs!
File C:/Program Files/Microsoft Games/Age of Empires III/art/art2.bar is 396 MBs!
File C:/Program Files/Microsoft Games/Age of Empires III/art/art3.bar is 228 MBs!
File C:/Program Files/Microsoft Games/Age of Empires III/Sound/Sound.bar is 273 MBs!
File C:/ProgramData/Microsoft/Search/Data/Applications/Windows/Windows.edb is 162 MBs!
REPR:
u"Error reading file C:/ProgramData/Microsoft/Windows/GameExplorer/{1B4801C1-CA86-487E-8347-B26F1CCB2F75}/SupportTasks/0/Sito web di Mirror's Edge.lnk"
END REPR:
Error reading file C:/ProgramData/Microsoft/Windows/GameExplorer/{1B4801C1-CA86-487E-8347-B26F1CCB2F75}/SupportTasks/0/Sito web di Mirror's Edge.lnk
REPR:
u"Error reading file C:/ProgramData/Microsoft/Windows/GameExplorer/{1B4801C1-CA86-487E-8347-B26F1CCB2F75}/SupportTasks/1/Contenuti scaricabili di Mirror's Edge.lnk"
END REPR:
Error reading file C:/ProgramData/Microsoft/Windows/GameExplorer/{1B4801C1-CA86-487E-8347-B26F1CCB2F75}/SupportTasks/1/Contenuti scaricabili di Mirror's Edge.lnk
REPR:
u'Error reading file C:/ProgramData/Microsoft/Windows/Start Menu/Programs/Google Talk/Supporto/Modalitiagnostica di Google Talk.lnk'
END REPR:
Error reading file C:/ProgramData/Microsoft/Windows/Start Menu/Programs/Google Talk/Supporto/Modalitiagnostica di Google Talk.lnk
REPR:
u'Error reading file C:/ProgramData/Microsoft/Windows/Start Menu/Programs/Microsoft SQL Server 2008/Strumenti di configurazione/Segnalazione errori e utilizzo funzionaliti SQL Server.lnk'
END REPR:
Error reading file C:/ProgramData/Microsoft/Windows/Start Menu/Programs/Microsoft SQL Server 2008/Strumenti di configurazione/Segnalazione errori e utilizzo funzionaliti SQL Server.lnk
REPR:
u'Error reading file C:/ProgramData/Microsoft/Windows/Start Menu/Programs/Mozilla Firefox/Mozilla Firefox ( Modalitrovvisoria).lnk'
END REPR:
Error reading file C:/ProgramData/Microsoft/Windows/Start Menu/Programs/Mozilla Firefox/Mozilla Firefox ( Modalitrovvisoria).lnk
REPR:
u'Error reading file C:/ProgramData/Microsoft/Windows/Start Menu/Programs/Mozilla Firefox 3.6 Beta 1/Mozilla Firefox 3.6 Beta 1 ( Modalitrovvisoria).lnk'
END REPR:
Error reading file C:/ProgramData/Microsoft/Windows/Start Menu/Programs/Mozilla Firefox 3.6 Beta 1/Mozilla Firefox 3.6 Beta 1 ( Modalitrovvisoria).lnk
Traceback (most recent call last):
  File "<pyshell#21>", line 1, in <module>
    a.exploreRec("C:", 100)
  File "D:/Python/Explorator/shitfinder.py", line 30, in exploreRec
    self.exploreRec(("%s/%s"%(folder, f)).encode("utf-8"), treshold)
  File "D:/Python/Explorator/shitfinder.py", line 30, in exploreRec
    self.exploreRec(("%s/%s"%(folder, f)).encode("utf-8"), treshold)
  File "D:/Python/Explorator/shitfinder.py", line 30, in exploreRec
    self.exploreRec(("%s/%s"%(folder, f)).encode("utf-8"), treshold)
  File "D:/Python/Explorator/shitfinder.py", line 30, in exploreRec
    self.exploreRec(("%s/%s"%(folder, f)).encode("utf-8"), treshold)
  File "D:/Python/Explorator/shitfinder.py", line 30, in exploreRec
    self.exploreRec(("%s/%s"%(folder, f)).encode("utf-8"), treshold)
  File "D:/Python/Explorator/shitfinder.py", line 30, in exploreRec
    self.exploreRec(("%s/%s"%(folder, f)).encode("utf-8"), treshold)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x99 in position 78: ordinal not in range(128)
>>>
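Answer 0 (score: 14)
Answer 1 (score: 6)
Python uses ASCII as its default encoding, which is annoying. If you want to change that permanently, find and edit the site.py file, search for def setencoding(), and a few lines below it change encoding = "ascii" to encoding = "utf-8". Bye bye, default ASCII encoding.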
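Answer 2 (score: 2)
You are trying to do something (for example, print) with a unicode string that contains non-ASCII characters, and by default the string gets converted to ascii. You need to specify an encoding that can represent the string correctly. It would help a lot if you posted some sample code of what you are trying to do.
The simplest way is:
s = u'ma\xf1ana'
print s.encode('latin-1')
Edit, after details were added to the question:
In your case, you need to decode the strings you read first:
f.decode('latin-1')
so try changing
u"%s/%s" % (folder, f)
to
os.path.join(folder, f.decode('latin-1'))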
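The change above can be sketched as a small helper. This is only an illustration: the name join_decoded and its encoding parameter are hypothetical, and 'latin-1' is a guess at how the file names were encoded. It is written to run on both Python 2 and Python 3.

```python
import os


def join_decoded(folder, name, encoding="latin-1"):
    """Join folder and name, decoding byte strings explicitly first.

    Hypothetical helper: the default encoding is an assumption, not
    something the file system guarantees.
    """
    if isinstance(folder, bytes):
        folder = folder.decode(encoding)
    if isinstance(name, bytes):
        name = name.decode(encoding)
    # Both parts are unicode text now, so no implicit ascii decode happens.
    return os.path.join(folder, name)


# A made-up byte-string file name with 0xe0 (a-grave in Latin-1).
path = join_decoded(b"C:/Programs", b"Modalit\xe0.lnk")
print(repr(path))
```

Decoding both parts before joining avoids mixing byte strings and unicode strings, which is where Python 2 falls back to the ascii codec.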
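Note that the 'latin-1' encoding may need to be changed to whatever encoding was used to name the files.
PS: John Machin mentioned very useful ways to improve and clean up the code.
Answer 3 (score: 1)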
Are you running this program in a Windows cmd.exe box? If so, try running it in IDLE and see whether you get the same error. The cmd.exe box does not do unicode, only ascii.
Answer 4 (score: 1)
Some unicode notes:
# encoding: utf-8
Putting this at the top of the file sometimes helps (if your editor saves the file as UTF-8...).
s = "i'm a string"
u = u"i'm unicode, at least in python < 3"
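Answer 5 (score: 1)
u"%s" % f
In various places you are doing something similar to the code above. This is the wrong way to convert a str object to a unicode object, because the conversion is done using sys.getdefaultencoding() (ascii), which is almost guaranteed to be wrong.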
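A minimal sketch of why that fails, written to run on both Python 2 and Python 3. The byte string raw_name is a made-up file name (not taken from the question), assumed to be Latin-1 encoded, and the explicit decode("ascii") call stands in for the implicit conversion that u"%s" % f performs on Python 2.

```python
# Hypothetical Latin-1 encoded file name: 0xe0 is a-grave.
raw_name = b"Citt\xe0.lnk"

try:
    # On Python 2, u"%s" % raw_name triggers this same ascii decode implicitly.
    raw_name.decode("ascii")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```

The message names the offending byte and its offset, which is how the traceback in the question should be read as well.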
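You should use the encode/decode methods to convert to and from unicode objects. This requires knowing the encoding of the input (the strings returned from os.walk). For example, if the file names are encoded in UTF-8,
uf = f.decode('utf-8')
interprets f as a UTF-8 encoded byte sequence and returns the correct unicode object. Similarly, when you need to output a unicode object, you convert it back to str, specifying an encoding that is valid for wherever the output is going:
print uf.encode('utf-8')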
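The round trip can be sketched as follows, in a form that runs on both Python 2 and Python 3. The file name is made up for illustration, and whether real file names are UTF-8 is an assumption that depends on the platform.

```python
# Hypothetical UTF-8 encoded file name, as a byte string.
f = b"Modalit\xc3\xa0 provvisoria.lnk"

uf = f.decode("utf-8")    # bytes -> unicode text
out = uf.encode("utf-8")  # unicode text -> bytes, ready for output

# Decoding with the wrong codec does not raise here; it silently
# produces different (wrong) text, so the encoding has to be known.
wrong = f.decode("latin-1")

print(repr(uf))
print(repr(wrong))
```

Latin-1 maps every byte to a character, so it never fails to decode; that makes it a convenient fallback but a poor way to detect the real encoding.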
Answer 6 (score: 0)
I have had the misfortune of working with some codebases whose encodings were inconsistent.
Here is a function we used to help clean that up:
def to_unicode(value):
    if isinstance(value, unicode):
        return value
    elif isinstance(value, str):
        try:
            if value.startswith('\xff\xfe'):
                return value.decode('utf-16-le')
            elif value.startswith('\xfe\xff'):
                return value.decode('utf-16-be')
            else:
                return value.decode('utf-8')
        except UnicodeDecodeError:
            return value.decode('latin-1')
    else:
        try:
            return unicode(value)
        except UnicodeError:
            return to_unicode(str(value))
        except TypeError:
            if hasattr(value, '__unicode__'):
                return value.__unicode__()
So you could use the function like this:
print u"Error reading file %s/%s" % (to_unicode(folder), to_unicode(f))
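Answer 7 (score: -1)
Instead of:
print "Error reading file %s"%u"%s/%s"%(folder, f)
try this:
print "Error reading file %s"%u"%s/%s"%(folder.encode('ascii','ignore'), f.encode('ascii','ignore'))
Since the console cannot print the unicode characters, this lets you see the names without them: 'ignore' tells the codec to skip those characters. You can also use 'replace' (prints '?'), 'xmlcharrefreplace' (replaces the character with an XML character reference such as &#224;), or 'backslashreplace' (replaces the character with a backslash escape such as \xe0).
You need to encode every unicode string you print.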
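The error handlers mentioned above can be compared side by side. This is a small sketch that runs on both Python 2 and Python 3; the file name is made up for illustration.

```python
# Hypothetical unicode file name containing U+00E0 (a-grave).
name = u"Modalit\xe0 provvisoria.lnk"

handlers = ("ignore", "replace", "xmlcharrefreplace", "backslashreplace")

# Encode the same text to ascii once per error handler.
results = dict((h, name.encode("ascii", h)) for h in handlers)

for h in handlers:
    print("%-18s %r" % (h, results[h]))
```

Only 'xmlcharrefreplace' and 'backslashreplace' keep enough information to recover the original character; 'ignore' and 'replace' are lossy.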