Python, UnicodeDecodeError

Asked: 2009-11-19 21:22:54

Tags: python unicode

I am getting this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 4: ordinal not in range(128)

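I have tried setting many different codecs (in the header, like # -*- coding: utf8 -*-) and even using u"string", but the error still shows up.

How do I fix this?

Edit: I don't know which character actually causes this, but since the program walks folders recursively, it must have found a file with a strange character in its name.

Code: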

# -*- coding: utf8 -*-


# by TerabyteST

###########################

# Explores given path recursively
# and finds file which size is bigger than the set treshold

import sys
import os

class Explore():
    def __init__(self):
        self._filelist = []

    def exploreRec(self, folder, treshold):
        print folder
        generator = os.walk(folder + "/")
        try:
            content = generator.next()
        except:
            return
        folders = content[1]
        files = content[2]
        for n in folders:
            if "$" in n:
                folders.remove(n)
        for f in folders:
            self.exploreRec(u"%s/%s"%(folder, f), treshold)
        for f in files:
            try:
                rawsize = os.path.getsize(u"%s/%s"%(folder, f))
            except:
                print "Error reading file %s"%u"%s/%s"%(folder, f)
                continue
            mbsize = rawsize / (1024 * 1024.0)
            if mbsize >= treshold:
                print "File %s is %d MBs!"%(u"%s/%s"%(folder, f), mbsize)

Error:

Traceback (most recent call last):
  File "<pyshell#19>", line 1, in <module>
    a.exploreRec("C:", 100)
  File "D:/Python/Explorator/shitfinder.py", line 35, in exploreRec
    print "Error reading file %s"%u"%s/%s"%(folder, f)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 4: ordinal not in range(128)

Here is what is displayed when using print repr("Error reading file %s"%u"%s/%s"%(folder.decode('utf-8','ignore'), f.decode('utf-8','ignore'))):

>>> a = Explore()
>>> a.exploreRec("C:", 100)
File C:/Program Files/Ableton/Live 8.0.4/Resources/DefaultPackages/Live8Library_v8.2.alp is 258 MBs!
File C:/Program Files/Adobe/Reader 9.0/Setup Files/{AC76BA86-7AD7-1040-7B44-A90000000001}/Data1.cab is 114 MBs!
File C:/Program Files/Microsoft Games/Age of Empires III/art/Art1.bar is 393 MBs!
File C:/Program Files/Microsoft Games/Age of Empires III/art/art2.bar is 396 MBs!
File C:/Program Files/Microsoft Games/Age of Empires III/art/art3.bar is 228 MBs!
File C:/Program Files/Microsoft Games/Age of Empires III/Sound/Sound.bar is 273 MBs!
File C:/ProgramData/Microsoft/Search/Data/Applications/Windows/Windows.edb is 162 MBs!
REPR:
u"Error reading file C:/ProgramData/Microsoft/Windows/GameExplorer/{1B4801C1-CA86-487E-8347-B26F1CCB2F75}/SupportTasks/0/Sito web di Mirror's Edge.lnk"
END REPR:
Error reading file C:/ProgramData/Microsoft/Windows/GameExplorer/{1B4801C1-CA86-487E-8347-B26F1CCB2F75}/SupportTasks/0/Sito web di Mirror's Edge.lnk
REPR:
u"Error reading file C:/ProgramData/Microsoft/Windows/GameExplorer/{1B4801C1-CA86-487E-8347-B26F1CCB2F75}/SupportTasks/1/Contenuti scaricabili di Mirror's Edge.lnk"
END REPR:
Error reading file C:/ProgramData/Microsoft/Windows/GameExplorer/{1B4801C1-CA86-487E-8347-B26F1CCB2F75}/SupportTasks/1/Contenuti scaricabili di Mirror's Edge.lnk
REPR:
u'Error reading file C:/ProgramData/Microsoft/Windows/Start Menu/Programs/Google Talk/Supporto/Modalitiagnostica di Google Talk.lnk'
END REPR:
Error reading file C:/ProgramData/Microsoft/Windows/Start Menu/Programs/Google Talk/Supporto/Modalitiagnostica di Google Talk.lnk
REPR:
u'Error reading file C:/ProgramData/Microsoft/Windows/Start Menu/Programs/Microsoft SQL Server 2008/Strumenti di configurazione/Segnalazione errori e utilizzo funzionaliti SQL Server.lnk'
END REPR:
Error reading file C:/ProgramData/Microsoft/Windows/Start Menu/Programs/Microsoft SQL Server 2008/Strumenti di configurazione/Segnalazione errori e utilizzo funzionaliti SQL Server.lnk
REPR:
u'Error reading file C:/ProgramData/Microsoft/Windows/Start Menu/Programs/Mozilla Firefox/Mozilla Firefox ( Modalitrovvisoria).lnk'
END REPR:
Error reading file C:/ProgramData/Microsoft/Windows/Start Menu/Programs/Mozilla Firefox/Mozilla Firefox ( Modalitrovvisoria).lnk
REPR:
u'Error reading file C:/ProgramData/Microsoft/Windows/Start Menu/Programs/Mozilla Firefox 3.6 Beta 1/Mozilla Firefox 3.6 Beta 1 ( Modalitrovvisoria).lnk'
END REPR:
Error reading file C:/ProgramData/Microsoft/Windows/Start Menu/Programs/Mozilla Firefox 3.6 Beta 1/Mozilla Firefox 3.6 Beta 1 ( Modalitrovvisoria).lnk

Traceback (most recent call last):
  File "<pyshell#21>", line 1, in <module>
    a.exploreRec("C:", 100)
  File "D:/Python/Explorator/shitfinder.py", line 30, in exploreRec
    self.exploreRec(("%s/%s"%(folder, f)).encode("utf-8"), treshold)
  File "D:/Python/Explorator/shitfinder.py", line 30, in exploreRec
    self.exploreRec(("%s/%s"%(folder, f)).encode("utf-8"), treshold)
  File "D:/Python/Explorator/shitfinder.py", line 30, in exploreRec
    self.exploreRec(("%s/%s"%(folder, f)).encode("utf-8"), treshold)
  File "D:/Python/Explorator/shitfinder.py", line 30, in exploreRec
    self.exploreRec(("%s/%s"%(folder, f)).encode("utf-8"), treshold)
  File "D:/Python/Explorator/shitfinder.py", line 30, in exploreRec
    self.exploreRec(("%s/%s"%(folder, f)).encode("utf-8"), treshold)
  File "D:/Python/Explorator/shitfinder.py", line 30, in exploreRec
    self.exploreRec(("%s/%s"%(folder, f)).encode("utf-8"), treshold)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x99 in position 78: ordinal not in range(128)
>>> 

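8 answers:

Answer 0 (score: 14)

Answer 1 (score: 6)

Python uses ASCII as its default encoding, which is annoying. If you want to change that permanently, find and edit the site.py file: search for def setencoding() and, a few lines below it, change encoding = "ascii" to encoding = "utf-8". Bye-bye, default ASCII encoding.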

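Answer 2 (score: 2)

You are trying to do something (for example, print) with a unicode string that contains non-ASCII characters, and by default the string gets converted to ASCII. You need to specify an encoding so that the string is represented correctly. It would help a lot if you posted some sample code of what you are trying to do.

The simplest way is:
   s = u'ma\xf1ana'
   print s.encode('latin-1')

Edit, after details were added to the question:

In your case you need to decode the strings you read first:
   f.decode('latin-1')
So try changing
u"%s/%s" % (folder, f)
to
os.path.join(folder, f.decode('latin-1'))

Note that 'latin-1' may need to be changed to whatever encoding was used to name the files. Calling decode() with no argument falls back to the default ASCII codec and raises the same error.

PS: John Machin has mentioned very useful ways to improve and clean up the code. +1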

Answer 3 (score: 1)

Are you running this in a Windows cmd.exe box? If so, try running it in IDLE and see whether you get the same error. The cmd.exe box doesn't do unicode, only ASCII.
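
A minimal sketch of how output could be made safe for a limited console, assuming the console reports its encoding through sys.stdout.encoding. console_safe is a hypothetical helper name, not part of the question's code; it falls back to ASCII when no encoding is reported and substitutes '?' for anything the console cannot show. The snippet is written so that it should run under both the Python 2 of the question and Python 3.

```python
import sys


def console_safe(text, encoding=None):
    """Return text with characters the console cannot show replaced by '?'.

    encoding defaults to what sys.stdout reports, or 'ascii' if it
    reports nothing (for example when output is piped).
    """
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'ascii'
    # Encode with the 'replace' handler, then decode back to text.
    return text.encode(encoding, 'replace').decode(encoding)


# Forcing ASCII here to show the effect regardless of the real console.
print(console_safe(u'ma\xf1ana', 'ascii'))
```

The value is converted only at the point of printing, so the rest of the program can keep working with the unmodified text.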

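Answer 4 (score: 1)

A few Unicode points:

  • Putting # encoding: utf-8 at the top of the file sometimes helps (if your editor saves the file as UTF-8...)
  • s = "i'm a string"
  • u = u"i'm unicode, at least in python < 3"
  • If you work with files, take a look at the codecs module.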


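Answer 5 (score: 1)

u"%s" % f

In various places you are doing something like the code above. This is the wrong way to convert a str object to a unicode object, because the conversion is done using sys.getdefaultencoding() (ascii), which is almost guaranteed to be wrong.

You should use the encode/decode methods to convert to and from unicode objects. This requires knowing the encoding of the input (the strings returned by os.walk). For example, if the file names are encoded in UTF-8, then

uf = f.decode('utf-8')

interprets f as a sequence of UTF-8-encoded bytes and returns the proper unicode object. Similarly, when you need to output a unicode object, convert it back to str, specifying a valid encoding for the output:

print uf.encode('utf-8')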
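
The difference between the implicit and the explicit conversion can be sketched as follows. The file name and the cp1252 code page are assumptions chosen for illustration (byte 0xe0 is "à" in that code page); the real encoding depends on the system that named the files. Byte-string literals are used so that the snippet should run under both Python 2 and Python 3.

```python
# Hypothetical file name as raw bytes, assumed to be cp1252-encoded.
raw_name = b'Modalit\xe0 diagnostica.lnk'

# The implicit conversion amounts to decoding with the ASCII codec,
# which cannot handle byte 0xe0.
try:
    raw_name.decode('ascii')
    implicit_ok = True
except UnicodeDecodeError:
    implicit_ok = False

# Explicit decoding with a codec that matches the bytes succeeds.
name = raw_name.decode('cp1252')

# Explicit encoding turns the text back into bytes for output.
utf8_bytes = name.encode('utf-8')

print(implicit_ok)
print(repr(utf8_bytes))
```

Decoding once at the point where the names are read, and encoding once at the point of output, keeps every intermediate string operation free of implicit ASCII conversions.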

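Answer 6 (score: 0)

I have had the misfortune of working with a codebase whose encodings were inconsistent.

Here is a function we used to help clean it up: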

def to_unicode(value):
    if isinstance(value, unicode):
        return value
    elif isinstance(value, str):
        try:
            if value.startswith('\xff\xfe'):
                return value.decode('utf-16-le')
            elif value.startswith('\xfe\xff'):
                return value.decode('utf-16-be')
            else:
                return value.decode('utf-8')
        except UnicodeDecodeError:
            return value.decode('latin-1')
    else:
        try:
            return unicode(value)
        except UnicodeError:
            return to_unicode(str(value))
        except TypeError:
            if hasattr(value, '__unicode__'):
                return value.__unicode__()

So you could use the function like this:

print u"Error reading file %s/%s" % (to_unicode(folder), to_unicode(f))

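Answer 7 (score: -1)

Instead of:

print "Error reading file %s"%u"%s/%s"%(folder, f)

try this:

print "Error reading file %s"%u"%s/%s"%(folder.encode('ascii','ignore'), f.encode('ascii','ignore'))

Since the console cannot print the unicode characters, this lets you still see the name. 'ignore' tells the codec to skip those characters. You can also use 'replace' (prints '?'), 'xmlcharrefreplace' (replaces the character with an XML character reference for its code point, such as &#224;), or 'backslashreplace' (replaces it with a backslash escape for its code point, such as \xe0).

You need to encode every unicode string you print.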
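
The four error handlers can be compared side by side. This is a small sketch with a made-up name standing in for one of the file names from the question. It starts from a unicode string, since encode() is meant to be called on text rather than on raw bytes, and it should run under both Python 2 and Python 3.

```python
# Hypothetical text value containing one non-ASCII character (U+00E0).
name = u'Modalit\xe0 provvisoria'

ignored = name.encode('ascii', 'ignore')              # character dropped
replaced = name.encode('ascii', 'replace')            # character becomes '?'
xml_refs = name.encode('ascii', 'xmlcharrefreplace')  # becomes '&#224;'
escaped = name.encode('ascii', 'backslashreplace')    # becomes '\xe0' as four ASCII characters

for result in (ignored, replaced, xml_refs, escaped):
    print(repr(result))
```

'ignore' and 'replace' lose information, while the other two handlers keep enough to recover the original character.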