Question

I am using this code to recursively find files in a folder that are larger than 500000 bytes.

import os

def listall(parent):
    lis = []
    for root, dirs, files in os.walk(parent):
        for name in files:
            if os.path.getsize(os.path.join(root, name)) > 500000:  # size in bytes
                lis.append(os.path.join(root, name))
    return lis

This works fine. However, when I run it on the "Temporary Internet Files" folder in Windows, I get this error.

Traceback (most recent call last):
  File "<pyshell#4>", line 1, in <module>
    listall(a)
  File "<pyshell#2>", line 5, in listall
    if os.path.getsize(os.path.join(root,name))>500000:
  File "C:\Python26\lib\genericpath.py", line 49, in getsize
    return os.stat(filename).st_size
WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: 'C:\\Documents and Settings\\khedarnatha\\Local Settings\\Temporary Internet Files\\Content.IE5\\EDS8C2V7\\??????+1[1].jpg'

I think this is because Windows gives files in this particular folder names with special characters. Please help me sort this out.

Answer 1

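It is because the saved file '(something)+1[1].jpg' has non-ASCII characters in its name, characters that do not fit into the "system default code page" (also misleadingly known as "ANSI").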

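Python programs that use the byte-based C standard library (stdio) file access functions have big problems with Unicode filenames. On other platforms they can just use UTF-8 and everyone is happy, but on Windows the system default code page is never UTF-8, so there will always be characters that cannot be represented in the given encoding. They get replaced with ? or sometimes with other similar-looking characters, and then when you try to read the file with the mangled name you get an error like the one above.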

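Exactly which code page you get depends on your locale: on a Western Windows install it will be cp1252 (similar to ISO-8859-1, "Latin-1"), so you would only be able to use the characters that cp1252 contains.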
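To make the substitution concrete, here is a minimal sketch that imitates the lossy conversion by encoding a filename to cp1252 with Python's 'replace' error handler. The filename is made up for illustration (the real name in the question is unknown), and Windows performs its own conversion internally, so this only approximates the effect.

```python
# Hypothetical filename: three Japanese characters followed by an ASCII tail.
# This is NOT the real name from the question, which cannot be recovered.
original = u'\u65e5\u672c\u8a9e+1[1].jpg'

# cp1252 has no slot for these characters, so with errors='replace'
# each of them is turned into a literal '?' byte.
mangled = original.encode('cp1252', 'replace')
print(repr(mangled))

# A name made only of characters that cp1252 contains survives a round trip.
latin = u'caf\xe9.jpg'
roundtrip = latin.encode('cp1252').decode('cp1252')
print(roundtrip == latin)
```

Once the original characters have become question marks, the resulting name no longer refers to any file on disk, which is consistent with the "syntax is incorrect" error in the question.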

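Luckily, reasonably recent versions of Python (2.3+, according to PEP 277) can also support Unicode filenames directly, by using the native Win32 APIs instead of stdio. If you pass a Unicode string to os.listdir(), Python uses these native Unicode APIs and returns Unicode strings that contain the original characters of each filename rather than mangled ones. os.walk() calls os.listdir() internally, so the same applies to it. So if you call listall with a Unicode pathname: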

listall(ur'C:\Documents and Settings\khedarnatha\Local Settings\Temporary Internet Files')

it should just work.
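If callers cannot be relied on to pass a Unicode path, one option is to do the conversion inside the function. The following is only a sketch under some assumptions: the name listall_unicode and the min_size parameter are hypothetical, decoding with sys.getfilesystemencoding() is one possible choice, and skipping files that cannot be examined is an addition of this sketch, not something the answer above prescribes. On Python 3, where str is already Unicode, the decoding branch only runs for bytes paths.

```python
import os
import sys


def listall_unicode(parent, min_size=500000):
    """Return paths of files under parent that are larger than min_size bytes."""
    # A byte-string path makes Python 2 on Windows go through the lossy
    # code-page APIs, so decode it to a Unicode string before walking.
    if isinstance(parent, bytes):
        parent = parent.decode(sys.getfilesystemencoding() or 'utf-8')
    found = []
    for root, dirs, files in os.walk(parent):
        for name in files:
            path = os.path.join(root, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # WindowsError is a subclass of OSError. Skip files that
                # cannot be examined instead of aborting the whole walk.
                continue
            if size > min_size:
                found.append(path)
    return found
```

With this version, listall_unicode(r'C:\some\folder') and listall_unicode(ur'C:\some\folder') would both walk the tree with Unicode names on Python 2, where the folder path is a placeholder.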
