Question

I have a file with the following path: D:/bar/クレイジー・ヒッツ！/foo.abc

I'm parsing the path from an XML file and storing it in a variable named path, in the form file://localhost/D:/bar/クレイジー・ヒッツ！/foo.abc. Then I do the following:

import urllib
path = path.strip()
path = path[17:]  # remove the "file://localhost/" prefix
path = urllib.url2pathname(path)
path = urllib.unquote(path)

The error is:

IOError: [Errno 2] No such file or directory: 'D:\\bar\\\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81\\foo.abc'

Update 1: I'm using Python 2.7 on Windows 7.

Answer 1

The non-ASCII part of the path in the error is:

'\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81'

I think this is the UTF-8-encoded version of your file name.
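This claim is easy to check directly: encode the folder name as UTF-8 and compare the result with the bytes in the error message. The sketch below is written so that it should run on both Python 2.7 and Python 3; `error_bytes` is simply copied from the IOError above.

```python
# -*- coding: utf-8 -*-
# Folder name from the question, as a unicode literal
name = u"クレイジー・ヒッツ！"

# Byte string copied from the IOError message
error_bytes = (b"\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc"
               b"\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81")

# If the bytes are UTF-8, encoding the name reproduces them exactly,
# and decoding the bytes gives back the original name.
is_utf8 = name.encode("utf-8") == error_bytes
round_trip = error_bytes.decode("utf-8")
```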

I created a folder with the same name on Windows 7 and put a file called "abc.txt" in it:

>>> a = '\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81'
>>> os.listdir('.')
['?????\xb7???!']
>>> os.listdir(u'.') # Pass unicode to have unicode returned to you
[u'\u30af\u30ec\u30a4\u30b8\u30fc\u30fb\u30d2\u30c3\u30c4\uff01']
>>> 
>>> a.decode('utf8') # UTF8 decoding your string matches the listdir output
u'\u30af\u30ec\u30a4\u30b8\u30fc\u30fb\u30d2\u30c3\u30c4\uff01'
>>> os.listdir(a.decode('utf8'))
[u'abc.txt']

So it seems that Duncan's suggestion of path.decode('utf8') does the trick.

Update

I can't test this for you, but I suggest checking whether the path contains non-ASCII characters before calling .decode('utf8'). This is a bit hacky...

import urllib

# Translation table: printable ASCII (32-126) maps to itself, every other byte to '_'
ASCII_TRANS = '_'*32 + ''.join([chr(x) for x in range(32, 127)]) + '_'*129
path = path.strip()
path = path[17:]  # remove the "file://localhost/" prefix
path = urllib.unquote(path)
if path.translate(ASCII_TRANS) != path:  # contains non-ASCII
    path = path.decode('utf8')
path = urllib.url2pathname(path)
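The same steps can be wrapped in a small function. The sketch below is hypothetical (`file_url_to_path` and `PREFIX` are names made up for illustration) and assumes, as in the question, that every URL starts with `file://localhost/` and that the path text is UTF-8, either as raw characters or as %XX escapes. It works on bytes until the very end so that it decodes exactly once, and it replaces `url2pathname` with a plain slash-to-backslash substitution, which is enough for drive-letter paths like the one in the question but does not handle UNC paths. It should run on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
try:  # Python 3
    from urllib.parse import unquote_to_bytes
except ImportError:  # Python 2.7: unquote() on a byte string returns a byte string
    from urllib import unquote as unquote_to_bytes

# Assumed prefix, matching the question's XML data
PREFIX = b"file://localhost/"


def file_url_to_path(url, encoding="utf-8"):
    """Hypothetical helper: turn a file://localhost/ URL into a unicode Windows path."""
    if not isinstance(url, bytes):
        # Work on bytes so raw characters and %XX escapes are decoded the same way
        url = url.encode(encoding)
    url = url.strip()
    if not url.startswith(PREFIX):
        raise ValueError("not a file://localhost/ URL: %r" % (url,))
    raw = unquote_to_bytes(url[len(PREFIX):])  # resolve %XX escapes to raw bytes
    # Decode exactly once, then switch to Windows path separators
    return raw.decode(encoding).replace(u"/", u"\\")
```

For the URL in the question, `file_url_to_path(u"file://localhost/D:/bar/クレイジー・ヒッツ！/foo.abc")` should return the unicode string `u"D:\\bar\\クレイジー・ヒッツ！\\foo.abc"`, which can be passed straight to `open`. Skipping `url2pathname` also avoids percent-decoding twice: `url2pathname` already unquotes its input, so combining it with `unquote` can give a wrong result when a file name contains a literal `%`.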

Answer 2

Pass the filename to the open call as a unicode string.

How do you generate the filename?

If you provide it as a constant

Add a line near the beginning of your script:

# -*- coding: utf8 -*-

Then, in a UTF-8-capable editor, set path to the unicode filename:

path = u"D:/bar/クレイジー・ヒッツ！/foo.abc"
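A rough way to see that a unicode path works end to end, without needing a D: drive, is to recreate the folder inside a temporary directory, write foo.abc there and read it back. In the sketch below the temporary directory is only a stand-in for D:/bar (an assumption made for illustration); it should run on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import io
import os
import shutil
import sys
import tempfile

base = tempfile.mkdtemp()  # stand-in for D:/bar in this demo
if isinstance(base, bytes):  # Python 2 returns a byte string; keep every path unicode
    base = base.decode(sys.getfilesystemencoding())
try:
    folder = os.path.join(base, u"クレイジー・ヒッツ！")
    os.mkdir(folder)
    path = os.path.join(folder, u"foo.abc")  # unicode path, not a UTF-8 byte string

    with io.open(path, "w", encoding="utf-8") as f:
        f.write(u"hello")
    with io.open(path, encoding="utf-8") as f:
        content = f.read()
    exists = os.path.isfile(path)
finally:
    shutil.rmtree(base)  # remove the demo directory again
```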

If you read it from a directory listing

Retrieve the directory contents using a unicode dirspec:

dir_files = os.listdir(u'.')
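The difference shows up in the type of the names that come back: with a unicode argument `os.listdir` returns unicode names, which can be joined and opened directly, whereas a byte-string argument returns byte strings (on Python 2 under Windows these can even contain `?` placeholders, as in the `'?????\xb7???!'` output in Answer 1). The sketch below lists a temporary directory containing a file named after the question's folder; the temporary directory and the `.txt` file are assumptions made for the demo.

```python
# -*- coding: utf-8 -*-
import io
import os
import shutil
import sys
import tempfile

base = tempfile.mkdtemp()
if isinstance(base, bytes):  # Python 2 returns a byte string; make it unicode
    base = base.decode(sys.getfilesystemencoding())
try:
    # Demo file whose name is the folder name from the question
    with io.open(os.path.join(base, u"クレイジー・ヒッツ！.txt"), "w", encoding="utf-8") as f:
        f.write(u"demo")

    names = os.listdir(base)  # unicode dirspec -> unicode entries
    contents = {}
    for name in names:
        # Each unicode name can be joined and opened without any decoding step
        with io.open(os.path.join(base, name), encoding="utf-8") as f:
            contents[name] = f.read()
finally:
    shutil.rmtree(base)
```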

If you read it from a text file

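Open the file that contains the filenames with codecs.open, so that you read unicode data from it. You need to specify the file's encoding (you know what it is: the "default Windows charset" for non-Unicode applications on your computer).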
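A minimal sketch of that approach, under two assumptions: the list file (the name `filelist.txt` is hypothetical) holds one path per line, and the machine's non-Unicode code page is cp932, the usual one for Japanese Windows. To keep it self-contained, the sketch first writes such a file into a temporary directory.

```python
# -*- coding: utf-8 -*-
import codecs
import io
import os
import shutil
import sys
import tempfile

LIST_ENCODING = "cp932"  # assumption: ANSI code page of a Japanese-locale Windows machine

base = tempfile.mkdtemp()
if isinstance(base, bytes):  # Python 2 returns a byte string; keep paths unicode
    base = base.decode(sys.getfilesystemencoding())
try:
    list_file = os.path.join(base, u"filelist.txt")  # hypothetical list of file names
    # Simulate a file written by a non-Unicode application
    with io.open(list_file, "wb") as f:
        f.write(u"D:/bar/クレイジー・ヒッツ！/foo.abc\r\n".encode(LIST_ENCODING))

    # codecs.open decodes each line to unicode using the declared encoding
    with codecs.open(list_file, "r", encoding=LIST_ENCODING) as f:
        paths = [line.strip() for line in f]
finally:
    shutil.rmtree(base)
```

`io.open(list_file, encoding=LIST_ENCODING)` does the same job and is also available on Python 2.6+, while in Python 3 the built-in `open` accepts `encoding` directly.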

In any case

Do a:

path = path.decode("utf8")

before opening the file, substituting the correct encoding if it is not "utf8".
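If the same code may receive either byte strings or unicode, a guard keeps the decode from running twice: on Python 2, calling .decode on text that is already unicode first encodes it as ASCII and raises UnicodeEncodeError for non-ASCII names, and on Python 3 str has no .decode at all. `to_unicode` below is a hypothetical helper illustrating the idea; 'utf8' is the default only because it matches the question's data.

```python
def to_unicode(path, encoding="utf8"):
    """Hypothetical helper: decode byte strings, leave unicode text unchanged."""
    if isinstance(path, bytes):
        return path.decode(encoding)
    # Already unicode: decoding again would fail, so return it as-is
    return path
```

Because the helper is idempotent, it can be called on every path right before `open` without tracking whether a particular value was already decoded.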

Answer 3

Here is something interesting from the documentation:

sys.getfilesystemencoding()

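Return the name of the encoding used to convert Unicode filenames into system file names, or None if the system default encoding is used. The result value depends on the operating system: On Mac OS X, the encoding is 'utf-8'. On Unix, the encoding is the user's preference according to the result of nl_langinfo(CODESET), or None if the nl_langinfo(CODESET) failed. On Windows NT+, file names are Unicode natively, so no conversion is performed. getfilesystemencoding() still returns 'mbcs', as this is the encoding that applications should use when they explicitly want to convert Unicode strings to byte strings that are equivalent when used as file names. On Windows 9x, the encoding is 'mbcs'.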

New in version 2.3.

If I understand correctly, you should pass the filename as unicode:

f = open(unicode(path, encoding))  # encoding: the encoding of the byte string path, e.g. 'utf8' for the path in the question
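Note that `encoding` in that snippet has to be the encoding the byte string was actually written in, which is not necessarily what `sys.getfilesystemencoding()` reports. For the question the bytes are UTF-8, while on Python 2.7 under Windows the function returns 'mbcs', the ANSI code page. The sketch below illustrates the difference, assuming cp932 (the ANSI code page of Japanese Windows) as an example of a mismatched choice; it should run on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import sys

# Byte string as produced by the question's XML parsing (UTF-8)
raw = u"クレイジー・ヒッツ！".encode("utf-8")

# 'mbcs' on Python 2.7 under Windows; typically 'utf-8' on Linux and macOS
fs_encoding = sys.getfilesystemencoding()

right = raw.decode("utf-8")             # decode with the encoding the bytes were written in
wrong = raw.decode("cp932", "replace")  # assumption: cp932 stands in for a mismatched ANSI code page
```

Only `right` names the real folder; `wrong` is mojibake, which is why a file opened with it would not be found.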

Python can't open a file with non-English characters in the path
