Question

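I want to rename files whose names contain French (accented) letters. I'm using glob to go through the files, together with a function I found on the internet that removes French accents. supprime_accent seems to work fine on its own. However, it does not strip the accents from the file names returned by glob.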

Does anyone know why? Is it related to glob's encoding?

import glob

def supprime_accent(ligne):
    """ Remove the accents from the source text """
    accents = { 'a': ['à', 'ã', 'á', 'â'],
                'e': ['é', 'è', 'ê', 'ë'],
                'i': ['î', 'ï'],
                'u': ['ù', 'ü', 'û'],
                'o': ['ô', 'ö'] }
    for (char, accented_chars) in accents.iteritems():
        for accented_char in accented_chars:
            ligne = ligne.replace(accented_char, char)
    return ligne

for file_name in glob.glob("attachments/*.jpg"):
    print supprime_accent(file_name)

Answer 1

I see two potential problems here.

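First, you need to use unicode strings in your source code, and you need to tell Python what encoding the source code is in. Unfortunately, doing it properly doubles the number of vowels in your table (every literal gains a u prefix)... :-\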

# -*- coding: UTF-8 -*-
...
accents = { u'a': [u'à', u'ã', u'á', u'â'],
            u'e': [u'é', u'è', u'ê', u'ë'],
            u'i': [u'î', u'ï'],
            u'u': [u'ù', u'ü', u'û'],
            u'o': [u'ô', u'ö'] }

Second, I think you need to convert the file names returned by glob into unicode strings.

import sys
file_name = file_name.decode(sys.getfilesystemencoding())
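To illustrate why the byte-level replace in the question silently does nothing, here is a small Python 3 sketch of the mismatch. It assumes, for illustration only, that the file system hands back names as cp1252 bytes (as Answer 3 found on its system) while the editor saved the accent table as UTF-8 bytes.

```python
# Assumption: the file name arrives as cp1252 bytes, the literal is stored as UTF-8.
name_bytes = "café.jpg".encode("cp1252")  # b'caf\xe9.jpg'
accent_bytes = "é".encode("utf-8")        # b'\xc3\xa9'

# The UTF-8 byte pair never occurs in the cp1252 name, so nothing is replaced.
print(name_bytes.replace(accent_bytes, b"e"))  # b'caf\xe9.jpg'

# Decoding to text first makes the comparison character by character.
name = name_bytes.decode("cp1252")
print(name.replace("é", "e"))  # cafe.jpg
```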

Python 3.0 fixes both problems: file names don't need to be decoded, and unicode strings don't need the u prefix.
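A quick Python 3 sketch of the first point: glob returns text (str) when the pattern is str, and raw bytes only if you ask for them with a bytes pattern. The temporary directory and the file name café.jpg are made up for the demonstration.

```python
import glob
import os
import tempfile

# Throwaway directory holding one accented file name (illustration only).
with tempfile.TemporaryDirectory() as folder:
    open(os.path.join(folder, "café.jpg"), "w").close()

    # A str pattern yields str results: nothing to decode.
    text_names = glob.glob(os.path.join(folder, "*.jpg"))
    # A bytes pattern yields bytes, like Python 2's default byte strings.
    byte_names = glob.glob(os.path.join(os.fsencode(folder), b"*.jpg"))

    print(type(text_names[0]).__name__)  # str
    print(type(byte_names[0]).__name__)  # bytes
```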

Answer 2

Take a look at this question and its answers; in the question I posted the final solution I am using: latin-1 to ascii

Also pass a unicode string to glob so that it returns unicode file names, for example:

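for file_name in glob.glob(u"attachments/*.jpg"):
    # 'latin2ascii' is not a built-in error handler: it must first be
    # registered with codecs.register_error (see the latin-1 to ascii link above)
    print file_name.encode('ascii', 'latin2ascii')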
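The linked handler is not reproduced here, so the following is a hypothetical Python 3 stand-in registered under the same name, not the author's actual code. A codec error handler receives the UnicodeEncodeError and returns a replacement string plus the position where encoding should resume; this one replaces each unencodable run with its NFKD-decomposed ASCII letters.

```python
import codecs
import unicodedata


def latin2ascii(error):
    """Hypothetical error handler: swap unencodable characters for their ASCII
    base letters (accents removed via NFKD), or for nothing if none exist."""
    if not isinstance(error, UnicodeEncodeError):
        raise error
    bad = error.object[error.start:error.end]
    ascii_text = unicodedata.normalize("NFKD", bad).encode("ascii", "ignore").decode("ascii")
    # Resume encoding just after the characters we replaced.
    return ascii_text, error.end


codecs.register_error("latin2ascii", latin2ascii)

print("Pâté à la crème.jpg".encode("ascii", "latin2ascii"))  # b'Pate a la creme.jpg'
```

Characters with no ASCII decomposition (for example the ligature œ) are dropped rather than transliterated by this sketch.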

Answer 3

I managed to solve the problem by converting file_name to unicode using the cp1252 encoding.

import glob
import sys
import unicodedata

for file_name in glob.glob("attachments/*.jpg"):
    file_name = file_name.decode(sys.getfilesystemencoding())
    print unicodedata.normalize('NFKD', file_name).encode('ascii','ignore')

Edit: Jason provided a better solution: replace unicode(file_name, 'cp1252') with file_name.decode(sys.getfilesystemencoding()).
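The question's goal was to actually rename the files, which the loop above only prints. Here is a minimal Python 3 sketch that ports the NFKD approach and adds os.rename. The helper names, the temporary directory and the sample file names are made up for the demonstration.

```python
import glob
import os
import tempfile
import unicodedata


def strip_accents(name: str) -> str:
    """Decompose accented letters (NFKD) and drop the non-ASCII combining marks."""
    return unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")


def rename_without_accents(folder: str) -> list:
    """Rename each *.jpg in folder to its accent-free name; return the new names."""
    renamed = []
    for path in glob.glob(os.path.join(folder, "*.jpg")):  # str results in Python 3
        directory, name = os.path.split(path)
        target = os.path.join(directory, strip_accents(name))
        # Skip names that are already clean, and never overwrite an existing file.
        if target == path or os.path.exists(target):
            continue
        os.rename(path, target)
        renamed.append(os.path.basename(target))
    return sorted(renamed)


with tempfile.TemporaryDirectory() as folder:
    for sample in ["élève.jpg", "déjà_vu.jpg", "plain.jpg"]:
        open(os.path.join(folder, sample), "w").close()
    result = rename_without_accents(folder)
    print(result)  # ['deja_vu.jpg', 'eleve.jpg']
```

The existence check matters because two names can collapse to the same ASCII form (é.jpg and e.jpg); on POSIX, os.rename would otherwise silently replace the existing file.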

Cannot remove French accents from strings returned by Python glob
