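Read a file and uppercase non-ASCII characters before saving the content to another file

Time: 2015-06-23 07:57:11

Tags: python

I am trying to read a file, in_file.ini, that contains the following strings: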

67000 0 0 "La máquina debe de estar sin pieza"
67002 0 0 "Los Autocalibrados no están retrocedidos"

I want to take the content between the double quotes and save it in uppercase to the output file out_file.ini, like this:

alm_siem_alarm0=LA MÁQUINA DEBE DE ESTAR SIN PIEZA
alm_siem_alarm1=LOS AUTOCALIBRADOS NO ESTÁN RETROCEDIDOS

To achieve this, I wrote the following code:

m = 0
f_out = open('C:/out_file.ini', "w")
with open('C:/in_file.ini') as f_in:
    lines = list(line for line in (l.strip() for l in f_in) if line)
    for i in lines:
        f_out.write('alm_siem_alarm' + str(m) + '=' + i.split(' "')[1][:-1].upper() + '\n')
        m = m + 1

f_in.close()
f_out.close()

The file in_file.ini contains some blank lines that I need to skip, which is why I use list(line for line in (l.strip() for l in f_in) if line), as you can see in the code above.

My problem is that non-ASCII characters such as á are not uppercased. So the output in out_file.ini is:

alm_siem_alarm0=LA MáQUINA DEBE DE ESTAR SIN PIEZA
alm_siem_alarm1=LOS AUTOCALIBRADOS NO ESTáN RETROCEDIDOS

I tried to fix this by appending .decode('utf-8').upper() to the i.split(' "')[1][:-1] string, but I get the following error:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xe1 in position 4: invalid continuation byte

Can someone give me a hand and tell me how to uppercase all the words and get the desired output?

2 answers:

Answer 0: (score: 2)

I fixed the problem by using the codecs module and passing encoding='latin-1' when opening the input and output files.

import codecs
m = 0
f_out = codecs.open('C:/out_file.ini', "w", encoding='latin-1')
with codecs.open('C:/in_file.ini', encoding='latin-1') as f_in:
    lines = list(line for line in (l.strip() for l in f_in) if line)
    for i in lines:
        # [:-1] drops the closing double quote before uppercasing
        f_out.write('alm_siem_alarm' + str(m) + '=' + i.split(' "')[1][:-1].upper() + '\n')
        m = m + 1

f_out.close()

Using LittleQ's solution, the code becomes:

import codecs

with codecs.open('C:/in_file.ini', encoding='latin-1') as f_in, codecs.open('C:/out_file.ini', "w", encoding='latin-1') as f_out:
    print "Leyendo... " + str(f_in.name)
    generator = (l.split('"')[1].strip() for l in f_in.readlines() if l.strip())
    for i, line in enumerate(generator):
        f_out.write('alm_siem_alarm%d=%s\n' % (i, line.upper()))
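For readers on Python 3, the same idea can be sketched without the codecs module, because the built-in open() accepts an encoding argument there. This is only a sketch under assumptions: the function name convert_alarms and its parameters are made up for illustration, and it assumes the input file really is Latin-1 encoded, as the fix above suggests.

```python
def convert_alarms(in_path, out_path, encoding="latin-1"):
    """Write the quoted text of each non-blank line, uppercased, to out_path.

    Hypothetical helper: the name and signature are not from the original post.
    Returns the number of alarm lines written.
    """
    count = 0
    with open(in_path, encoding=encoding) as f_in, \
            open(out_path, "w", encoding=encoding) as f_out:
        for line in f_in:
            if not line.strip():
                continue  # skip blank lines
            # Text between the first pair of double quotes
            text = line.split('"')[1].strip()
            # In Python 3 the text is already Unicode, so upper() handles "á"
            f_out.write("alm_siem_alarm%d=%s\n" % (count, text.upper()))
            count += 1
    return count
```

Calling convert_alarms('C:/in_file.ini', 'C:/out_file.ini') would mirror the paths used in the question.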

Answer 1: (score: -1)

Based on your code:

with open('C:/in_file.ini', 'r') as f_in, open('C:/out_file.ini', "w") as f_out:
    generator = (l.split('"')[1].strip() for l in f_in.readlines() if l.strip())
    for i, line in enumerate(generator):
        f_out.write('alm_siem_alarm%d=%s\n' % (
            i, line.decode('utf-8').upper().encode('utf-8')))

Because file.write() expects a str, we have to encode the uppercased unicode string back to the str type.
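Note that decoding as UTF-8 only works if the file is actually UTF-8. The error quoted in the question (byte 0xe1 at position 4) suggests the file is Latin-1 instead, where 0xe1 is "á". A small Python 3 sketch of the difference, using a hand-written byte string as a stand-in for one line of the file (the variable names are illustrative only):

```python
# Bytes as they would be stored in a Latin-1 file; 0xe1 is "á" in Latin-1.
raw = b"La m\xe1quina"

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    # In UTF-8, 0xe1 opens a multi-byte sequence, but the next byte ("q")
    # is not a continuation byte, so decoding fails at position 4.
    utf8_ok = False

# Decoding with the encoding the file actually uses succeeds,
# and upper() then works on the Unicode text.
text = raw.decode("latin-1")
upper_bytes = text.upper().encode("latin-1")
```

Here utf8_ok ends up False, while upper_bytes holds the uppercased line re-encoded as Latin-1, ready to be written in binary mode.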