I am trying to read a file, in_file.ini, that contains the following strings:
67000 0 0 "La máquina debe de estar sin pieza"
67002 0 0 "Los Autocalibrados no están retrocedidos"
I want to take the content between the double quotes (" ") and save it in uppercase in the output file out_file.ini, like this:
alm_siem_alarm0=LA MÁQUINA DEBE DE ESTAR SIN PIEZA
alm_siem_alarm1=LOS AUTOCALIBRADOS NO ESTÁN RETROCEDIDOS
To achieve this, here is the code I wrote:
    m = 0
    f_out = open('C:/out_file.ini', "w")
    with open('C:/in_file.ini') as f_in:
        lines = list(line for line in (l.strip() for l in f_in) if line)
        for i in lines:
            f_out.write('alm_siem_alarm' + str(m) + '=' + i.split(' "')[1][:-1].upper() + '\n')
            m = m + 1
    f_in.close()
    f_out.close()
The file in_file.ini contains some blank lines that I need to skip, which is why I use list(line for line in (l.strip() for l in f_in) if line), as you can see in the code above.
My problem is that non-ASCII characters such as á are not converted to uppercase. So the output in out_file.ini is:
alm_siem_alarm0=LA MáQUINA DEBE DE ESTAR SIN PIEZA
alm_siem_alarm1=LOS AUTOCALIBRADOS NO ESTáN RETROCEDIDOS
I tried to fix this by appending .decode('utf-8').upper() to the i.split(' "')[1][:-1] string, but I get the following error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe1 in position 4: invalid continuation byte
Can someone give me a hand and tell me how to uppercase all the words and get the desired output?
Answer 0 (score: 2)
I fixed the problem by using the codecs module and passing encoding='latin-1' when opening both the input and the output file.
    import codecs
    m = 0
    f_out = codecs.open('C:/out_file.ini', "w", encoding='latin-1')
    with codecs.open('C:/in_file.ini', encoding='latin-1') as f_in:
        lines = list(line for line in (l.strip() for l in f_in) if line)
        for i in lines:
            # [:-1] drops the closing double quote, as in the original code
            f_out.write('alm_siem_alarm' + str(m) + '=' + i.split(' "')[1][:-1].upper() + '\n')
            m = m + 1
    f_in.close()
    f_out.close()
Using LittleQ's solution, the code becomes:
    import codecs
    with codecs.open('C:/in_file.ini', encoding='latin-1') as f_in, codecs.open('C:/out_file.ini', "w", encoding='latin-1') as f_out:
        print "Leyendo... " + str(f_in.name)
        generator = (l.split('"')[1].strip() for l in f_in.readlines() if l.strip())
        for i, line in enumerate(generator):
            f_out.write('alm_siem_alarm%d=%s\n' % (i, line.upper()))
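For readers on Python 3, the same approach can be sketched with the built-in open(), which accepts an encoding argument directly. This is only a minimal sketch: the function name convert_alarms and its parameters are hypothetical, and it assumes every non-blank line contains a double-quoted string and that the file really is Latin-1 encoded.

```python
def convert_alarms(in_path, out_path, encoding="latin-1"):
    """Write the quoted text of each non-blank line to out_path in uppercase.

    Hypothetical helper; the encoding default is an assumption based on
    the answer above. Returns the number of alarm lines written.
    """
    count = 0
    with open(in_path, encoding=encoding) as f_in, \
         open(out_path, "w", encoding=encoding) as f_out:
        for line in f_in:
            if not line.strip():
                continue  # skip blank lines
            # Text between the first pair of double quotes
            text = line.split('"')[1].strip()
            f_out.write("alm_siem_alarm%d=%s\n" % (count, text.upper()))
            count += 1
    return count
```

It could be called as convert_alarms('C:/in_file.ini', 'C:/out_file.ini'). Because the text is decoded on read, upper() operates on Unicode characters and handles á correctly.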
Answer 1 (score: -1)
Based on your code:
    with open('C:/in_file.ini', 'r') as f_in, open('C:/out_file.ini', "w") as f_out:
        generator = (l.split('"')[1].strip() for l in f_in.readlines() if l.strip())
        for i, line in enumerate(generator):
            f_out.write('alm_siem_alarm%d=%s\n' % (
                i, line.decode('utf-8').upper().encode('utf-8')))
Because file.write() expects a str, we should encode the uppercased unicode string back to the str type before writing it.
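Note that the error quoted in the question suggests the file is not UTF-8 encoded, so decode('utf-8') may fail in the same way here. A minimal sketch of the difference, assuming Python 3 and assuming the á is stored as the single Latin-1 byte 0xE1 (the sample bytes below are made up for illustration):

```python
# 'á' stored as the single byte 0xE1, at position 4 of the string
raw = b"La m\xe1quina"

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    # In UTF-8, 0xE1 opens a multi-byte sequence, and the following
    # byte ('q') is not a valid continuation byte.
    utf8_ok = False

# Decoding as Latin-1 succeeds, and upper() then handles the accent
upper_text = raw.decode("latin-1").upper()
upper_bytes = upper_text.encode("latin-1")  # back to bytes for writing
```

Decoding with the encoding the file was actually saved in is what makes upper() work on the accented characters.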