Question

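I have to read a plain-text (UTF-8) file line by line and convert it into a .tex file (just another plain-text file with markup) to be processed by a LaTeX processor.

One of the things I want to do is convert special characters such as é into their LaTeX representation: \'e

So I wrote: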

with open(input, "r") as in_file, open(output, "w") as out_file:
        for line in in_file:
                # Other code here
                line.replace('é', "\\'e") # This fails as below
                # Other code here
                out_file.write(line)

Running the script on the input file gives:

    line.replace('é', "\\'e")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

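Clearly the interpreter is using the ascii codec. Why?

Instead of the plain open(...), I also tried codecs.open(input, "r", "utf-8"), and likewise for the output file, but got the same error.

Before running line.replace(...), I also tried converting line to a unicode string with each of the following lines in turn (not both at once; first one, then the other):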

line = unicode(line, "utf-8")
line = line.decode("utf-8")

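But I got exactly the same error.

What is the correct way to do this?

Update 1: Before asking this question, I had already added # -*- coding: UTF-8 -*- as the second line of the .py file. Without it, the interpreter raises the following error when trying to run the script: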

SyntaxError: Non-ASCII character '\xc3' in file <filename> on line 46, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

Answer 1

This may be a problem with the encoding of the source file. Try putting this at the top of the file:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

For details, see: https://www.python.org/dev/peps/pep-0263/
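
Since the question states that the coding line was already present, the declaration alone may not be enough. A minimal sketch follows, assuming the error comes from mixing byte strings with unicode strings: in Python 2, 'é' without a u prefix is a byte string, and combining it with a unicode line triggers an implicit ascii decode. The sketch keeps everything as unicode by opening both files with io.open and using u'...' literals. It also assigns the result of replace, because str.replace returns a new string rather than modifying line in place. The names LATEX_ACCENTS, to_latex and convert_file are hypothetical, and the mapping holds only two illustrative entries.

```python
# -*- coding: utf-8 -*-
import io

# Hypothetical mapping from accented characters to LaTeX macros.
# Only two illustrative entries; extend it as needed.
LATEX_ACCENTS = {
    u"é": u"\\'e",
    u"è": u"\\`e",
}


def to_latex(line):
    """Return a copy of a unicode line with accents replaced by LaTeX macros."""
    for char, macro in LATEX_ACCENTS.items():
        # replace() returns a new string, so the result must be assigned.
        line = line.replace(char, macro)
    return line


def convert_file(in_path, out_path):
    """Read in_path as UTF-8, convert each line, and write out_path as UTF-8."""
    # io.open decodes and encodes for us, so every line is a unicode string.
    with io.open(in_path, "r", encoding="utf-8") as in_file, \
            io.open(out_path, "w", encoding="utf-8") as out_file:
        for line in in_file:
            out_file.write(to_latex(line))
```

Calling convert_file(input, output) with the two paths from the question would then replace the original loop. The u prefix is required in Python 2 and is accepted, though redundant, in Python 3.3 and later.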
