Question

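The files I download may contain Chinese characters, and I want to convert them to UTF-8.

In some cases a few characters cannot be converted, so I replace each character that cannot be converted with the replacement character '�'.

I wrote a program that takes 2 arguments: the first is the file path, and the second is the encoding of the source file.

In some cases this code ends up in an infinite loop, and I don't know why. What causes the infinite loop?

Source code: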
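For comparison, Python's built-in `"replace"` error handler already substitutes U+FFFD ('�') for undecodable bytes. The snippet below is a minimal sketch of that behaviour; the sample byte strings and variable names are made up for illustration and are not taken from the downloaded files.

```python
# Valid GBK bytes decode cleanly and can then be re-encoded as UTF-8.
gbk_bytes = "中文".encode("gbk")
utf8_bytes = gbk_bytes.decode("gbk", errors="replace").encode("utf-8")

# A byte that is invalid in the declared encoding becomes U+FFFD
# instead of raising UnicodeDecodeError.
broken = b"ok\xff"  # 0xFF can never appear in valid UTF-8
text = broken.decode("utf-8", errors="replace")
```

The same `errors="replace"` argument can be passed to `open()` or `io.TextIOWrapper`, so a custom handler is only needed when extra behaviour such as logging or an error limit is wanted.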

import codecs
import io
import sys


class ErrorHandler:
    def __init__(self, file_path):
        self.file_path = file_path
        self.previous_end_position = -65535
        self.error_threshold = 0

    def error_handler(self, exception):
        # Count errors that start right after the previous one ended.
        if exception.start == self.previous_end_position + 1:
            self.error_threshold += 1
        # Give up once 64 such errors have been seen.
        if self.error_threshold >= 64:
            raise exception
        else:
            print("Start:" + str(exception.start))
            print("End:" + str(exception.end))
            self.previous_end_position = exception.end
        # Replacement text and the position at which decoding should resume.
        return ("�", -1)


src_path = sys.argv[1]
try:
    # Insert "_utf8" before the file extension, if there is one.
    src_ext = src_path[src_path.rindex("."):]
    dest_path = src_path[:src_path.rindex(".")] + "_utf8" + src_ext
except ValueError:
    dest_path = src_path + "_utf8"

src_encoding = sys.argv[2]

codecs.register_error("myreplace", ErrorHandler(src_path).error_handler)

# Decode the source with the custom handler and write the text out as UTF-8.
with io.TextIOWrapper(open(src_path, "rb"), encoding=src_encoding, errors="myreplace") as src, \
        open(dest_path, "w", encoding="utf-8") as dest:
    for line in src:
        dest.write(line)
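One thing that may be worth checking is the second element of the tuple the handler returns. According to the `codecs.register_error` documentation, it is the position at which decoding continues, and a negative value is treated as relative to the end of the input. A handler that returns -1 therefore asks the decoder to resume at the last byte of the current input rather than just after the bad bytes, and if that last byte is itself undecodable the same error can be raised again without any progress. The sketch below is an assumption about what was intended, not a confirmed fix: the handler name `replace_and_continue` is hypothetical, and it resumes at `exception.end` so that decoding always moves forward.

```python
import codecs


def replace_and_continue(exception):
    """Hypothetical handler: emit U+FFFD and resume after the bad bytes."""
    if not isinstance(exception, UnicodeDecodeError):
        raise exception
    # exception.end is the index just past the undecodable bytes, so the
    # decoder never revisits them.
    return ("\ufffd", exception.end)


codecs.register_error("replace_and_continue", replace_and_continue)

# Sample input made up for illustration: two bytes that are invalid in
# UTF-8, one of them at the very end of the input.
data = b"abc\xffdef\xff"
text = data.decode("utf-8", errors="replace_and_continue")
```

Note also that `exception.end` is an exclusive index, so two back-to-back errors satisfy `exception.start == previous_end_position`, not `previous_end_position + 1`. If that reading is right, the 64-error limit in the original handler would not trigger for consecutive errors.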

Python codecs module register_error infinite loop

0 answers: