Question

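I have a Python 2.7 program that writes out data from various external applications. I kept getting bitten by exceptions when writing to a file until I added .decode(errors="ignore") to the string being written out. (FWIW, opening the file with mode="wb" doesn't fix this.)

Is there a way to say "ignore encoding errors on all strings in this scope"?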

Answer 1

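You cannot redefine methods on built-in types, and you cannot change the default value of the errors parameter of str.decode(). There are other ways to achieve the desired behaviour, though.

The slightly nicer way: define your own decode() function: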

def decode(s, encoding="ascii", errors="ignore"):
    return s.decode(encoding=encoding, errors=errors)

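Now you need to call decode(s) instead of s.decode(), but that's not too bad, is it?

The hack: you can't change the default value of the errors parameter, but you can overwrite what the handler for the default errors="strict" does:

import codecs
def strict_handler(exception):
    return u"", exception.end
codecs.register_error("strict", strict_handler)

This essentially changes the behaviour of errors="strict" to the standard errors="ignore" behaviour. Note that this is a global change that affects every module you import.

I would recommend neither of these two ways. The real solution is to get your encodings right. (I'm well aware that this isn't always possible.)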
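As a rough illustration of what "getting your encodings right" can look like on the writing side, the sketch below decodes each incoming byte string once, with the encoding its source is known to use, and lets the output file do the encoding. The helper name to_text, the sample byte strings, and the choice of UTF-8 for the output are assumptions made for the example, not part of the original program.

```python
import io
import os
import tempfile

def to_text(raw, encoding="utf-8"):
    """Decode bytes from an external source; pass text through unchanged."""
    if isinstance(raw, bytes):      # bytes is an alias of str on Python 2.7
        return raw.decode(encoding)
    return raw

# Hypothetical output location, created only for this example.
path = os.path.join(tempfile.mkdtemp(), "out.txt")

# io.open encodes on write, so only text (unicode) is handed to it.
with io.open(path, "w", encoding="utf-8") as out:
    out.write(to_text(b"caf\xc3\xa9\n"))                      # UTF-8 input
    out.write(to_text(b"na\xefve\n", encoding="latin-1"))     # Latin-1 input

# Read the file back to confirm nothing was dropped.
with io.open(path, "r", encoding="utf-8") as check:
    written = check.read()
```

With this arrangement a wrong encoding guess surfaces as an exception at the point of input instead of being silently discarded.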

Answer 2

As mentioned in my thread on the issue, the hack from Sven Marnach is even possible without a new function:

import codecs
codecs.register_error("strict", codecs.ignore_errors)
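For readers unfamiliar with the mechanism these hacks rely on: an error handler receives the exception and returns a pair of replacement text and the position at which decoding should resume. The sketch below registers such a handler under a new name, so the global "strict" behaviour stays untouched. The name "drop" and the sample bytes are made up for the example, and the handler still has to be requested explicitly, so this does not change any default.

```python
import codecs

def drop_handler(exception):
    # Substitute nothing for the offending bytes and resume right after them.
    return u"", exception.end

# "drop" is a hypothetical handler name chosen for this example.
codecs.register_error("drop", drop_handler)

raw = b"caf\xc3\xa9 au lait"              # UTF-8 bytes; two of them are not ASCII
cleaned = raw.decode("ascii", "drop")     # the new handler must be named explicitly

# The default handler is unaffected and still raises.
try:
    raw.decode("ascii")
    strict_still_raises = False
except UnicodeDecodeError:
    strict_still_raises = True
```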

Answer 3

I'm not sure what your setup is exactly, but you could derive a class from str and override its decode method:

class easystr(str):
    def decode(self):
        return str.decode(self, errors="ignore")

If you then convert all incoming strings to easystr, errors will be silently ignored:

line = easystr(input.readline())

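That said, decoding a string converts it to unicode, and unicode should never be lossy. Can you figure out which encoding your strings use, and give that as the encoding parameter for decode? That would be a better solution (and you can still make it the default in the way shown above).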
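To make the difference concrete, here is a small sketch that compares the lossy result of errors="ignore" with a decode that uses a fitting encoding. The helper decode_first_match and its candidate list are assumptions for illustration; in practice the candidates should come from what you know about the external applications.

```python
def decode_first_match(raw, candidates=("utf-8", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings fit")

raw = b"na\xefve caf\xe9"                  # Latin-1 bytes with two accented letters

lossy = raw.decode("ascii", "ignore")      # silently drops both accented letters
text, used = decode_first_match(raw)       # UTF-8 is rejected, Latin-1 is accepted
```

Note that Latin-1 accepts every possible byte sequence, so it only makes sense as the last candidate, and a successful decode does not prove the guess was right.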

Another thing you should try is reading your data differently. Done that way, the decoding errors may well disappear.
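One possible way to read the data differently, assuming it comes from a file whose encoding is known, is to let io.open do the decoding so that every line already arrives as unicode. The file name, the sample content, and the Latin-1 encoding below are placeholders for the example.

```python
import io
import os
import tempfile

# Create a small Latin-1 encoded sample file for the example.
path = os.path.join(tempfile.mkdtemp(), "input.txt")
with open(path, "wb") as handle:
    handle.write(b"na\xefve\ncaf\xe9\n")

# Decoding happens while reading; no separate .decode() call is needed.
with io.open(path, "r", encoding="latin-1") as source:
    lines = [line.rstrip(u"\n") for line in source]
```

io.open is available from Python 2.6 onwards and is the same function as the built-in open on Python 3.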

Can I make decode(errors="ignore") the default for all strings in a Python 2.7 program?
