Question

How do I convert this string

'\\n    this is a docstring for\\n    the main function.\\n    a,\\n    b,\\n    c\\n    '

to

'\n    this is a docstring for\n    the main function.\n    a,\n    b,\n    c\n    '

Keep in mind that I also want to do this for '\t' and all the other escape characters. The code for the reverse direction is

def fix_string(s):
    r"""Take the string and replace any `\n` with `\\n` so that the read file will be recognized."""
    # escape chars = \t , \b , \n , \r , \f , \' , \" , \\
    # note: a literal backslash is passed through unchanged by the else branch
    new_s = ''
    for i in s:
        if i == '\t':
            new_s += '\\t'
        elif i == '\b':
            new_s += '\\b'
        elif i == '\n':
            new_s += '\\n'
        elif i == '\r':
            new_s += '\\r'
        elif i == '\f':
            new_s += '\\f'
        elif i == '\'':
            new_s += "\\'"
        elif i == '\"':
            new_s += '\\"'
        else:
            new_s += i
    return new_s

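Do I perhaps need to look at the actual numeric values of the characters and check the next character, for example when I find a ('\', 92) character followed by an ('n', 110)?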

Answer 1

Don't reinvent the wheel here. Python has your back. Besides, handling escape syntax correctly is harder than it looks.

The correct way to handle this

In Python 2, use the str-to-str string_escape codec:

string.decode('string_escape')

This interprets any string escape sequence that Python recognizes, including \n and \t.

Demo:

>>> string = '\\n    this is a docstring for\\n    the main function.\\n    a,\\n    b,\\n    c\\n    '
>>> string.decode('string_escape')
'\n    this is a docstring for\n    the main function.\n    a,\n    b,\n    c\n    '
>>> print string.decode('string_escape')

    this is a docstring for
    the main function.
    a,
    b,
    c

>>> '\\t\\n\\r\\xa0\\040'.decode('string_escape')
'\t\n\r\xa0 '

In Python 3, you have to use codecs.decode() and the unicode_escape codec:

codecs.decode(string, 'unicode_escape')

as there is no str.decode() method, and this is not a str -> bytes conversion.

Demo:

>>> import codecs
>>> string = '\\n    this is a docstring for\\n    the main function.\\n    a,\\n    b,\\n    c\\n    '
>>> codecs.decode(string, 'unicode_escape')
'\n    this is a docstring for\n    the main function.\n    a,\n    b,\n    c\n    '
>>> print(codecs.decode(string, 'unicode_escape'))

    this is a docstring for
    the main function.
    a,
    b,
    c

>>> codecs.decode('\\t\\n\\r\\xa0\\040', 'unicode_escape')
'\t\n\r\xa0 '
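A small Python 3 sketch of wrapping this call in a reusable helper. The name `unescape` is made up for illustration. One caveat worth keeping in mind: unicode_escape decodes non-escape bytes as Latin-1, so text that already contains non-ASCII characters can come back garbled. A commonly used workaround, assumed here rather than taken from the answer, is to encode with the `backslashreplace` error handler first.

```python
import codecs


def unescape(text):
    """Interpret Python-style backslash escapes in text.

    Hypothetical helper, not part of the standard library.
    """
    # Characters outside Latin-1 become \uXXXX / \UXXXXXXXX escapes here,
    # which unicode_escape decodes back; Latin-1 characters pass through.
    raw = text.encode('latin-1', 'backslashreplace')
    return codecs.decode(raw, 'unicode_escape')


print(repr(unescape('\\n    a,\\n    b\\t')))
print(repr(unescape('caf\u00e9\\n')))
```

For plain ASCII input this should behave the same as calling codecs.decode(string, 'unicode_escape') directly.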

Why a straightforward `str.replace()` won't cut it

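You could try to do this yourself with str.replace(), but then you also need to implement proper escape parsing. Take \\\\n for example; this is \\n, escaped. If you naively apply str.replace() in sequence, you end up with \n or \\\n instead: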

>>> '\\\\n'.decode('string_escape')
'\\n'
>>> '\\\\n'.replace('\\n', '\n').replace('\\\\', '\\')
'\\\n'
>>> '\\\\n'.replace('\\\\', '\\').replace('\\n', '\n')
'\n'

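The \\ pair should be replaced by just one \ character, leaving the n uninterpreted. But the replace option either ends up replacing the trailing \ together with the n with a newline character, or you end up with \\ replaced by \, after which that \ and the n are replaced by a newline. Either way, you end up with the wrong output.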
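The same comparison can be reproduced on Python 3, where str.decode() is not available. This is only a sketch; the variable names are illustrative.

```python
import codecs

# Three characters: backslash, backslash, n (an escaped backslash followed by n)
escaped = '\\\\n'

# Ordering A: newlines first, then backslashes
naive_a = escaped.replace('\\n', '\n').replace('\\\\', '\\')
# Ordering B: backslashes first, then newlines
naive_b = escaped.replace('\\\\', '\\').replace('\\n', '\n')
# The codec parses left to right, so the escaped backslash is consumed first
correct = codecs.decode(escaped, 'unicode_escape')

print(repr(naive_a), repr(naive_b), repr(correct))
```

Only the codec result keeps the n as a literal character after a single backslash.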

The slow, manual way of handling this

You have to process the characters one by one, pulling in more characters as needed:

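_map = {
    '\\\\': '\\',
    "\\'": "'",
    '\\"': '"',
    '\\a': '\a',
    '\\b': '\b',
    '\\f': '\f',
    '\\n': '\n',
    '\\r': '\r',
    '\\t': '\t',
}

def unescape_string(s):
    output = []
    i = 0
    while i < len(s):
        c = s[i]
        i += 1
        if c != '\\':
            output.append(c)
            continue
        c += s[i]
        i += 1
        if c in _map:
            output.append(_map[c])
            continue
        if c == '\\x' and i + 1 < len(s):  # hex escape, needs two more characters
            point = int(s[i] + s[i + 1], 16)
            i += 2
            output.append(chr(point))
            continue
        if c == '\\0':  # octal escape, up to three octal digits in total
            while len(c) < 4 and i < len(s) and s[i] in '01234567':
                c += s[i]
                i += 1
            point = int(c[1:], 8)
            output.append(chr(point))
        # any other escape is silently dropped
    return ''.join(output)

This can now handle \xhh hex escapes, octal escapes that start with \0, and the standard one-letter escapes, but not octal escapes that start with any other digit, \uhhhh Unicode code points, or \N{name} Unicode name references; nor does it handle malformed escapes in quite the same way as Python would.

But it does handle escaped escapes correctly:

>>> unescape_string('\\\\n')
'\\n'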

Be aware that this is far slower than using the built-in codec.

Answer 2

The simplest solution is to just use a str.replace() call

s = '\\n    this is a docstring for\\n    the main function.\\n    a,\\n    b,\\n    c\\n    '
s1 = s.replace('\\n','\n')
s1

Output

'\n    this is a docstring for\n    the main function.\n    a,\n    b,\n    c\n    '

Answer 3

def convert_text(text):
    return text.replace("\\n","\n").replace("\\t","\t")


text = '\\n    this is a docstring for\\n    the main function.\\n    a,\\n    b,\\n    c\\n    '
print convert_text(text)

Output:

    this is a docstring for
    the main function.
    a,
    b,
    c
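Chained replace() calls like these misread an escaped backslash followed by n. If a replace-style approach is still preferred, one possible variation is a single left-to-right substitution with re.sub, so each backslash is consumed exactly once. This is a sketch, not this answer's method: `convert_text_single_pass` and `_SIMPLE_ESCAPES` are hypothetical names, and the table covers only an assumed subset of escapes.

```python
import re

# Assumed subset of one-letter escapes; extend as needed.
_SIMPLE_ESCAPES = {
    '\\': '\\',
    'n': '\n',
    't': '\t',
    'r': '\r',
    "'": "'",
    '"': '"',
}


def convert_text_single_pass(text):
    """Replace known backslash escapes in one pass over the text."""
    def _sub(match):
        char = match.group(1)
        # Unknown escapes are left untouched rather than dropped
        return _SIMPLE_ESCAPES.get(char, match.group(0))
    # A backslash followed by any single character, including a newline
    return re.sub(r'\\(.)', _sub, text, flags=re.DOTALL)


print(repr(convert_text_single_pass('\\n    a,\\n\\tb')))
print(repr(convert_text_single_pass('\\\\n')))
```

Hex, octal, and Unicode escapes are not covered by this sketch; the built-in codec remains the more complete option.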
