Question

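I am trying to split a string in Python into a list of characters. I know there are many ways to do this in Python, but I have a case where those methods don't give me the result I want.

The problem occurs when a special character such as '\t' is written out explicitly in the string (I don't mean a real tab).

Example: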

string = "    Hello \t World."

The output I need is:

list_of_chars = [' ', ' ', ' ', ' ', 'H', 'e', 'l', 'l', 'o', ' ', '\', 't', ' ', 'W', 'o', 'r', 'l', 'd', '.']

But when I use the methods given in this question, I get a list that contains '\t' as a single element, not split apart.

Example:

> list(string)
> [' ', ' ', ' ', ' ', 'H', 'e', 'l', 'l', 'o', ' ', '\t', ' ', 'W', 'o', 'r', 'l', 'd', '.']

I would like to know why this happens and how to get what I want.

Answer 1

You can replace the special characters accordingly:

import itertools
txt = "    Hello \t World."

specials = { 
    '\a' : '\\a', #     ASCII Bell (BEL)
    '\b' : '\\b', #     ASCII Backspace (BS)
    '\f' : '\\f', #     ASCII Formfeed (FF)
    '\n' : '\\n', #     ASCII Linefeed (LF)
    '\r' : '\\r', #     ASCII Carriage Return (CR)
    '\t' : '\\t', #     ASCII Horizontal Tab (TAB)
    '\v' : '\\v'  #     ASCII Vertical Tab (VT)
}

# edited out: # txt2 = "".join([x if x not in specials else specials[x] for x in txt])
txt2 = itertools.chain(* [(list(specials[x]) if x in specials else [x]) for x in txt])

print(list(txt2))

Output:

[' ', ' ', ' ', ' ', 'H', 'e', 'l', 'l', 'o', ' ', '\\', 't', ' ', 'W', 
 'o', 'r', 'l', 'd', '.']

The list comprehension now looks more involved: it uses list(itertools.chain(*[...])) instead of list("".join([...])), which should be more efficient.
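As an alternative sketch (not part of the original answer), the same escape table can be handed to str.translate, which avoids the manual chaining. The function name split_escaped is made up for illustration, and the table only covers the seven escapes listed above.

```python
# Same escape table as in the answer above, limited to the common C-style escapes.
specials = {
    '\a': '\\a', '\b': '\\b', '\f': '\\f', '\n': '\\n',
    '\r': '\\r', '\t': '\\t', '\v': '\\v',
}

def split_escaped(text):
    """Return a list of characters, with each special character shown as two characters."""
    # str.maketrans accepts a dict mapping single characters to replacement strings.
    table = str.maketrans(specials)
    return list(text.translate(table))

chars = split_escaped("    Hello \t World.")
print(chars)
```

Any character that is not a key in specials passes through unchanged, so other control characters would still come out as a single list element.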

Answer 2

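You should take a look at the String Literal documentation, which says:

The backslash (\) character is used to escape characters that otherwise have a special meaning, such as newline, backslash itself, or the quote character. String literals may optionally be prefixed with a letter 'r' or 'R'; such strings are called raw strings and use different rules for interpreting backslash escape sequences.

In your example string, \t is not two characters but a single character representing the ASCII horizontal tab (TAB).

To tell the Python interpreter that these are two separate characters, you should use a raw string (put an r before the opening quote of the string):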
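A quick way to check this yourself is to compare the length of the two spellings. This is only a small illustration; the variable names tab and raw are arbitrary.

```python
tab = "\t"    # escape sequence: a single tab character
raw = r"\t"   # raw string: a backslash followed by the letter t

print(len(tab), ord(tab))    # 1 9
print(len(raw), list(raw))   # 2 ['\\', 't']
```

Because the plain literal holds only one character, list() has nothing to split.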

>>> list(r"    Hello \t World.")
[' ', ' ', ' ', ' ', 'H', 'e', 'l', 'l', 'o', ' ', '\\', 't', ' ', 'W', 'o', 'r', 'l', 'd', '.']

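But here you will also see \\ in the resulting list. That is just how Python represents a single \.

To the Python interpreter, '\' is an invalid string literal, because \' inside a string stands for a single quote ('). So when you type '\', it raises an error because, as far as Python is concerned, the string has no closing quote: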
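To see that the doubled backslash is only a display convention, compare print with repr. The snippet below is a small illustration with an arbitrary variable name.

```python
backslash = '\\'              # one backslash, written with an escape

print(len(backslash))         # 1
print(backslash)              # \
print(repr(backslash))        # '\\'
print(['\\', 't'])            # ['\\', 't']  (a list shows the repr of each item)
print(''.join(['\\', 't']))   # \t
```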

>>> '\'
  File "<stdin>", line 1
    '\'
      ^
SyntaxError: EOL while scanning string literal

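If you cannot declare your string as a raw string (because it is already defined or comes from another source), you can convert it to a byte string by encoding it with "unicode-escape":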

>>> my_str = "    Hello \t World."

>>> unicode_escaped_string = my_str.encode('unicode-escape')
>>> unicode_escaped_string
b'    Hello \\t World.'

Since it is a byte string, you need to call chr on each byte to get the corresponding character. For example:

>>> list(map(chr, unicode_escaped_string))
[' ', ' ', ' ', ' ', 'H', 'e', 'l', 'l', 'o', ' ', '\\', 't', ' ', 'W', 'o', 'r', 'l', 'd', '.']
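A slightly shorter variant, offered as a sketch rather than as the answer's own method, is to decode the escaped bytes back to str and call list() on the result. The helper name escape_to_chars is made up for illustration. Keep in mind that unicode-escape also escapes non-ASCII characters, which may or may not be what you want.

```python
def escape_to_chars(text):
    """Escape special characters, then split the result into single characters."""
    # encode() yields ASCII-only bytes, so decoding them as ASCII is safe.
    return list(text.encode('unicode-escape').decode('ascii'))

result = escape_to_chars("    Hello \t World.")
print(result)

# Non-ASCII characters are escaped as well:
print(escape_to_chars("é"))   # ['\\', 'x', 'e', '9']
```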

Answer 3

You could convert the string to its Python literal form and then split it character by character?

string = "    Hello \t World."
string_raw = string.encode('unicode-escape')
print([ch for ch in string_raw])
print([chr(ch) for ch in string_raw])

Output:

[32, 32, 32, 32, 72, 101, 108, 108, 111, 32, 92, 116, 32, 87, 111, 114, 108, 100, 46]
[' ', ' ', ' ', ' ', 'H', 'e', 'l', 'l', 'o', ' ', '\\', 't', ' ', 'W', 'o', 'r', 'l', 'd', '.']

ASCII 92 is a single backslash, even though it is shown escaped when you print the list in a terminal.

Answer 4

\t means a tab character. If you want a literal \ character, you need to escape it in the string:

string = "    Hello \\t World."

Or use a raw string:

string = r"    Hello \t World."
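Both spellings produce exactly the same string object contents, which a short check can confirm. The variable names escaped and raw are only for illustration.

```python
escaped = "    Hello \\t World."   # backslash escaped by hand
raw = r"    Hello \t World."       # raw string literal

print(escaped == raw)      # True
print(len(raw))            # 19
print(list(raw)[10:12])    # ['\\', 't']
```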

How do I split "\t" in a string into the two separate characters "\" and "t"? (How do I split an escape sequence?)

4 answers:
