Question

I want to remove a word from a string if \x appears in that word. This is what I am trying:

text = "Joe is \xd8\xae\xd8\xa7\ a boy."
sep = "\x"
rest = text.split(sep, 1)[0]
print(rest)

But it gives:

SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \xXX escape

Answer 1

If you only want to print valid ASCII characters:

text = "Joe is \xd8\xae\xd8\xa7\ a boy."
filtered = ''

for x in text:
    try:
        # encode() raises a UnicodeError subclass for non-ASCII characters
        x.encode('ascii')
        filtered += x
    except UnicodeError:
        continue
print(filtered)

Output

Joe is \ a boy.

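If you only want to keep valid alphabetic characters, you can use .isalpha(). If you also want to allow digits, use .isalnum() instead of .isalpha(). \x starts an escape sequence, so your approach does not work.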

text = "Joe is \xd8\xae\xd8\xa7\ a boy."
filtered = ''

for x in text:
    if x.isalpha() or x.isspace():
        filtered += x
print(filtered)

Output

Joe is a boy
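
A caveat for Python 3 readers: str.isalpha() is Unicode-aware, so non-ASCII letters such as Ø (\xd8) pass the test and stay in the result. The following is a minimal sketch, assuming Python 3.7 or later (for str.isascii()), that restricts the filter to ASCII letters and whitespace. The helper name keep_ascii_letters is made up for illustration, and the stray backslash is written as \\ here to avoid an invalid-escape warning.

```python
def keep_ascii_letters(text):
    """Keep only ASCII letters and whitespace (hypothetical helper)."""
    return ''.join(
        ch for ch in text
        if ch.isascii() and (ch.isalpha() or ch.isspace())
    )

text = "Joe is \xd8\xae\xd8\xa7\\ a boy."  # backslash escaped explicitly
print("\xd8".isalpha())          # True on Python 3: 'Ø' counts as a letter
print(keep_ascii_letters(text))  # note the double space where the word was
```

The result still contains two consecutive spaces where the removed word used to be, so a further cleanup step may be wanted.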

Update

Edit: If neither of the two methods above works on its own and you only want Joe is a boy as the output:

text = "Joe is \xd8\xae\xd8\xa7\ a boy."
filtered = ''

for x in text:
    try:
        # encode() raises a UnicodeError subclass for non-ASCII characters
        x.encode('ascii')
        filtered += x
    except UnicodeError:
        continue
print(filtered)
new_filtered = ''

for x in filtered:
    if x.isalpha() or x.isspace():
        new_filtered += x
print(new_filtered)

Output

Joe is a boy

You can also use a regular expression.
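
As a rough sketch of the regular-expression route (assuming the goal is the same as in the loops above, namely to keep only ASCII letters and spaces), re.sub can do the filtering in one call. The variable names are illustrative only.

```python
import re

text = "Joe is \xd8\xae\xd8\xa7\\ a boy."  # same string, backslash escaped

# Delete every character that is not an ASCII letter or a space.
letters_only = re.sub(r'[^A-Za-z ]', '', text)

# Collapse the double space left behind by the removed word.
cleaned = re.sub(r' {2,}', ' ', letters_only)
print(cleaned)
```

The second substitution is optional; without it the result keeps two spaces where the non-ASCII word was removed.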

Answer 2

"\x" is not treated as literal text here. You should make the test string a raw string so that "\x" is treated as ordinary characters:

text = r"Joe is \xd8\xae\xd8\xa7\ a boy."
sep = r"\x"
rest = text.split(sep, 1)[0]
print(rest)

Edit

To get "Joe is a boy.":

text = r"Joe is \xd8\xae\xd8\xa7\ a boy."
sep = r"\x"
words = text.split(" ")
# keep only the words that do not contain the literal \x
rest = [word for word in words if sep not in word]
output = " ".join(rest)
print(output)

Answer 3

Update: You clarified that you really want to remove non-printable characters from the string.

import re

text = "Joe is \xd8\xae\xd8\xa7\ a boy."
#                              ^
#            stray backslash --|
sep = r'[^\x20-\x7e]' # Any non-printable character
rest = re.sub(sep, '', text)
# rest = 'Joe is \\ a boy.'
print(rest)
# Joe is \ a boy.

The backslash is there because it is in your input.
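
If the stray backslash should go as well, one option is to add it to the pattern. This is only a sketch, assuming a literal backslash is never wanted in the result; the extra space cleanup is likewise an assumption about the desired output.

```python
import re

text = "Joe is \xd8\xae\xd8\xa7\\ a boy."  # the stray backslash, written explicitly

# Remove anything outside printable ASCII, plus any literal backslash.
rest = re.sub(r'[^\x20-\x7e]|\\', '', text)
# Collapse runs of spaces left behind by the removal.
rest = re.sub(r' {2,}', ' ', rest)
print(rest)
```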

Python is trying to interpret the \x in the string below as an instruction to insert a character by its hexadecimal value.

    sep = "\x"
         ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \xXX escape

To fix it, either escape the backslash like this:

sep = "\\x"

Or, better yet, use a raw string like this:

sep = r"\x"

This tells Python not to expand the \x in the string.
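
A quick check that the two spellings produce the same two-character string, and that splitting on it works once the text itself is also raw. This is a small sketch for illustration; the variable names escaped and raw are not from the question.

```python
escaped = "\\x"   # backslash escaped by hand
raw = r"\x"       # raw string: the backslash is kept as-is

print(escaped == raw)   # compares the two spellings
print(len(raw))         # a backslash followed by 'x'

# Splitting a raw copy of the text on the literal \x now works.
text = r"Joe is \xd8\xae\xd8\xa7\ a boy."
print(text.split(raw, 1)[0])
```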

If you are trying to split on non-printable characters...

import re

text = "Joe is \xd8\xae\xd8\xa7\ a boy."
# Not(^) a printable ascii character (0x20 - 0x7e)
sep = r'[^\x20-\x7e]'
first_part, rest = re.split(sep, text, maxsplit=1)
print(first_part)

If you really are looking for a literal '\x'...

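# Raw (r'') strings will not evaluate your backslash in this
# string. The text must be raw too, so that it contains a literal \x.
text = r"Joe is \xd8\xae\xd8\xa7\ a boy."
sep = r'\x'
first_part, rest = text.split(sep, maxsplit=1)
print(first_part)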

Answer 4

Prefix the string with r so that it is treated as a raw string. Escape codes are then ignored.

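text = r"Joe is \xd8\xae\xd8\xa7\ a boy."
sep = r"\x"
words = text.split(" ")  # avoid shadowing the built-in name list
t = ""
for word in words:
    temp = word.split(sep)
    # a word without the literal \x splits into a single piece
    if len(temp) < 2:
        t += ' ' + word
print(t.strip())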

Output

Joe is a boy.

Remove the rest of a string after a specific word in Python

4 Answers: