Question

I'm parsing an XML file that contains some UTF-8 encoded text, using Python 3.6:

<line>
  <text>Some text which could end with ¬</text>
</line>

I parse it with xml.etree.ElementTree and get the text element as an Element:

<Element 'text' at 0x105577c78>

I can get the text string with:

text_string = text.text.encode('utf-8')
msg = "Text string: {}".format(text_string)
self.stdout.write(self.style.SUCCESS(msg))

And I get:

Text string: b'Some text which could end with \xc2\xac'

Now I need to know whether this string ends with the ¬ character:

if text_string.endswith('¬'):
    print("The text ends with the char!")

But I get:

TypeError: endswith first arg must be bytes or a tuple of bytes, not str

If I change it to if text_string.endswith(b'¬'):, I get a different error:

    if text_string.endswith(b'¬'):
                           ^
SyntaxError: bytes can only contain ASCII literal characters.

I understand the confusion comes from text_string being bytes rather than a str, but I can't see how to solve the problem.

How do I convert bytes to a string? Or how can I search for a special character in a bytes object?

Thanks!

Answer 1

Thanks! Both suggestions from the comments work:

if text_string.endswith(b'\xac'):
if text_string.endswith('¬'.encode('utf-8')):
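Both lines return True, but for different reasons. Encoding '¬' (U+00AC) as UTF-8 produces two bytes, b'\xc2\xac', so b'\xac' only matches the last byte of that sequence, and it can also match other characters whose UTF-8 encoding happens to end in 0xAC (for example '€', which encodes as E2 82 AC). A minimal sketch, using a made-up sample string in place of the real XML text:

```python
# Hypothetical sample value standing in for text.text from the XML file
text_value = "Some text which could end with ¬"

# Encoding to UTF-8 turns the str into bytes; '¬' becomes the two bytes C2 AC
text_bytes = text_value.encode("utf-8")
print(text_bytes[-2:])  # b'\xc2\xac'

# Matches only the final byte of the two-byte sequence
print(text_bytes.endswith(b"\xac"))  # True

# Matches the complete UTF-8 encoding of '¬'
print(text_bytes.endswith("¬".encode("utf-8")))  # True

# b'\xac' also matches '€' (E2 82 AC), which is a false positive here
print("Costs 5 €".encode("utf-8").endswith(b"\xac"))  # True

# Decoding the bytes back to str lets you compare against an ordinary str literal
print(text_bytes.decode("utf-8").endswith("¬"))  # True
```

Comparing against '¬'.encode('utf-8'), or decoding first and comparing str to str, avoids the single-byte false positive.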

Answer 2

In Python 2.7 the default source-file encoding is ASCII unless another encoding is declared. See this PEP documentation.

So if you are using Python 2.7, put the following comment at the top of your script and everything should work:

# -*- coding: utf-8 -*-

In Python 3.x the default source encoding is UTF-8 and ElementTree already gives you the text as a str, so you only need to change:

From:

text_string = text.text.encode('utf-8')

To:

text_string = text.text
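With that change text_string is a str, so endswith('¬') works without any encoding step. A minimal end-to-end sketch, assuming the XML is held in a string (for a file on disk you would use ET.parse instead); the helper name ends_with_not_sign is hypothetical:

```python
import xml.etree.ElementTree as ET

# Sample document with the same structure as the one in the question
xml_data = """<line>
  <text>Some text which could end with ¬</text>
</line>"""


def ends_with_not_sign(element):
    """Return True if the element's text ends with '¬' (hypothetical helper)."""
    # In Python 3, ElementTree returns element text as an already-decoded str;
    # .text is None for an empty element, so fall back to an empty string
    text = element.text or ""
    return text.endswith("¬")


root = ET.fromstring(xml_data)
text_element = root.find("text")
print(ends_with_not_sign(text_element))  # True
```

Keeping the value as a str until you actually need bytes (for example when writing to a binary stream) avoids mixing str and bytes in comparisons.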

Hope this helps.