Question

I had to rewrite my Python script from Python 3 to Python 2, and since then I have had a problem parsing special characters with ElementTree.

Here is a piece of my XML:

<account number="89890000" type="Kostnad" taxCode="597" vatCode="">Avsättning egenavgifter</account>

This is the output when that line is parsed:

('account:', '89890000', 'AccountType:', 'Kostnad', 'Name:', 'Avs\xc3\xa4ttning egenavgifter')

So there seems to be a problem with the character "ä".

This is how I do it in my code:

sys.setdefaultencoding( "UTF-8" )
xmltree = ET()

xmltree.parse("xxxx.xml")

printAccountPlan(xmltree)

def printAccountPlan(xmltree):
    print("account:", str(i.attrib['number']), "AccountType:", str(i.attrib['type']), "Name:", str(i.text))

Does anyone have an idea how to get ElementTree to parse the character "ä", so that the result looks like this:

('account:', '89890000', 'AccountType:', 'Kostnad', 'Name:', 'Avsättning egenavgifter')

Answer 1

You are running into two separate differences between Python 2 and Python 3 at the same time, which is why you get the unexpected result.

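The first difference is one you are probably already aware of: the print statement of Python 2 became the print function in Python 3. In your case that change creates a special situation, which I'll get to in a moment. In short, this is how "print" behaves differently: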

In Python 3:

>>> # Two arguments 'Hi' and 'there' get passed to the function 'print'.
>>> # They are concatenated with a space separator and printed.
>>> print('Hi', 'there')
Hi there

In Python 2:

>>> # 'print' is a statement which doesn't need parentheses.
>>> # The parentheses instead create a tuple containing two elements,
>>> # 'Hi' and 'there'. This tuple is then printed.
>>> print('Hi', 'there')
('Hi', 'there')
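
As an illustration that is not part of the original answer: one way to sidestep this difference is to build a single string first and hand only that to print. Parentheses around one expression do not create a tuple, so the same line should behave identically under both versions. The names `words`, `message` and `as_tuple` are made up for this sketch.

```python
words = ['Hi', 'there']

# Parentheses around a single expression do not make a tuple, so this is
# a print statement in Python 2 and a function call in Python 3, and both
# write the same text.
message = ' '.join(words)
print(message)

# Several comma-separated values inside parentheses do form a tuple; this
# is the object Python 2 ends up printing in the example above.
as_tuple = ('Hi', 'there')
```

Joining the pieces yourself also means repr() is never called on the individual strings.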

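The second issue in your case is that a tuple prints itself by calling repr() on each of its elements. In Python 3, repr() displays Unicode characters as they are. In Python 2, however, repr() uses escape sequences for any byte value outside the printable ASCII range (for example, greater than 127). That is why you see them.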
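
The escaping can be reproduced as a rough analogy in Python 3 too: a Python 2 str holds bytes, and the nearest Python 3 type is bytes, whose repr() likewise escapes everything outside printable ASCII. The sketch below is only illustrative. It assumes the text is encoded as UTF-8, which matches the \xc3\xa4 in the output shown in the question.

```python
name = u'Avs\u00e4ttning egenavgifter'  # \u00e4 is the character "ä"

# UTF-8 encodes "ä" as the two bytes 0xC3 0xA4.
encoded = name.encode('utf-8')

# repr() of a byte string escapes the non-ASCII bytes. This is what a
# tuple calls on each of its elements when it is printed.
escaped = repr(encoded)
print(escaped)

# Decoding the bytes gives back the original text, so nothing was lost.
round_trip = encoded.decode('utf-8')
```

The escaped form and the readable form describe the same data; only the presentation differs.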

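Whether you need to work around this depends on what your code is for. The representation of a tuple in Python 2 uses escape sequences because it is not designed to be shown to end users. It is more of an internal convenience for developers, for troubleshooting and similar tasks. If you are only printing it for yourself, you may not need to change anything, because Python is showing you that the encoded bytes of the non-ASCII character are correctly present in the string. If you want to show end users something that is formatted to look like a tuple, one way to do it (while keeping the Unicode printed correctly) is to build the format by hand, like this: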

def printAccountPlan(xmltree):
    data = (i.attrib['number'], i.attrib['type'], i.text)
    print "('account:', '%s', 'AccountType:', '%s', 'Name:', '%s')" % data
# Produces this:
# ('account:', '89890000', 'AccountType:', 'Kostnad', 'Name:', 'Avsättning egenavgifter')
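
For a version that can be run on its own, here is a minimal sketch of the same idea. It makes several assumptions not found in the question: the XML is given inline as bytes instead of being read from the asker's file, the `<accounts>` wrapper element is hypothetical, and `format_account` is an invented helper name. It has only been reasoned through for Python 3; the single-argument print is meant to keep it valid in Python 2 as well, where the result also depends on the terminal's encoding.

```python
import xml.etree.ElementTree as ET

# Inline stand-in for the asker's file; \xc3\xa4 is "ä" encoded as UTF-8.
XML_BYTES = (
    b'<accounts>'
    b'<account number="89890000" type="Kostnad" taxCode="597" vatCode="">'
    b'Avs\xc3\xa4ttning egenavgifter</account>'
    b'</accounts>'
)


def format_account(elem):
    """Build the tuple-looking line by hand so repr() is never involved."""
    data = (elem.attrib['number'], elem.attrib['type'], elem.text)
    return u"('account:', '%s', 'AccountType:', '%s', 'Name:', '%s')" % data


root = ET.fromstring(XML_BYTES)
lines = [format_account(elem) for elem in root.iter('account')]
for line in lines:
    print(line)
```

Because each line is formatted into one string before printing, the "ä" is passed to print as text and not as an escaped representation.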
