Question

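I need to compare a unicode string read from a UTF-8 file with a constant defined in a Python script.

I am using Python 2.7.6 on Linux.

If I run the script below in Spyder (a Python editor), it works, but if I call the script from a terminal, my test fails. Do I need to import or define something in the terminal before invoking the script?

Script ("pythonscript.py"):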

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import csv

some_french_deps = []
idata_raw = csv.DictReader(open("utf8_encoded_data.csv", 'rb'), delimiter=";")
for rec in idata_raw:
    depname = unicode(rec['DEP'],'utf-8')
    some_french_deps.append(depname)

test1 = "Tarn"
test2 = "Rhône-Alpes"
if test1==some_french_deps[0]:
  print "Tarn test passed"
else:
  print "Tarn test failed"
if test2==some_french_deps[2]:
  print "Rhône-Alpes test passed"
else:
  print "Rhône-Alpes test failed"

utf8_encoded_data.csv:

DEP
Tarn
Lozère
Rhône-Alpes
Aude

Output when run from the Spyder editor:

Tarn test passed
Rhône-Alpes test passed

Output when run from the terminal:

$ ./pythonscript.py 
Tarn test passed
./pythonscript.py:20: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if test2==some_french_deps[2]:
Rhône-Alpes test failed

Answer 1

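You are comparing a byte string (type str) with a unicode value. Spyder has changed the default encoding from ASCII to UTF-8, and Python performs an implicit conversion between byte strings and unicode values when comparing the two types. Your byte strings are encoded as UTF-8, so the comparison succeeds under Spyder. In the terminal the default encoding is still ASCII, so the implicit decode of "Rhône-Alpes" fails, Python emits the UnicodeWarning and treats the two values as unequal. "Tarn" is pure ASCII, which is why that test passes in both environments.

The solution is to not use byte strings and to use unicode literals for the two test values instead: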

test1 = u"Tarn"
test2 = u"Rhône-Alpes"

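Changing the system default encoding is, in my opinion, a terrible idea. Your code should handle Unicode correctly rather than rely on implicit conversions; changing the rules for implicit conversions only adds confusion and does not make the task any easier.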
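To make the implicit conversion visible, here is a minimal sketch that performs the decode explicitly with each candidate codec. The names raw, text and decodes_with are illustrative and not part of the original script; the byte values are the UTF-8 encoding of "Rhône-Alpes", which is what a plain str literal holds in a UTF-8 source file under Python 2.

```python
# -*- coding: utf-8 -*-
import sys

# UTF-8 bytes of "Rhône-Alpes" (hypothetical stand-in for the str constant)
raw = b"Rh\xc3\xb4ne-Alpes"
# The same name as a unicode value, as unicode(rec['DEP'], 'utf-8') returns it
text = u"Rh\u00f4ne-Alpes"


def decodes_with(codec):
    """Return True if raw decodes under codec and then equals text."""
    try:
        return raw.decode(codec) == text
    except UnicodeDecodeError:
        # This is the failure that Python 2 turns into a UnicodeWarning
        return False


print("default encoding: %s" % sys.getdefaultencoding())
print("decoded as ascii: %s" % decodes_with("ascii"))
print("decoded as utf-8: %s" % decodes_with("utf-8"))
```

Running the first print statement in both environments is a quick way to check whether they really differ in their default encoding.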

Answer 2

Just using depname = rec['DEP'] should work fine, since you have already declared the source encoding.

If print some_french_deps[2] prints Rhône-Alpes, then your comparison will work.
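A small sketch of what this approach relies on: both sides stay byte strings, so the comparison is byte for byte and only holds while the script and the data file use the same encoding. The rec dictionary and the Latin-1 variant are made up for illustration.

```python
# -*- coding: utf-8 -*-
# Bytes as Python 2's csv module returns them from a UTF-8 file opened in 'rb' mode
rec = {"DEP": b"Rh\xc3\xb4ne-Alpes"}

# In Python 2, a plain "Rhône-Alpes" literal in a UTF-8 source file holds these bytes
test2 = u"Rh\u00f4ne-Alpes".encode("utf-8")

depname = rec["DEP"]  # no unicode() call, so this stays a byte string
same_bytes = (depname == test2)
print("utf-8 data matches: %s" % same_bytes)

# Assumption for illustration: the same name coming from a Latin-1 encoded file
latin1_bytes = u"Rh\u00f4ne-Alpes".encode("latin-1")
latin1_matches = (latin1_bytes == test2)
print("latin-1 data matches: %s" % latin1_matches)
```

The second comparison is the weak spot of keeping everything as bytes: the text is the same, but the bytes are not.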

Answer 3

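Python issues this warning when you compare a str (byte string) object with a unicode object and the byte string cannot be decoded using the default encoding.

To fix this, you can write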

test1 = "Tarn"
test2 = "Rhône-Alpes"

as

test1 = u"Tarn"
test2 = u"Rhône-Alpes"

where the 'u' prefix marks the literal as a unicode object.
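Put together, the fix might look like the sketch below, which decodes the file while reading and compares only unicode values. It is an illustration under a few assumptions: it writes its own sample file to a temporary directory, and it uses io.open with plain line splitting instead of csv.DictReader, which is enough here only because the data has a single column.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Write a small UTF-8 sample file mirroring utf8_encoded_data.csv
path = os.path.join(tempfile.mkdtemp(), "utf8_encoded_data.csv")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"DEP\nTarn\nLoz\u00e8re\nRh\u00f4ne-Alpes\nAude\n")

# io.open decodes while reading, so every line is already unicode text
with io.open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()

header, some_french_deps = lines[0], lines[1:]

test1 = u"Tarn"
test2 = u"Rh\u00f4ne-Alpes"  # same value as u"Rhône-Alpes" in a UTF-8 source file

print("Tarn test passed: %s" % (test1 == some_french_deps[0]))
print("Rhone-Alpes test passed: %s" % (test2 == some_french_deps[2]))
```

Because both operands are unicode, the result no longer depends on the interpreter's default encoding.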

Python unicode equality comparison fails in the terminal but works under the Spyder editor
