Question

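I have a list of article titles that I store in a text file and load into a list. I am trying to compare the current title against all the titles in that list, like so: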

def duplicate(entry):
    for line in posted_titles:
        print 'Comparing'
        print entry.title
        print line
        if line.lower() == entry.title.lower():
            print 'found duplicate'
            return True
    return False

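My problem is that this never returns True. Even when it prints identical strings for entry.title and line, it does not flag them as equal. Is there a string comparison method or something else I should be using?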

EDIT: After looking at the representation of the strings with repr(line), the strings being compared look like this:

u"Some Article Title About Things And Stuff - Publisher Name"
'Some Article Title About Things And Stuff - Publisher Name'

Answer 1

It would be more helpful if you provided an actual example.

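In any case, your problem is different string encodings in Python 2. entry.title is apparently a unicode string (denoted by the u before the quotes), while line is a regular str (or vice versa).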

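For all characters that are represented identically in both formats (ASCII characters, and possibly more), the equality comparison will succeed. For other characters, it will not: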

>>> 'Ä' == u'Ä'
False

When the comparison is made in the opposite order, IDLE actually issues a warning here:

>>> u'Ä' == 'Ä'
Warning (from warnings module):
  File "__main__", line 1
UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False

You can get a unicode string from a regular string by using str.decode and supplying the original encoding. For example, latin1 in my IDLE:

>>> 'Ä'.decode('latin1')
u'\xc4'
>>> 'Ä'.decode('latin1') == u'Ä'
True
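A minimal sketch of why the encoding name passed to decode matters. It is written to run on both Python 2 and Python 3, and the variable names are only illustrative:

```python
# -*- coding: utf-8 -*-
text = u'\xc4'  # the character Ä as a unicode string

utf8_bytes = text.encode('utf-8')      # two bytes: 0xC3 0x84
latin1_bytes = text.encode('latin1')   # one byte: 0xC4

# Decoding with the encoding that produced the bytes recovers the text.
print(utf8_bytes.decode('utf-8') == text)     # True
print(latin1_bytes.decode('latin1') == text)  # True

# Decoding UTF-8 bytes as latin1 does not fail, but yields different text.
print(utf8_bytes.decode('latin1') == text)    # False
```

So a decode call that does not raise is not proof that the right encoding was chosen; the result still has to match the text you expect.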

If you know it is utf-8, you can specify that as well. For example, the following file, saved as utf-8, will also print True:

# -*- coding: utf-8 -*-
print('Ä'.decode('utf-8') == u'Ä')
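Applied to the question, one possible approach is to decode the titles once while loading the file, so that both sides of the comparison are unicode strings. The sketch below assumes the file is saved as utf-8 with one title per line. The helper names to_text, load_titles and is_duplicate are hypothetical and do not come from the original code:

```python
# -*- coding: utf-8 -*-
import io


def to_text(value, encoding='utf-8'):
    """Return value as a text (unicode) string, decoding byte strings."""
    if isinstance(value, bytes):  # on Python 2, bytes is an alias of str
        return value.decode(encoding)
    return value


def load_titles(path, encoding='utf-8'):
    """Read one title per line as text, stripping surrounding whitespace."""
    # io.open decodes while reading on both Python 2 and Python 3.
    with io.open(path, encoding=encoding) as handle:
        return [line.strip() for line in handle]


def is_duplicate(title, posted_titles, encoding='utf-8'):
    """Case-insensitive match after both sides are converted to text."""
    wanted = to_text(title, encoding).strip().lower()
    return any(to_text(line, encoding).strip().lower() == wanted
               for line in posted_titles)
```

With this in place, posted_titles = load_titles('titles.txt') followed by is_duplicate(entry.title, posted_titles) would replace the loop in the question, where 'titles.txt' is a placeholder file name. Stripping whitespace also guards against a trailing newline left over from reading the file.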

Answer 2

== works for string comparison. Make sure you are dealing with strings:

if str(line).lower() == str(entry.title).lower():

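Another possible syntax is the Boolean expression str1 is str2, although is tests object identity rather than equality, so it is not a reliable substitute for ==.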
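A caveat on is: it checks whether two names refer to the same object, not whether the text matches. A small sketch with illustrative variable names shows the difference:

```python
first = 'some article title'
# Build an equal string at runtime so it is a separate object in memory.
second = ' '.join(['some', 'article', 'title'])

print(first == second)  # True: same characters
print(first is second)  # identity check; not a reliable equality test
```

Whether the second print shows True or False depends on interpreter details such as string interning, which is why == is the safer choice for comparing titles.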

String comparison not working
