Question

I found this great answer on how to check whether any string from a list appears in a line: How to check if a line has one of the strings in a list?

But trying to do something similar with the keys of a dict does not seem to work for me:

import urllib2

url_info = urllib2.urlopen('http://rss.timegenie.com/forex.xml')
currencies = {"DKK": [], "SEK": []}
print currencies.keys()
testCounter = 0

for line in url_info:
    if any(countryCode in line for countryCode in currencies.keys()):
        testCounter += 1
    if "DKK" in line or "SEK" in line:
        print line
print "testCounter is %i and should be 2 - if not debug the code" % (testCounter)

Output:

['SEK', 'DKK']
<code>DKK</code>
<code>SEK</code>
testCounter is 377 and should be 2 - if not debug the code

I thought my problem might be that .keys() gives me an array rather than a list, but I haven't figured out how to convert it.

Answer 1

Change:

any(countryCode in line for countryCode in currencies.keys())

to:

any([countryCode in line for countryCode in currencies.keys()])

Your original code uses a generator expression, whereas (I think) you intended a list comprehension. See: Generator Expressions vs. List Comprehension
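For reference, with the builtin any the two spellings return the same truth value; the generator form merely stops at the first match. A minimal sketch in Python 3 syntax, where the sample line is made up for illustration:

```python
# Hypothetical sample data; the real feed lines are not reproduced here.
currencies = {"DKK": [], "SEK": []}
line = "<code>DKK</code>"

# Generator expression: evaluated lazily, stops at the first True.
with_generator = any(code in line for code in currencies)

# List comprehension: builds the whole list first, then tests it.
with_list = any([code in line for code in currencies])

print(with_generator, with_list)
```

If the two disagree in your session, that suggests any is not the builtin there.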

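Update: I found that with an ipython interpreter that has pylab imported, I get the same result as you (a count of 377 instead of the expected 2). I realized the problem is that 'any' then comes from the numpy package, which is meant to work on arrays. Next, I started an ipython interpreter without pylab, so that 'any' comes from builtin. In that case your original code works. So if you are using the ipython interpreter, type: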

help(any)

and make sure it comes from the builtin module. If it does, your original code should work as is.
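One way to sidestep a shadowed name, sketched here in Python 3 syntax (in Python 2 the module is named __builtin__ instead of builtins), is to call the builtin explicitly. The sample lines are invented for illustration and are not the real feed:

```python
import builtins

currencies = {"DKK": [], "SEK": []}
# Hypothetical stand-in for the lines of the feed.
sample_lines = [
    "<code>DKK</code>",
    "<code>EUR</code>",
    "<code>SEK</code>",
]

# True when the name `any` in this namespace is the builtin,
# not a replacement pulled in by a star import.
any_is_builtin = any is builtins.any

# Calling builtins.any directly works even if `any` has been shadowed.
test_counter = sum(
    1 for line in sample_lines
    if builtins.any(code in line for code in currencies)
)
print(any_is_builtin, test_counter)
```

Qualifying the call this way keeps the loop correct regardless of what the interactive session has imported.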

Answer 2

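This is not a good way to examine an XML file.

It is slow. You are doing potentially N * M substring searches, where N is the number of lines and M is the number of keys.
XML is not a line-oriented text format. Your substring search could also match attribute names or element names, which is probably not what you want. And if the XML file happens to put all of its elements on one line with no whitespace (common for machine-generated and machine-processed XML), you will get fewer matches than expected.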
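The line-orientation problem can be illustrated with a small sketch (Python 3 syntax). Both documents below are hypothetical and carry the same two codes; only the whitespace differs, yet the line-based count changes:

```python
currencies = {"DKK": [], "SEK": []}

# Hypothetical documents with the same content, formatted differently.
pretty_xml = (
    "<forex>\n"
    "<data><code>DKK</code></data>\n"
    "<data><code>SEK</code></data>\n"
    "</forex>"
)
compact_xml = (
    "<forex><data><code>DKK</code></data>"
    "<data><code>SEK</code></data></forex>"
)

def count_matching_lines(text):
    """Count lines containing at least one currency key as a substring."""
    return sum(
        1 for line in text.splitlines()
        if any(code in line for code in currencies)
    )

print(count_matching_lines(pretty_xml))   # 2
print(count_matching_lines(compact_xml))  # 1
```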

If you did have line-oriented text input, I would suggest building a regular expression from your list of keys:

import re
linetester = re.compile('|'.join(re.escape(key) for key in currencies))

for match in linetester.finditer(entire_text):
    print match.group(0)

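# or, if entire_text is too long and you want to consume it iteratively
# (here entire_text is an iterable of lines, e.g. a file object):

for line in entire_text:
    # compiled patterns have no find() method; finditer() yields each match
    for match in linetester.finditer(line):
        print match.group(0)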
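A self-contained version of the regex approach, written in Python 3 syntax as a rough sketch. The sample lines are invented for illustration; search() is used to test a line, finditer() to list every key found:

```python
import re

currencies = {"DKK": [], "SEK": []}

# One alternation pattern such as "DKK|SEK"; re.escape guards keys
# that might contain regex metacharacters.
linetester = re.compile("|".join(re.escape(key) for key in currencies))

# Hypothetical line-oriented input.
sample_lines = ["rate DKK 7.45", "rate EUR 1.00", "rate SEK 10.9 vs DKK"]

# search() returns a match object for the first key found in a line, else None.
matching = [line for line in sample_lines if linetester.search(line)]

# finditer() yields every occurrence, in order of position within each line.
found = [m.group(0) for line in sample_lines for m in linetester.finditer(line)]

print(matching)
print(found)
```

Each line is scanned once by the compiled pattern instead of once per key.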

However, since you have XML, you should use an actual XML processor:

import xml.etree.cElementTree as ET

# Assumption: url_info is the file-like object opened in the question.
forex = ET.parse(url_info)

for elem in forex.findall('data/code'):
    if elem.text in currencies:
        print elem.text

If you are only interested in which codes are present and do not care about the specific entries, you can use set intersection:

codes = frozenset(e.text for e in forex.findall('data/code'))

print codes & frozenset(currencies)
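To try the XML approach without fetching the feed, here is a minimal sketch in Python 3 syntax. The sample document is hypothetical; its layout (a root element with data children, each holding a code element) is only assumed from the 'data/code' path used above:

```python
import xml.etree.ElementTree as ET

currencies = {"DKK": [], "SEK": []}

# Hypothetical stand-in for the feed; the real structure may differ.
sample = (
    "<forex>"
    "<data><code>DKK</code><rate>7.45</rate></data>"
    "<data><code>EUR</code><rate>1.00</rate></data>"
    "<data><code>SEK</code><rate>10.9</rate></data>"
    "</forex>"
)
forex = ET.fromstring(sample)

# findall() takes a path relative to the root element.
codes = frozenset(e.text for e in forex.findall("data/code"))

# Iterating a dict yields its keys, so frozenset(currencies) is the key set.
present = codes & frozenset(currencies)
print(sorted(present))
```

Because the parser works on elements rather than lines, the result does not depend on how the file is wrapped.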
