I have this HTML source: http://pastebin.com/itMYaimq
I am running the following BeautifulSoup code to parse the HTML:
def check_img(self, feed):
    return 1 if feed.find_all('img', attrs={'data-blzsrc': True, 'src': lambda x: 'data' not in x}) else 0
Here feed is the HTML source. Running this raises the following exception:
[2015-01-08 10:19:16,415: WARNING/Worker-2] Traceback (most recent call last):
[2015-01-08 10:19:16,415: WARNING/Worker-2] File "/Users/rokumar/SiteAnalysisGit/Src/hct/hct/data_processors/rule_processor.py", line 58, in do_akamai_analysis
[2015-01-08 10:19:16,416: WARNING/Worker-2] resp, self.analysis.url, self.analysis.id)
[2015-01-08 10:19:16,416: WARNING/Worker-2] File "/Users/rokumar/SiteAnalysisGit/Src/hct/hct/rules.py", line 794, in akamai_rule_analysis
[2015-01-08 10:19:16,416: WARNING/Worker-2] result[RULES.FEO_CHECKS] = check_feo_optimizations(analysis_id, url)
[2015-01-08 10:19:16,417: WARNING/Worker-2] File "/Users/rokumar/SiteAnalysisGit/Src/hct/hct/rules.py", line 1320, in check_feo_optimizations
[2015-01-08 10:19:16,417: WARNING/Worker-2] return FEO_processor.FEOProcessor().process_feo_debug_output(analysis_id, url)
[2015-01-08 10:19:16,417: WARNING/Worker-2] File "/Users/rokumar/SiteAnalysisGit/Src/hct/hct/data_processors/FEO_processor.py", line 38, in process_feo_debug_output
[2015-01-08 10:19:16,417: WARNING/Worker-2] self.result[name] = (False, True)[getattr(self,func)(feed)]
[2015-01-08 10:19:16,418: WARNING/Worker-2] File "/Users/rokumar/SiteAnalysisGit/Src/hct/hct/data_processors/FEO_processor.py", line 64, in check_img
[2015-01-08 10:19:16,418: WARNING/Worker-2] return 1 if feed.find_all('img', attrs={'data-blzsrc': True, 'src': lambda x: 'data' not in x}) else 0
[2015-01-08 10:19:16,418: WARNING/Worker-2] File "/Library/Python/2.7/site-packages/bs4/element.py", line 1180, in find_all
[2015-01-08 10:19:16,419: WARNING/Worker-2] return self._find_all(name, attrs, text, limit, generator, **kwargs)
[2015-01-08 10:19:16,419: WARNING/Worker-2] File "/Library/Python/2.7/site-packages/bs4/element.py", line 505, in _find_all
[2015-01-08 10:19:16,419: WARNING/Worker-2] found = strainer.search(i)
[2015-01-08 10:19:16,420: WARNING/Worker-2] File "/Library/Python/2.7/site-packages/bs4/element.py", line 1540, in search
[2015-01-08 10:19:16,420: WARNING/Worker-2] found = self.search_tag(markup)
[2015-01-08 10:19:16,420: WARNING/Worker-2] File "/Library/Python/2.7/site-packages/bs4/element.py", line 1512, in search_tag
[2015-01-08 10:19:16,421: WARNING/Worker-2] if not self._matches(attr_value, match_against):
[2015-01-08 10:19:16,421: WARNING/Worker-2] File "/Library/Python/2.7/site-packages/bs4/element.py", line 1578, in _matches
[2015-01-08 10:19:16,421: WARNING/Worker-2] return match_against(markup)
[2015-01-08 10:19:16,421: WARNING/Worker-2] File "/Users/rokumar/SiteAnalysisGit/Src/hct/hct/data_processors/FEO_processor.py", line 64, in <lambda>
[2015-01-08 10:19:16,422: WARNING/Worker-2] return 1 if feed.find_all('img', attrs={'data-blzsrc': True, 'src': lambda x: 'data' not in x}) else 0
[2015-01-08 10:19:16,422: WARNING/Worker-2] TypeError: argument of type 'NoneType' is not iterable
I printed feed to check its value. It printed the HTML source, so it is not None. So why am I getting the error "argument of type 'NoneType' is not iterable"?
Answer 0 (score: 2)
Your src lambda is being called with None, and the in operator cannot be applied to None:
>>> x = None
>>> 'data' not in x
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: argument of type 'NoneType' is not iterable
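This happens when the filter is applied to an <img> tag that has no src attribute; BeautifulSoup then passes None as the attribute value. Your input source has 8 such tags: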
>>> import requests
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(requests.get('http://pastebin.com/raw.php?i=itMYaimq').content)
>>> len(soup.find_all('img', src=False))
8
Simply test for that case first, so that a missing src short-circuits before the in check:
lambda x: x and 'data' not in x
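The same guard can be written as a named predicate, which makes it easy to check in isolation. This is only a minimal sketch: the name src_without_data is hypothetical and not part of BeautifulSoup, and it assumes the filter is called with None when the attribute is missing, as the traceback above shows.

```python
def src_without_data(src):
    """Return True only for a non-empty src value that does not contain 'data'."""
    # The filter receives None for tags without a src attribute, so test
    # truthiness first; `in` is only evaluated for a real string.
    return bool(src) and 'data' not in src


print(src_without_data(None))                             # tag without src
print(src_without_data('data:image/gif;base64,R0lGOD'))   # inline data URI
print(src_without_data('/images/logo.png'))               # ordinary URL
```

The function can be passed in place of the lambda, for example as the value for 'src' in the attrs dictionary.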
Your test can also be simplified; there is no need to find all matches when the first match is enough:
blzsrc_image = feed.find('img', attrs={'data-blzsrc': True, 'src': lambda x: x and 'data' not in x})
return 1 if blzsrc_image else 0
If a boolean is acceptable (instead of 1 or 0), you can use:
blzsrc_image = feed.find('img', attrs={'data-blzsrc': True, 'src': lambda x: x and 'data' not in x})
return blzsrc_image is not None
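Putting the pieces together, the check might look roughly like the sketch below. The sample markup is invented for illustration (it is not taken from the pastebin), the function is written as a plain function rather than a method, and the built-in 'html.parser' is an assumed parser choice.

```python
from bs4 import BeautifulSoup

# Invented sample: one data-URI placeholder, one tag with no src at all,
# and one tag with an ordinary src value.
SAMPLE_HTML = """
<img data-blzsrc="/real/a.png" src="data:image/gif;base64,R0lGOD">
<img data-blzsrc="/real/b.png">
<img data-blzsrc="/real/c.png" src="/real/c.png">
"""


def check_img(feed):
    """Return True if some <img> has data-blzsrc and a src without 'data'."""
    blzsrc_image = feed.find(
        'img',
        attrs={
            'data-blzsrc': True,                          # attribute must be present
            'src': lambda x: x and 'data' not in x,       # guard against a missing src
        },
    )
    return blzsrc_image is not None


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
print(check_img(soup))
```

With the guard in place, the tag that lacks src is simply skipped instead of raising a TypeError.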