Question

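For the first print tag, I get a massive list of hundreds of <a tags. For the second print tag, I get a list of only four <a tags, which does not include the ones I want.

One of the tags I want is at the end of tags. After printing all of the hundreds of tags, I print the last one, and it is the correct final tag. But when I then run another for loop over the same (unchanged) list tags, I get a result that is not just different but significantly different.

The phenomenon happens whether or not the print '\n\n\n' is there; it just makes the divide between the two sets of output easier for me to see.

What is happening to this list between the first and second for loops that causes this problem?

(This code is exactly as it appears in my script. Originally I did not have the lines from the first for loop down to the empty line; I added them to debug why the correct URL was missing from the end results.)

Edit: Also, here is what all of the print statements print, shown after the code (only the last part of the output from the first for loop is included):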

import urllib
from bs4 import BeautifulSoup

startingList = ['http://www.stowefamilylaw.co.uk/']
for url in startingList:
    try:
        html = urllib.urlopen(url)
        soup = BeautifulSoup(html,'lxml')
        tags = soup('a')
        for tag in tags:
            print tag
        print tags[-1]
        print '\n\n\n'

        for tag in tags:
            print tag
            if not tag.get('href', None).startswith('..'):
                continue
    except:
        continue

....

<a class="shiftnav-target" href="http://www.stowefamilylaw.co.uk/faq-category/decrees-orders-forms/" itemprop="url">Decrees, Orders &amp; Forms</a>
<a class="shiftnav-target" href="http://www.stowefamilylaw.co.uk/faq-category/international-divorce/" itemprop="url">International Divorce</a>
<a class="shiftnav-target"><i class="fa fa-chevron-left"></i> Back</a>
<a class="shiftnav-target" href="http://www.stowefamilylaw.co.uk/contact/" itemprop="url"><i class="fa fa-phone"></i> Contact</a>
<a class="shiftnav-target" href="http://www.stowefamilylaw.co.uk/contact/" itemprop="url"><i class="fa fa-phone"></i> Contact</a>




<a href="http://www.stowefamilylaw.co.uk/">Stowe Family Law</a>
<a href="#spu-5086" style="color: #fff"><div class="callbackbutton"><i class="fa fa-phone" style="font-size: 16px"></i> Request Callback </div></a>
<a href="#spu-5084" style="color: #fff"><div class="callbackbutton"><i class="fa fa-envelope-o" style="font-size: 16px"></i> Quick Enquiry </div></a>
<a class="ubermenu-responsive-toggle ubermenu-responsive-toggle-main ubermenu-skin-black-white-2 ubermenu-loc-primary" data-ubermenu-target="ubermenu-main-3-primary"><i class="fa fa-bars"></i>Main Menu</a>

Answer 1

You have a blanket except: clause:

try:
    # ...
except:
    continue

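So any error in the block is masked, and the rest of the try block, including the remaining iterations of your inner loop, is skipped. Don't use a blanket except handler without re-raising; see Why is "except: pass" a bad programming practice?. At the very least, catch only Exception and print the error: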

except Exception as e:
    print 'Encountered:', e

Without proper diagnostics, all we can do is guess.

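If a tag has no href attribute, you will certainly run into an AttributeError, because the None object has no startswith attribute. The last tag printed by your second loop (the "Main Menu" link) has no href, which is consistent with this: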

if not tag.get('href', None).startswith('..'):
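
To make the failure concrete, here is a minimal sketch that uses plain dicts as hypothetical stand-ins for the parsed tags (the tags list and the scan helper are illustrative, not part of the original script). It is written so that it also runs on Python 3, and it assumes the missing href is what stops the loop:

```python
# Hypothetical stand-ins for parsed <a> tags: each dict maps attribute names to values.
tags = [
    {'href': 'http://www.stowefamilylaw.co.uk/'},
    {'href': '#spu-5086'},
    {'class': 'ubermenu-responsive-toggle'},  # no href, like the "Main Menu" link
    {'href': 'http://www.stowefamilylaw.co.uk/contact/'},
]


def scan(tags):
    """Mimic the second loop; return (tags visited, exception caught or None)."""
    seen = []
    try:
        for tag in tags:
            seen.append(tag)
            # .get returns None for the third dict, so .startswith raises here.
            if not tag.get('href', None).startswith('..'):
                continue
    except Exception as e:  # narrower than a bare except, and the error is kept
        return seen, e
    return seen, None


seen, error = scan(tags)
print('tags visited: %d of %d' % (len(seen), len(tags)))
print('Encountered: %r' % (error,))
```

The loop stops at the first tag without an href, so the last entry is never reached. With a bare except: and continue, nothing would be reported at all.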

Return an empty string instead of None:

if not tag.get('href', '').startswith('..'):
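
As a rough illustration of the effect of the empty-string default, the sketch below again uses plain dicts as hypothetical stand-ins for tags; the '../contact/' href is invented for the example:

```python
# Hypothetical stand-ins for parsed <a> tags; the second one has no href.
tags = [
    {'href': 'http://www.stowefamilylaw.co.uk/'},
    {'class': 'shiftnav-target'},
    {'href': '../contact/'},
    {'href': '#spu-5084'},
]

visited = 0
relative = []
for tag in tags:
    visited += 1
    # A missing href now yields '', and ''.startswith('..') is simply False.
    if not tag.get('href', '').startswith('..'):
        continue
    relative.append(tag['href'])

print('visited %d tags, relative hrefs: %r' % (visited, relative))
```

Every tag is visited, and the one without an href is skipped like any other non-matching link.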

Or, better still, select only the a tags that have an href attribute:

tags = soup.select('a[href]')
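
A small sketch of the difference between the two selections, assuming bs4 is installed. The markup is hypothetical, and 'html.parser' is used here only to avoid the lxml dependency; the question itself uses 'lxml':

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the second link has no href attribute.
html = '''
<a href="http://www.stowefamilylaw.co.uk/">Stowe Family Law</a>
<a class="shiftnav-target">Back</a>
<a href="../contact/">Contact</a>
'''

soup = BeautifulSoup(html, 'html.parser')
all_links = soup('a')               # shorthand for soup.find_all('a')
with_href = soup.select('a[href]')  # only <a> tags that carry an href

# Indexing with ['href'] is safe here because every selected tag has one.
hrefs = [tag['href'] for tag in with_href]
print('%d links in total, %d with href' % (len(all_links), len(with_href)))
```

Because the selector filters out tags without an href up front, the later .get('href') call can no longer return None.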

How can I get two different results from the same Python print command?
