Python continue statement: print only paragraphs that do not contain the gaq tag

Date: 2015-03-08 12:10:47

Tags: python

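I'm practicing a bit of Python and trying to extract the content of <p> tags. But when I grab the content, the _gaq tag gets printed as well. I want to print every <p> tag that does not contain the _gaq code, so I wrote the following script using continue: if _gaq.push is in the <p>, don't print it and continue with the loop.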

import mechanize
from bs4 import BeautifulSoup

url = 'http://almondoilzone.com'

browser = mechanize.Browser()
browser.set_handle_robots(False)
browser.addheaders = [('User-agent','Mozilla')]

htmltext = browser.open(url)
soup = BeautifulSoup(htmltext)

# HERE THE CONTINUE LOOP STILL PRINTS THE PARAGRAPH WITH _gaq.push

for post in soup.findAll('div',{"class","post"}):
    for paragraph in post.findAll('p'):
        if paragraph.find("_gaq.push")== -1:
            continue
        print paragraph.text

1 answer:

Answer 0 (score: 0)

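Because paragraph is a BeautifulSoup element (a Tag), paragraph.find(...) is BeautifulSoup's find() method, not Python's str.find(). BeautifulSoup's find() looks for a descendant tag named "_gaq.push"; there is none, so it returns None, and None == -1 is False. The continue is never reached, so every paragraph gets printed. Even on a plain string the test would be backwards: str.find() returns -1 when the substring is absent, so == -1 would skip exactly the paragraphs you want to keep.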
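A minimal sketch of both problems, using plain strings as hypothetical stand-ins for paragraph.text so it runs without BeautifulSoup or a network connection. The helper names are made up for illustration: keep_with_find_check mirrors the question's condition, keep_with_in_check mirrors the answer's.

```python
# Hypothetical stand-ins for paragraph.text on two <p> elements.
TEXTS = [
    "Almond oil is rich in vitamin E.",
    "var _gaq = _gaq || []; _gaq.push(['_trackPageview']);",
]


def keep_with_find_check(texts):
    """The question's condition, applied to plain strings."""
    kept = []
    for text in texts:
        # str.find returns -1 when "_gaq.push" is ABSENT, so this
        # skips the clean paragraphs and keeps the tracking code.
        if text.find("_gaq.push") == -1:
            continue
        kept.append(text)
    return kept


def keep_with_in_check(texts):
    """The answer's condition: skip paragraphs that contain the marker."""
    kept = []
    for text in texts:
        if "_gaq.push" in text:
            continue
        kept.append(text)
    return kept


# On a bs4 Tag, find() returns None when no matching tag exists,
# and None == -1 is False, so the original loop never hits continue.
print(None == -1)                   # False
print(keep_with_find_check(TEXTS))  # keeps only the _gaq.push line (the wrong one)
print(keep_with_in_check(TEXTS))    # ['Almond oil is rich in vitamin E.']
```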

You probably want a loop like this:

for post in soup.findAll('div', {"class": "post"}):   # attrs given as a dict
    for paragraph in post.findAll('p'):
        if "_gaq.push" in paragraph.text:   # substring test on the paragraph's text
            continue                        # skip paragraphs with the tracking code
        print(paragraph.text)
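If mechanize or bs4 is not at hand, the same filter can be sketched with Python 3's standard-library html.parser instead of BeautifulSoup (a swapped-in technique, not what the answer uses). It runs on a hypothetical inline HTML sample rather than the real site, and the class and function names are illustrative only.

```python
from html.parser import HTMLParser


class PostParagraphParser(HTMLParser):
    """Collect the text of every <p> nested inside a <div class="post">."""

    def __init__(self):
        super().__init__()
        self.div_stack = []   # one flag per open <div>: True if it has class "post"
        self.in_p = False     # are we inside a <p> within a post div?
        self.current = []     # text chunks of the <p> being read
        self.paragraphs = []  # finished paragraph texts

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            self.div_stack.append("post" in classes)
        elif tag == "p" and any(self.div_stack):
            self.in_p = True
            self.current = []

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            self.div_stack.pop()
        elif tag == "p" and self.in_p:
            self.paragraphs.append("".join(self.current))
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.current.append(data)


def clean_paragraphs(html):
    """Return post paragraphs, skipping any that contain "_gaq.push"."""
    parser = PostParagraphParser()
    parser.feed(html)
    parser.close()
    result = []
    for text in parser.paragraphs:
        if "_gaq.push" in text:  # same test as the answer
            continue             # skip the tracking-code paragraph
        result.append(text)
    return result


# Hypothetical sample HTML, not taken from almondoilzone.com.
SAMPLE = """
<div class="post">
  <p>Almond oil is rich in vitamin E.</p>
  <p>var _gaq = _gaq || []; _gaq.push(['_trackPageview']);</p>
  <p>Apply a few drops before bed.</p>
</div>
<div class="sidebar"><p>Not part of a post.</p></div>
"""

print(clean_paragraphs(SAMPLE))
# ['Almond oil is rich in vitamin E.', 'Apply a few drops before bed.']
```

The structure mirrors the answer: paragraph texts are collected first, then the loop uses continue to skip any whose text contains "_gaq.push".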