For loop and if statement not looping correctly in web scraping

Asked: 2016-12-22 03:10:27

Tags: python for-loop web-scraping beautifulsoup lxml

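This script is supposed to scrape a given website until a link is uploaded to it, then email me the information. Everything works fine if I enter keywords for a link that is already on the website, but if I give keywords that are not on the website, a random link is emailed to me. How can I loop this so that the script keeps scraping until it finds a link containing all 3 given keywords, and only then continues with the rest of the script? Any way of looping this is welcome. (I have left out the email details.)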

from bs4 import BeautifulSoup
import requests
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
import urllib2
import time
from lxml import etree
while True:
    keyword1 = "spam"
    keyword2 = "notonwebsite"
    keyword3 = "stackoverflow"
    print("starting")
    r = requests.get('http://kithnyc.com/sitemap_products_1.xml?from=60594372&to=9545825095')
    soup = BeautifulSoup(r.text, 'lxml')
    links = soup.find_all('loc')
    for link in links:
        if keyword1 in link.text and keyword2 in link.text and keyword3 in link.text:
            logic = True
        if logic == True:
            continue

    print(link.text)
    jake = str(link.text)

1 answer:

Answer 0: (score: 0)

See below. I changed it so that the code is cleaner. I think it should work.

private static String getPreviousMonthDate(Date date){
    final SimpleDateFormat format = new SimpleDateFormat("dd-MM-yyyy");

    Calendar cal = Calendar.getInstance();  
    cal.setTime(date);  
    cal.set(Calendar.DAY_OF_MONTH, 1);  
    cal.add(Calendar.DATE, -1);

    Date preMonthDate = cal.getTime();  
    return format.format(preMonthDate);
}


private static String getPreToPreMonthDate(Date date){
    final SimpleDateFormat format = new SimpleDateFormat("dd-MM-yyyy");

    Calendar cal = Calendar.getInstance();  
    cal.setTime(date);  
    cal.add(Calendar.MONTH, -1);  
    cal.set(Calendar.DAY_OF_MONTH,1);  
    cal.add(Calendar.DATE, -1);  

    Date preToPreMonthDate = cal.getTime();  
    return format.format(preToPreMonthDate);
}
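
For the Python loop in the question, here is a minimal sketch of one way to keep polling until a link contains every keyword. In the question's code, print(link.text) runs after the for loop has finished, so link is whatever element the loop visited last, whether or not it matched; that would explain the "random" link. The function names find_matching_link and poll_until_found, the fetch parameter, and the delay and attempt limits are all assumptions made for illustration, and the sketch uses the standard library XML parser instead of BeautifulSoup so that it runs without extra packages.

```python
import time
import xml.etree.ElementTree as ET


def find_matching_link(sitemap_xml, keywords):
    """Return the first <loc> URL that contains every keyword, or None."""
    root = ET.fromstring(sitemap_xml)
    for element in root.iter():
        # Match <loc> whether or not the tag carries an XML namespace.
        if element.tag == "loc" or element.tag.endswith("}loc"):
            url = (element.text or "").strip()
            if all(keyword in url for keyword in keywords):
                return url
    return None


def poll_until_found(fetch, keywords, delay_seconds=30, max_attempts=None):
    """Call fetch() repeatedly until a matching link appears.

    fetch is a hypothetical zero-argument callable that returns the
    sitemap XML as text. Returns the matching URL, or None if
    max_attempts is reached first (None means poll without a limit).
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        match = find_matching_link(fetch(), keywords)
        if match is not None:
            return match
        attempts += 1
        # Wait before asking the site again.
        time.sleep(delay_seconds)
    return None
```

With requests, fetch could be something like lambda: requests.get(sitemap_url).text, where sitemap_url is the sitemap address from the question. Because poll_until_found only returns once a match exists (or the attempt limit is hit), the email step can go directly after the call and will never receive an unmatched link. Note that this sketch also matches namespaced tags such as image:loc, which may or may not be wanted.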