Python - Checking a list against a list

Time: 2017-07-21 19:18:23

Tags: python list for-loop if-statement bs4

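I am writing a script that scrapes a page and finds the names of adoptable dogs. I am able to strip the names out and append them to a list. However, I can't get the code to run continuously, append new names to a new list, and remove them from the old list. I was wondering if anyone could help me with this.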

import requests
from bs4 import BeautifulSoup
import re
import time
from twilio.rest import Client

url = 'http://petharbor.com/results.asp?searchtype=ADOPT&start=3%20&friends=1&samaritans=1&nosuccess=0&rows=25&imght=200&imgres=thumb&tWidth=200&view=sysadm.v_chmp&bgcolor=b7b7b7&text=ffffff&link=ffffff&alink=4400ff&vlink=ffffff&fontface=arial&fontsize=12&col_hdr_bg=000066&col_hdr_fg=ffffff&SBG=000066&zip=61802&miles=10&shelterlist=%27CHMP%27&atype=&where=type_DOG&PAGE=1'
response = requests.get(url)
html = response.content

account_sid = ("XXXXXXXXXXXXXXXXXXXXXXXXXXX")
auth_token = ("XXXXXXXXXXXXXXXXXXXXXXXXXXXX")
client = Client(account_sid, auth_token)
soup = BeautifulSoup(html, 'html.parser')

names = soup.find_all(text=re.compile("My name is(.*)"))

def check():
    old = []
    new = []
    newest = []

    for name in names:
        name = name.title()
        if name not in old:
            old.append(name[11:-2])
            if name in old:
                continue

    for name in names:
        name = name.title()
        if name in old:
            continue
        if name not in new and name not in old:
            new.append(name[11:-2])
        if name not in new and name in old:
            new.append(name[11:-2])
            old.remove(name)
        if name in new and name in old:
            old.remove(name)
            new.remove(name)

    for name in names:
        name = name.title()
        if name in old or name in new:
            continue
        if name not in old and name not in new:
            newest.append(name[11:-2])

    num_old = len(old)
    num_new = len(new)
    num_newest = len(newest)

    print("Old List: " + str(old))
    print("Number of dogs in the old list: " + str(num_old))
    print("New List: " + str(new))
    print("Number of new dogs: " + str(num_new))
    print("Newest List: " + str(newest))
    print("Number of newest dogs: " + str(num_newest))

    #client.api.account.messages.create(to = "+XXXXXXXXXX",
                                        #from_= "+XXXXXXXXXX",
                                        #body = "Here are some new dogs:" + str(new))

    #client.api.account.messages.create(to="+XXXXXXXXXX",
                                       #from_="+XXXXXXXXXX",
                                       #body=("There are " + str(num_newest) + " new puppies"), media_url = 'http://petharbor.com/results.asp?searchtype=ADOPT&start=3%20&friends=1&samaritans=1&nosuccess=0&rows=25&imght=200&imgres=thumb&tWidth=200&view=sysadm.v_chmp&bgcolor=b7b7b7&text=ffffff&link=ffffff&alink=4400ff&vlink=ffffff&fontface=arial&fontsize=12&col_hdr_bg=000066&col_hdr_fg=ffffff&SBG=000066&zip=61802&miles=10&shelterlist=%27CHMP%27&atype=&where=type_DOG&PAGE=1')

    #client.api.account.messages.create(to = "+XXXXXXXXXX",
                                        #from_= "+XXXXXXXXXX",
                                        #body = "Here are some new names:" + str(newest))


while True:
    check()
    time.sleep(20)

This is the current output:

Old List: ['Pretty', 'Celia', 'Khloe', 'Duke', 'Evangeline', 'Thelma', 'Clara', 'Carly', 'Camille', 'Maxine', 'Jupiter', 'Pixie', 'Smiley', 'Mia', 'Pogo', 'Rosco', 'Clark', 'Ellie', 'Marcy', 'Jimmy', 'Willie', 'Layla']
Number of dogs in the old list: 22
New List: ['Pretty', 'Celia', 'Khloe', 'Duke', 'Evangeline', 'Thelma', 'Clara', 'Carly', 'Camille', 'Maxine', 'Jupiter', 'Pixie', 'Smiley', 'Mia', 'Pogo', 'Rosco', 'Clark', 'Ellie', 'Marcy', 'Jimmy', 'Willie', 'Layla']
Number of new dogs: 22

I am trying to get it to update to:

Old List: ['Pretty', 'Celia', 'Khloe', 'Duke', 'Evangeline', 'Thelma', 'Clara', 'Carly', 'Camille', 'Maxine', 'Jupiter', 'Pixie', 'Smiley', 'Mia', 'Pogo', 'Rosco', 'Clark', 'Ellie', 'Marcy', 'Jimmy', 'Willie', 'Layla']
Number of dogs in the old list: 22
New List: ['Max', 'Charlie']
Number of new dogs: 2

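I'll try to restate what I am trying to do:

  1. Run the script to scrape the names from the website and store them in a list ('old').
  2. Recheck the website and collect the names.
  3. If a name is a duplicate, do not append it to a list.
  4. If a name is not a duplicate, append it to a list ('new').
  5. Keep running, moving names to the old list and removing them from the new list when there are new non-duplicate names.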
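
The five steps above could be sketched roughly as follows. This is only one possible reading of the intent, not a tested fix for the script. The helper names `extract_name`, `update_lists`, and `run` are hypothetical, and `fetch_names` is assumed to be a callable that re-downloads the page on every call (for example by wrapping the `requests.get` and `soup.find_all` lines from the question) and returns cleaned names. The regular expression assumes each matched string looks like "My name is Pretty." with optional trailing whitespace.

```python
import re
import time

# Assumption: each scraped string looks like "My name is Pretty." (any case).
NAME_PATTERN = re.compile(r"My name is\s+(.*?)\.?\s*$", re.IGNORECASE)


def extract_name(text):
    """Return the title-cased name from a 'My name is X.' string, or None."""
    match = NAME_PATTERN.search(text.strip())
    return match.group(1).title() if match else None


def update_lists(old, new, scraped):
    """Fold the previous 'new' names into 'old', then collect unseen names.

    Returns a fresh (old, new) pair; the input lists are not modified.
    """
    # Step 5: names that were new last round are now old.
    old = old + [name for name in new if name not in old]
    # Steps 3 and 4: keep only names that have never been seen before.
    fresh = []
    for name in scraped:
        if name not in old and name not in fresh:
            fresh.append(name)
    return old, fresh


def run(fetch_names, rounds, interval=20):
    """Poll fetch_names() 'rounds' times after an initial scrape."""
    # Step 1: the first scrape seeds 'old' (de-duplicated, order preserved).
    old = list(dict.fromkeys(fetch_names()))
    new = []
    for _ in range(rounds):
        time.sleep(interval)
        # Step 2: re-check the site on every pass, not once at import time.
        old, new = update_lists(old, new, fetch_names())
        print("Old List: " + str(old))
        print("New List: " + str(new))
    return old, new
```

The key differences from the script in the question are that the page is fetched again on every pass, the lists live outside the per-pass function so they survive between passes, and comparisons are made on the cleaned name rather than on the full "My Name Is ..." string. In this sketch a dog that disappears from the site stays in the old list; dropping adopted dogs would need an extra step.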

0 answers:

No answers