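I defined used = [] as a global var in my program. Now I have a function, jimoti, that runs in a while loop. Inside the function I iterate over the results of a web scrape (bs4) and add the title of each scraped item to the used list. When the title is already in used, I try not to print it again, but it gets printed again and again: because the regex matches it on two or three keywords, the same text is printed 2 or 3 times. How can I change the code so that it prints only once?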
Here is the code:
from bs4 import BeautifulSoup
import requests
from time import sleep
from random import randint
import re
import os

allowed = ["pc", "FUJITSU", "LIFEBOOK", "win", "Windows",
           "PC", "Linux", "linux", "HP", "hp", "notebook", "desktop",
           "raspberry", "NEC", "mac", "Mac", "Core"]
denied = ["philips"]
used = set()

source = requests.get("https://jmty.jp/aichi/sale-pcp").text
soup = BeautifulSoup(source, 'lxml')


def jimoti(sk):
    global used
    for h2 in soup.find_all('div', class_='p-item-content-info'):
        title = h2.select_one('.p-item-title').text
        address = h2.select_one('.p-item-title').a["href"]
        price = (h2.select_one('.p-item-most-important').text).replace("円", "").replace("\n", "").replace(",", "")
        price = int(price)
        town = h2.select_one('.p-item-supplementary-info').text
        if price < 5000000 and title not in used:
            used.add(title)
            for pattern in allowed:
                print(pattern)
                if re.search(pattern, title):
                    second(sk, title, address, price, town)
                    break


def second(sk, title, address, price, town):
    sk = sk
    title = title
    address = address
    price = price
    town = town
    for prh in denied:
        print(prh)
        if re.search(prh, title):
            break
        else:
            send(sk, title, address, price, town)


if __name__ == '__main__':
    while True:
        jimoti(sk)
        sleep(randint(11, 20))
Answer 0 (score: 0)
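The original question was about a loop printing an element once when a condition is met and then printing it again. We avoid the repeat by using break to exit the loop after the first hit: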
seen = set()
for i in range(10):
    if i not in seen:
        for x in range(10):
            seen.add(i)
            break
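The same idea, applied to the scraping loop, can be sketched as a self-contained function. This is a sketch under assumptions: the scraped titles arrive as a plain list of strings, and handle is a hypothetical callback standing in for second()/send(). Each title is recorded in a set before matching (as in the question's code), and the keyword loop stops at the first hit, so a title is handled at most once even when several keywords match it.

```python
from __future__ import annotations

import re
from typing import Callable, Iterable


def process_titles(
    titles: Iterable[str],
    allowed: list[str],
    seen: set[str],
    handle: Callable[[str], None],  # hypothetical stand-in for second()/send()
) -> None:
    """Call handle() at most once per title, even across repeated calls."""
    for title in titles:
        if title in seen:
            continue  # handled on this pass or an earlier one
        seen.add(title)  # record before matching, as in the question's code
        for pattern in allowed:
            if re.search(pattern, title):
                handle(title)
                break  # stop after the first matching keyword


seen = set()
handled = []
titles = ["FUJITSU LIFEBOOK Windows PC", "HP notebook", "FUJITSU LIFEBOOK Windows PC"]
keywords = ["PC", "FUJITSU", "Windows", "HP"]
for _ in range(2):  # simulate two iterations of the while loop
    process_titles(titles, keywords, seen, handled.append)
print(handled)  # ['FUJITSU LIFEBOOK Windows PC', 'HP notebook']
```

The first title matches "PC", "FUJITSU" and "Windows", but it is handled only once. Note that a title matching no keyword is still added to seen, so it is skipped on later passes as well.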
There is logic in the middle of the inner loop:
for prh in denied:
    print(prh)
    if re.search(prh, title):
        break
    else:
        send(sk, title, address, price, town)
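As written, it searches for each prh in title until one matches and then breaks, so send() is called once for every prh in denied that does not match title, up to the first one that does. That is probably not the logic you intend. Did you mean: send() unless any of the prh values is in title?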
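To make the difference concrete, here is a small comparison that uses a counter in place of send(). The first function mirrors the loop as posted (the else belongs to the if). The second moves the else onto the for, which Python runs only when the loop finishes without hitting break. The denied list and titles are made-up examples.

```python
from __future__ import annotations

import re


def sends_if_else(title: str, denied: list[str]) -> int:
    """else attached to the if: runs once per non-matching pattern before a break."""
    calls = 0
    for prh in denied:
        if re.search(prh, title):
            break
        else:
            calls += 1  # stands in for send(...)
    return calls


def sends_for_else(title: str, denied: list[str]) -> int:
    """else attached to the for: runs once, and only if no pattern matched."""
    calls = 0
    for prh in denied:
        if re.search(prh, title):
            break
    else:
        calls += 1  # stands in for send(...)
    return calls


denied = ["philips", "junk", "broken"]
print(sends_if_else("HP notebook", denied))          # 3
print(sends_for_else("HP notebook", denied))         # 1
print(sends_if_else("broken HP notebook", denied))   # 2
print(sends_for_else("broken HP notebook", denied))  # 0
```

With the for/else form, a clean title produces exactly one send, and a denied title produces none.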
One level up, your logic for allowed is essentially the same. My guess is that the correct logic is:
if all(prh not in title for prh in denied):
    send(sk, title, address, price, town)
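Combining both levels, the allowed/denied check can be written as a single predicate, so second() and send() are reached at most once per title. This is a sketch and not the answerer's code: it keeps re.search from the question instead of the plain in test, and the name should_send is hypothetical.

```python
from __future__ import annotations

import re


def should_send(title: str, allowed: list[str], denied: list[str]) -> bool:
    """True if at least one allowed pattern matches and no denied pattern does."""
    return any(re.search(p, title) for p in allowed) and not any(
        re.search(d, title) for d in denied
    )


allowed = ["pc", "PC", "Windows", "HP", "notebook"]
denied = ["philips"]
print(should_send("HP notebook Windows PC", allowed, denied))  # True
print(should_send("philips PC monitor", allowed, denied))      # False
print(should_send("Sony TV", allowed, denied))                 # False
```

any() stops at the first matching keyword, so a title that contains several keywords still yields a single True and therefore a single send(). If the keyword list grows, another option is to join the patterns into one alternation with re.compile("|".join(allowed)) and run one search per title.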
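I'm also not sure why you have the random sleep in there; it seems disruptive. I'm not sure Stack Overflow is the best forum for the level of support you may need. There is a subreddit for Python beginners that might be more useful.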