So I'm testing my web-scraping knowledge by writing this small script (I know some of it is redundant, I'm just playing around):
def programstart():
    import re
    from selenium import webdriver
    from bs4 import BeautifulSoup
    ff=webdriver.Firefox()
    ff.get('https://www.ozbargain.com.au/deals')
    linkpattern = r'https://www.ozbargain.com.au/node/.*'
    donotpassthislink = 'https://www.ozbargain.com.au/node/415763'
    #list of found links
    titlelist = []
    #loop to find links
    while True:
        soup = BeautifulSoup(ff.page_source)
        raw_links = soup.find_all(name='a', href=re.compile(linkpattern))
        for links in raw_links:
            titles = ff.find_element_by_xpath("//h2[@class='title']").text
            extract = links
            if donotpassthislink == extract:
                break
            else:
                pass
            titlelist.append(titles)
        # Load next page
        if donotpassthislink == extract: # stop loop at first movie link from last time.
            break
        else:
            ff.get(ff.find_element_by_xpath("//a[@class='pager-next active']").get_attribute('href'))
programstart()
I get this error:
Traceback (most recent call last):
  File "F:/Dropbox/Funfile/ben.py", line 35, in <module>
    programstart()
  File "F:/Dropbox/Funfile/ben.py", line 29, in programstart
    if donotpassthislink == extract: # stop loop at first movie link from last time.
UnboundLocalError: local variable 'extract' referenced before assignment
I find this strange, because the same loop is used in another script that works fine.
So I wrote a simplified script:
test = 'lets break this sentence up'
list = []
def func():
    while True:
        for i in test:
            extract = i
            print(extract)
            if extract == 'u':
                break
            else:
                pass
            list.append(extract)
        if extract == 'u':
            print('done')
            break
func()
It runs without any problems. What is going on?
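One difference between the two scripts that may matter: in the simplified script the `for` loop iterates over a non-empty string, so its body always runs at least once. The sketch below assumes (this is not confirmed from the traceback alone) that `raw_links` comes back empty in the first script, for example if no `href` on the page matches `linkpattern`. A `for` loop over an empty iterable never executes its body, so a name assigned only inside it is never bound. The helper `check_last` and its parameter `items` are hypothetical names used only for illustration.

```python
def check_last(items):
    # Hypothetical helper that mirrors the shape of the loops above.
    for item in items:
        extract = item  # bound only if the loop body runs at least once
    return extract      # fails if the loop never ran


# Non-empty iterable: the body runs, so `extract` is bound.
print(check_last("up"))

# Empty iterable: the body never runs, so `extract` is unbound.
try:
    check_last([])
except UnboundLocalError as exc:
    print("UnboundLocalError:", exc)
```

If that assumption holds, binding `extract = None` before the `for` loop would be one way to make the later comparison safe, and printing `len(raw_links)` would confirm whether any links were found.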