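Hello, I'm new to Python and I'm trying to figure out why my list overwrites its previous elements every time a new page is loaded and scraped inside the while loop. Thank you in advance.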
def scrapeurls():
    domain = "https://domain234dd.com"
    count = 0
    while count < 10:
        page = requests.get("{}{}".format(domain, count))
        soup = BeautifulSoup(page.content, 'html.parser')
        data = soup.findAll('div', attrs={'class': 'video'})
        urls = []
        for div in data:
            links = div.findAll('a')
            for a in links:
                urls.append(a['href'])
                print(a['href'])
        print(count)
        count += 1
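Answer 0 (score: 3)
Because you reset urls to an empty list on every iteration of the loop. You should move that assignment to before the loop.
(Note that the whole thing would be better expressed as a for loop.)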
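As a rough sketch of the for-loop suggestion: the counter can come from range() instead of being incremented by hand. To keep the example runnable without network access, the page-fetching step is replaced by a hypothetical stand-in, fake_fetch_links; both that helper and collect_urls are illustrative names, not part of the question's code.

```python
def collect_urls(fetch_links, page_count=10):
    """Gather links from pages 0..page_count-1 into a single list."""
    urls = []  # created once, before the loop
    for count in range(page_count):  # replaces "count = 0 / while / count += 1"
        urls.extend(fetch_links(count))
    return urls


def fake_fetch_links(count):
    # Hypothetical stand-in for the requests + BeautifulSoup step:
    # pretends each page contains two links.
    return ["/video/{}a".format(count), "/video/{}b".format(count)]


all_urls = collect_urls(fake_fetch_links, page_count=3)
print(all_urls)
```

In the real scraper, the function passed as fetch_links would perform the requests.get call and return the href values found on that page.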
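Answer 1 (score: 3)
You need to initialize the urls list before the loop. If you initialize it inside the loop, it is set back to an empty list on every iteration.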
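The difference can be seen in a small, self-contained sketch that uses plain lists instead of scraped pages. The names init_inside, init_outside, and pages are made up for illustration, and the sample data is invented.

```python
def init_inside(pages):
    # urls is rebound to a new empty list for every page,
    # so only the links from the last page survive.
    for page in pages:
        urls = []
        for link in page:
            urls.append(link)
    return urls


def init_outside(pages):
    # urls is created once, so links from all pages accumulate.
    urls = []
    for page in pages:
        for link in page:
            urls.append(link)
    return urls


pages = [["/a", "/b"], ["/c"]]  # invented sample data: two "pages" of links
inside_result = init_inside(pages)
outside_result = init_outside(pages)
print(inside_result)   # ['/c']
print(outside_result)  # ['/a', '/b', '/c']
```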
Answer 2 (score: 1)
import requests
from bs4 import BeautifulSoup

domain = "https://domain234dd.com"
count = 0
urls = []  # created once, before the loop, so links from every page accumulate
while count < 10:
    page = requests.get("{}{}".format(domain, count))
    soup = BeautifulSoup(page.content, 'html.parser')
    data = soup.findAll('div', attrs={'class': 'video'})
    for div in data:
        links = div.findAll('a')
        for a in links:
            urls.append(a['href'])
            print(a['href'])
    print(count)
    count += 1