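I'm trying to code my first web scraper with Python and BeautifulSoup.
I'm trying to retrieve the URLs of all the listings on a web page, but instead of getting an array containing every URL, I only get a single URL.
Here is the code I'm using: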
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'https://www.pararius.com/apartments/enschede'
uClient = uReq(my_url)
page_html=uClient.read()
uClient.close()
page_soup = soup(page_html,"html.parser")
compartments = page_soup.findAll("li",{"class":"property-list-item-container"})
# Here is where I'm trying to store all the URLs in url_det
for compartment in compartments:
    url_det = compartment.h2.a["href"]
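Thanks for any input!
Answer 0 (score: 1)
Each iteration of the loop overwrites the contents of url_det, so only the last URL is left after the loop finishes. Instead, use a list comprehension to store all the values in a list, for example: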
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'https://www.pararius.com/apartments/enschede'
uClient = uReq(my_url)
page_html=uClient.read()
uClient.close()
page_soup = soup(page_html,"html.parser")
compartments = page_soup.findAll("li",{"class":"property-list-item-container"})
url_det = [compartment.h2.a["href"] for compartment in compartments]
print(url_det)
>>> ['/house-for-rent/enschede/PR0001596564/otto-van-taverenstraat', ... , '/house-for-rent/enschede/PR0001594320/hanenberglanden']
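For comparison, here is a sketch of the same fix written as an explicit loop that appends to a list created once before the loop, instead of rebinding `url_det` on every iteration. It runs offline: the `hrefs` list is a hypothetical stand-in for the `compartment.h2.a["href"]` values (the two paths are copied from the output above). The `urljoin` step from the standard library is an optional extra that is not part of the original answer; it turns the relative paths into absolute URLs, assuming they should be resolved against the page URL.

```python
from urllib.parse import urljoin

# Hypothetical stand-in for the href values extracted from each compartment;
# in the real scraper these come from compartment.h2.a["href"].
hrefs = [
    '/house-for-rent/enschede/PR0001596564/otto-van-taverenstraat',
    '/house-for-rent/enschede/PR0001594320/hanenberglanden',
]

# Pattern from the question: the name is rebound on every iteration,
# so only the last value survives the loop.
for href in hrefs:
    url_det = href
print(url_det)  # prints only the last path

# Fixed pattern: create the list once, before the loop, and append to it.
url_det = []
for href in hrefs:
    url_det.append(href)

# Optional extra step: resolve the relative paths against the page URL.
base_url = 'https://www.pararius.com/apartments/enschede'
absolute_urls = [urljoin(base_url, path) for path in url_det]
print(absolute_urls)
```

The explicit loop and the list comprehension produce the same list; the loop form is handier if you later want to skip listings that lack an `h2` or `a` tag.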