First-time poster here, so please be gentle! I'm new to Python and am having some trouble scraping multiple URLs with the following code:
from urllib import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = ["https://www.zoopla.co.uk/for-sale/property/birmingham/?q=birmingham&results_sort=newest_listings&search_source=home&page_size=100", "https://www.zoopla.co.uk/for-sale/property/birmingham/?identifier=birmingham&page_size=100&q=birmingham&search_source=home&radius=0&pn=2"]
for urls in my_url:
    uClient = uReq(my_url)
    page_html = uClient.read()
    uClient.close()
    page_soup = soup(page_html,"html.parser")
    containers = page_soup.findAll("div",{"class":"listing-results-wrapper"})
filename = "links.csv"
f = open (filename, "w")
headers = "link\n"
f.write(headers)
for container in containers:
    link = container.div.div.a["href"]
    print("link: " + link)
    f.write(link + "\n")
f.close()
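I'm guessing I've made a very basic mistake, but I can't seem to find anything by searching the forums, Google, etc., so I must be looking in the wrong places.
Edit: I realise I should explain what I'm trying to achieve! I'm trying to create a single CSV file containing the information scraped into the variable 'containers'.
This code seems to work with just one URL, but I get AttributeError: 'list' object has no attribute 'strip' when I add other URLs.
Would anyone be willing to offer some help?
Any help is much appreciated!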
Answer 0 (score: 0)
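The formatting of the code got mangled, but the problem is that you are passing the whole list (my_url) to uReq instead of the loop variable (urls). urlopen expects a single URL string, which is why it fails with "'list' object has no attribute 'strip'". Opening links.csv once, before the loop, also ensures that the links from every page end up in the same file.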
# Python 2 import; on Python 3 use: from urllib.request import urlopen as uReq
from urllib import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = ["https://www.zoopla.co.uk/for-sale/property/birmingham/?q=birmingham&results_sort=newest_listings&search_source=home&page_size=100", "https://www.zoopla.co.uk/for-sale/property/birmingham/?identifier=birmingham&page_size=100&q=birmingham&search_source=home&radius=0&pn=2"]

# Open the CSV once, before the loop, so each page does not overwrite the last.
filename = "links.csv"
f = open(filename, "w")
headers = "link\n"
f.write(headers)

for urls in my_url:
    # Pass the current URL (urls), not the whole list (my_url).
    uClient = uReq(urls)
    page_html = uClient.read()
    uClient.close()

    page_soup = soup(page_html, "html.parser")
    containers = page_soup.findAll("div", {"class": "listing-results-wrapper"})

    # Write every link found on this page to the shared file.
    for container in containers:
        link = container.div.div.a["href"]
        print("link: " + link)
        f.write(link + "\n")

f.close()
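To show the pattern in isolation (one loop over the URLs, one CSV writer shared by all pages), here is a minimal Python 3 sketch. The names collect_links, fetch, extract_links and fake_pages are hypothetical and not part of the original code, and the pages are stand-in strings so the example runs without network access or BeautifulSoup.

```python
import csv
import io


def collect_links(urls, fetch, extract_links, out):
    """Write the links found on every page in `urls` to `out` as one CSV.

    `fetch` and `extract_links` are hypothetical callables supplied by the
    caller: fetch(url) returns the page content, extract_links(content)
    returns an iterable of link strings.
    """
    writer = csv.writer(out)
    writer.writerow(["link"])        # header row, written once
    count = 0
    for url in urls:                 # `url` is one string, never the whole list
        content = fetch(url)
        for link in extract_links(content):
            writer.writerow([link])
            count += 1
    return count


# Stand-in "pages" (assumption: one link per line) instead of real HTML.
fake_pages = {
    "https://example.com/?pn=1": "/for-sale/details/1\n/for-sale/details/2",
    "https://example.com/?pn=2": "/for-sale/details/3",
}

buffer = io.StringIO()
total = collect_links(
    list(fake_pages),
    fetch=fake_pages.__getitem__,    # look the page up instead of downloading it
    extract_links=str.splitlines,    # one link per line in the stand-in pages
    out=buffer,
)
print(buffer.getvalue())
```

In real use, fetch could wrap urllib.request.urlopen(url).read(), extract_links could apply the BeautifulSoup lookup from the code above, and out could be a file opened with open("links.csv", "w", newline="").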