I'm new to web scraping and practicing it: I'm trying to scrape a website and write the results to a CSV file. When I get to the part that converts the results to CSV, the address doesn't end up in the Address column. I want the data to go into the Address column. The code is below.
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'https://www.allagents.co.uk/find-agent/london/'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, 'html.parser')
containers = page_soup.findAll('div', {'class':'itemlabel3'})
filename = "webscrape.csv"
f = open(filename, "w")
headers = "Company Name, Address, Telephone Number\n"
f.write(headers)
for container in containers:
    comp_name = container.find('div', {'class': 'labelleft2 col-md-10'}).div.h4.a.text
    address = container.find('div', {'class': 'labelleft2 col-md-10'}).div.p.text
    tel = container.find('div', {'class': 'labelleft2 col-md-10'}).div.find('p', {'style': 'clear: both; margin-bottom: 15px;'}).strong.text
    print("Company Name:", comp_name)
    print("Address:", address)
    print("Telephone:", tel)
    f.write(comp_name.replace(",", "|") + "," + address.replace(",", "|") + "," + tel + "\n")
f.close()
Any help is appreciated. Thank you in advance.
Answer 0 (score: 2)
It looks like there are newline characters in your address data, which push the rest of the address onto a new row of the CSV.
Try replacing the following line in your code and re-run:
address = (container.find('div', {'class': 'labelleft2 col-md-10'}).div.p.text).replace('\n', '')
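As a side note, Python's built-in csv module handles embedded commas and newlines for you (by quoting the field), so the manual replace(",", "|") workaround isn't needed. Below is a minimal sketch of just the cleaning-and-writing step, using made-up sample strings in place of the scraped values:

```python
import csv

# Sample values standing in for the scraped text; a real address from the
# page may contain commas and newline characters.
rows = [
    ("Acme Lettings", "1 High Street,\nLondon", "020 0000 0000"),
]

# newline='' is recommended by the csv docs when opening files for csv.writer
with open("webscrape.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Company Name", "Address", "Telephone Number"])
    for comp_name, address, tel in rows:
        # Collapse internal newlines/extra whitespace so the address
        # stays on one CSV row
        clean_address = " ".join(address.split())
        writer.writerow([comp_name, clean_address, tel])
```

Fields that contain a comma are automatically quoted by csv.writer, so the address still lands in a single column when the file is opened in a spreadsheet.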