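I am new to web scraping with Python and want to write code that scrapes data from a website. The site has no pagination and its page links are dynamic; they are all inner pages. As you can see from the link in my code, I am trying to collect each company's information: name, address, and phone number.
I have tried the answers to many Stack Overflow questions, but none of them fit my requirements.
Here is my code.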
from bs4 import BeautifulSoup
import requests
source= requests.get('http://businessdirectory.pk/Default.aspx?action=Business&pid=762390').text
soup= BeautifulSoup(source, 'lxml')
ParentDiv= soup.find('div' , class_='businessDetails')
CompanyName= ParentDiv.find('p' , class_='title').text
CityName= ParentDiv.find('p' , class_='cityName').text
CityAddress= ParentDiv.find('p' , class_='address').text
PhoneNumber= ParentDiv.find('p' , class_='phone').text
MobileNo= ParentDiv.find('p' , class_='mobNo').text
print(CompanyName)
print(CityName)
print(CityAddress)
print(PhoneNumber)
I just want to supply a single domain link and have the script find all of the inner pages and search each one for the same data.
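To make that requirement concrete, here is a minimal sketch of a breadth-first walk over a site's internal links. It is only an illustration under stated assumptions: `same_site_links`, `crawl`, the `get_hrefs` callback, and the `max_pages` limit are hypothetical names that do not come from the question, and the `example.com` URLs are placeholders. The link fetching is passed in as a function so the traversal logic can be tried without touching the network.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse


def same_site_links(base_url, hrefs):
    """Resolve raw href values against base_url and keep same-host http(s) links."""
    host = urlparse(base_url).netloc
    links = []
    for href in hrefs:
        # Make the link absolute and drop any "#fragment" part.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.netloc == host:
            if absolute not in links:
                links.append(absolute)
    return links


def crawl(start_url, get_hrefs, max_pages=50):
    """Visit pages breadth-first, starting at start_url.

    get_hrefs(url) is a caller-supplied function (hypothetical) that returns
    the raw href strings found on that page. Returns the visited URLs in order.
    """
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in same_site_links(url, get_hrefs(url)):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# With requests and BeautifulSoup, get_hrefs could look like this (untested
# against the real site):
#
# def get_hrefs(url):
#     soup = BeautifulSoup(requests.get(url, timeout=10).text, 'html.parser')
#     return [a['href'] for a in soup.find_all('a', href=True)]

# Offline demonstration with a made-up site map.
fake_site = {
    "http://example.com/": ["/a", "b?pid=1", "http://other.com/x", "#top"],
    "http://example.com/a": ["/", "/c"],
}
pages = crawl("http://example.com/", lambda url: fake_site.get(url, []))
print(pages)
```

Each visited URL could then be handed to the same `businessDetails` parsing shown in the question, skipping pages where that `div` is absent. The `max_pages` cap is there so that a directory with many dynamic links does not turn into an unbounded crawl.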
Answer 0 (score: 0)
Try the following code; I hope it helps.
from bs4 import BeautifulSoup
import requests

page_num = 0
company_name = []
City_Name = []
City_Address = []
Phone_Number = []
Maxpage = 12  # number of pages to request

while page_num < Maxpage:
    # Build the URL for the current page index
    page = "http://businessdirectory.pk/Default.aspx?action=Business&pid=762390&page={}".format(page_num)
    pageTree = requests.get(page)
    soup = BeautifulSoup(pageTree.text, 'html.parser')

    # All fields live inside the businessDetails block
    ParentDiv = soup.find('div', class_='businessDetails')
    CompanyName = ParentDiv.find('p', class_='title').text
    CityName = ParentDiv.find('p', class_='cityName').text
    CityAddress = ParentDiv.find('p', class_='address').text
    PhoneNumber = ParentDiv.find('p', class_='phone').text

    # Collect the values from this page
    company_name.append(CompanyName)
    City_Name.append(CityName)
    City_Address.append(CityAddress)
    Phone_Number.append(PhoneNumber)
    page_num += 1

# Print the collected columns once every page has been processed
print(company_name)
print(City_Name)
print(City_Address)
print(Phone_Number)
The output will look like this.
['Ab Traders', 'Al Faisal Machinery Store', 'Ameen Pipe Store', 'Aslam Air Compressor', 'Best Engineering Works', 'China Center', 'Empyrean Group', 'General Industrial Corporation', 'Habib Mill Store', 'Humayun Traders', 'Islam Air Corporation', 'Khalid Hussain Workshop 3']
['Faisalabad', 'Faisalabad', 'Faisalabad', 'Faisalabad', 'Faisalabad', 'Faisalabad', 'Lahore', 'Faisalabad', 'Faisalabad', 'Faisalabad', 'Faisalabad', 'Faisalabad']
['Sadiq Market, Railway Road', 'Railway Road', 'Sadiq Market, Railway Road', 'Railway Road', 'Sadiq Market, Railway Road', 'Railway Road', '8-E 1, Jagawar Chowk, Near Allah Hu Chowk, Johar Town', 'Sadiq Market, Railway Road', 'Railway Road', 'Sadiq Market, Railway Road', 'Railway Road', 'General Bus Stand']
['0412639166', '0412646985-2606985', '0412618759', '0412600387', '0412632037', '0412600504-2634502', '0336-9954475', '0412636174-2637446', '0412617274', '0412635348-2617469', '0412618242', '0418781513']
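Four parallel lists are awkward to work with once the data grows, so one possible follow-up is to zip them into one record per company and write the records to CSV. This is only a sketch: `rows_from_columns`, `write_csv`, and the column names are hypothetical and not part of the answer above, and the sample values are two entries copied from the output shown.

```python
import csv
import io

FIELDS = ["name", "city", "address", "phone"]


def rows_from_columns(names, cities, addresses, phones):
    """Combine the parallel lists into one dict per company."""
    return [
        dict(zip(FIELDS, values))
        for values in zip(names, cities, addresses, phones)
    ]


def write_csv(rows, fileobj):
    """Write the rows with a header line to any file-like object."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Two entries taken from the output above.
rows = rows_from_columns(
    ['Ab Traders', 'Empyrean Group'],
    ['Faisalabad', 'Lahore'],
    ['Sadiq Market, Railway Road',
     '8-E 1, Jagawar Chowk, Near Allah Hu Chowk, Johar Town'],
    ['0412639166', '0336-9954475'],
)

buffer = io.StringIO()  # stands in for open('companies.csv', 'w', newline='')
write_csv(rows, buffer)
print(buffer.getvalue())
```

Keeping the phone numbers as strings preserves the leading zeros, and the `csv` module quotes addresses that contain commas, so the file can be read back without the columns shifting.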