I'm not getting the links or the text data, and I can't tell what is going wrong.
I'm using Python 3 with BeautifulSoup.
from bs4 import BeautifulSoup
import requests
headers = {"User-Agents":"googleBoat"}
r = requests.get('https://www.asklaila.com/search/Delhi-NCR/industrial-area-phase-1/manufacturers/',headers=headers)
soup = BeautifulSoup(r.text,'lxml')
##link of each company
for links in soup.find_all('h2',class_='resultTitle'):
    link = links.find('a')
    print(link['href'])
##data of each company
name = soup.find('h1',class_='cardHeadTitle')
print(name)
nature = soup.find('h1',class_='cardHeadSubTitle')
print(nature)
data = soup.find('div',{"id":"ldpAdrsDetails"})
for phone in data.find_all('span',class_='tel')[0]:
    print(phone)
for mob in data.find_all('span',class_='tel')[1]:
    print(mob)
for address in data.find_all('span',class_='adr'):
    print(address)
for landmark in data.find_all('i',class_='glyphicon glyphicon-tower'):
    print(landmark)
for products in data.find_all('span',class_='cardElementLinks'):
    print(products)
Answer 0 (score: 0)
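You can use Selenium to open a browser and fetch the data. I didn't extract everything, but I quickly grabbed the name, link, and phone numbers. From there you should be able to continue and pull out whatever other information you need.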
import bs4
from selenium import webdriver

url = 'https://www.asklaila.com/search/Delhi-NCR/industrial-area-phase-1/manufacturers/'
# Raw string so the backslashes in the Windows path are not read as escapes
browser = webdriver.Chrome(r'C:\chromedriver_win32\chromedriver.exe')
browser.get(url)
html = browser.page_source
soup = bs4.BeautifulSoup(html, 'html.parser')

# Each search result sits in its own card
cards = soup.find_all('div', class_='col-xs-12 card')
for card in cards:
    link = card.find('h2', class_='resultTitle')
    href = link.find('a')['href']
    name = link.text.strip()
    nature = card.find('span', class_='resultSubTitle').text.strip()

    # Not every card has a phone label, so default to empty strings
    phone1 = ''
    mobile = ''
    label = card.find('label', {'class': 'phonedisplay'})
    if label is not None:
        # The label holds one or two comma-separated numbers
        numbers = [n.strip() for n in label.text.split(',')]
        phone1 = numbers[0]
        if len(numbers) > 1:
            mobile = numbers[1]

    print('Name: ' + name)
    print('Link: ' + href)
    print('Phone: ' + phone1)
    print('Mobile: ' + mobile + '\n')

browser.close()
Output:
Name: Hitech Packers
Link: https://www.asklaila.com/listing/Delhi-NCR/okhla-industrial-area-phase-1/hitech-packers/qy3r8ZG2/
Phone: 01126371381
Mobile: 09810235750
Name: Sharma Sanitary Goods Manufacturers
Link: https://www.asklaila.com/listing/Delhi-NCR/naraina-industrial-area-phase-1/sharma-sanitary-goods-manufacturers/1vdsLtBO/
Phone:
Mobile:
Name: Jyoti Apparels
Link: https://www.asklaila.com/listing/Delhi-NCR/okhla-industrial-area-phase-1/jyoti-apparels/174NGGZ8/
Phone:
Mobile:
Name: Modern Tools Manufacturers
Link: https://www.asklaila.com/listing/Delhi-NCR/mayapuri-industrial-area-phase-1/modern-tools-manufacturers/0ockglp2/
Phone:
Mobile:
Name: Karan Motors Private Limited
Link: https://www.asklaila.com/listing/Delhi-NCR/mayapuri-industrial-area-phase-1/karan-motors-private-limited/PW6GbMyj/
Phone: 01128117292
Mobile: 09311026538
Name: Seth Brothers Perfumers Private Limited
Link: https://www.asklaila.com/listing/Delhi-NCR/naraina-industrial-area-phase-1/seth-brothers-perfumers-private-limited/b13LLUpy/
Phone:
Mobile:
Name: MYK Laticrete India Private Limited
Link: https://www.asklaila.com/listing/Delhi-NCR/okhla-industrial-area-phase-1/myk-laticrete-india-private-limited/1qsYneHK/
Phone: 07941407461
Mobile: 09350621093
Name: Hindustan Switch Private Limited
Link: https://www.asklaila.com/listing/Delhi-NCR/mayapuri-industrial-area-phase-1/hindustan-switch-private-limited/0eCf6VL3/
Phone:
Mobile:
Name: Leo Industries
Link: https://www.asklaila.com/listing/Delhi-NCR/mayapuri-industrial-area-phase-1/leo-industries/17GFTPQf/
Phone: 01141833375
Mobile: 09873575646
Name: Benny Impex Private Limited
Link: https://www.asklaila.com/listing/Delhi-NCR/naraina-industrial-area-phase-1/benny-impex-private-limited/PTS4ClCb/
Phone:
Mobile:
Name: Atul Aluminium
Link: https://www.asklaila.com/listing/Delhi-NCR/mayapuri-industrial-area-phase-1/atul-aluminium/0WSE64VC/
Phone:
Mobile:
Name: Baldev Metals Private Limited
Link: https://www.asklaila.com/listing/Delhi-NCR/mayapuri-industrial-area-phase-1/baldev-metals-private-limited/0e036Prf/
Phone: 01128117423
Mobile: 09810058658
Name: FUCEN STI Apparel Automation Private Limited
Link: https://www.asklaila.com/listing/Delhi-NCR/okhla-industrial-area-phase-1/fucen-sti-apparel-automation-private-limited/0YtpzttB/
Phone: 01141076130
Mobile: 09310601501
Name: Premier Bags
Link: https://www.asklaila.com/listing/Delhi-NCR/mayapuri-industrial-area-phase-1/premier-bags/1t4vjzYD/
Phone: 01125265798
Mobile: 09811491300
Name: Kandhari Brothers Private Limited
Link: https://www.asklaila.com/listing/Delhi-NCR/mayapuri-industrial-area-phase-1/kandhari-brothers-private-limited/0fjgIaIY/
Phone: 01128116511
Mobile: 09811060054
Name: Neelkanth Stainless Steel Sinks
Link: https://www.asklaila.com/listing/Delhi-NCR/mayapuri-industrial-area-phase-1/neelkanth-stainless-steel-sinks/EvP1wxfR/
Phone: 01128116021
Mobile: 09818722322
Name: Dhawan Enterprises
Link: https://www.asklaila.com/listing/Delhi-NCR/okhla-industrial-area-phase-1/dhawan-enterprises/1ejpdIzC/
Phone:
Mobile:
Name: Eagle Flask Industries Limited
Link: https://www.asklaila.com/listing/Delhi-NCR/okhla-industrial-area-phase-1/eagle-flask-industries-limited/07bhTSXB/
Phone: 01141610691
Mobile: 09891505048
Name: Krishna Foundry And Workshop
Link: https://www.asklaila.com/listing/Delhi-NCR/mayapuri-industrial-area-phase-1/krishna-foundry-and-workshop/0FfQR4GQ/
Phone: 01128115143
Mobile: 09810044646
Name: WH Deeth Ballabgarh And Company
Link: https://www.asklaila.com/listing/Delhi-NCR/okhla-industrial-area-phase-1/wh-deeth-ballabgarh-and-company/4HKZfxBT/
Phone:
Mobile:
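The phone handling in the loop above can be pulled out into a small helper, which makes it possible to check the splitting logic without opening a browser. This is only a minimal sketch: it assumes the label text is a comma-separated string like the values shown in the output, and `split_phone` is a hypothetical name that does not appear in the original answer.

```python
def split_phone(text):
    """Split a comma-separated phone string into (phone, mobile)."""
    # Drop empty pieces so blank text or stray commas do not yield bogus numbers
    numbers = [part.strip() for part in text.split(',') if part.strip()]
    phone = numbers[0] if numbers else ''
    mobile = numbers[1] if len(numbers) > 1 else ''
    return phone, mobile


print(split_phone('01126371381, 09810235750'))
print(split_phone('01126371381'))
print(split_phone(''))
```

Inside the loop this could be called as `phone1, mobile = split_phone(label.text)` once the label has been found.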
Answer 1 (score: 0)
The request is getting 403 Forbidden because the header name is wrong: it should be User-Agent, without the trailing s. Several of your selectors are also wrong.
import re

import requests
from bs4 import BeautifulSoup

headers = {"User-Agent":"Mozilla/5.0"}
r = requests.get('https://www.......', headers=headers)
soup = BeautifulSoup(r.text,'html.parser')
for card in soup.find_all('div', class_='col-md-6 col-lg-6 cardWrap'):
    ##data of each company
    name = card.find('h2', class_='resultTitle')
    if not name:
        # Skip wrappers that do not contain a listing
        continue
    nature = card.find('span', class_='resultSubTitle')
    phone = card.find('label', class_='phonedisplay')
    # Normalise the spacing around the comma between the two numbers
    phone = re.sub(r'\s+,\s+', ', ', phone.text.strip()) if phone else "no phone"
    address = card.find('img', attrs={"title" : "Address"})
    products = card.find('div', class_='bottomSpaceMargin')
    link = card.find('a')
    company = '{} \n{} \n{} \n{} \n{} \n{}'.format(
        name.text.strip(),
        nature.text.strip(),
        phone,
        address.parent.text.strip(),
        products.text.strip(),
        link['href']
    )
    print(company)
    print('==========================')
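A misspelled header name such as User-Agents raises no error in requests; it is simply sent as an unknown header, so the typo is easy to miss. As a rough sketch of how such mistakes could be caught before sending, the helper below compares header names against a short list of common ones using the standard library's `difflib`. Both `KNOWN_HEADERS` and `suspicious_headers` are hypothetical names introduced here for illustration, and the list is deliberately incomplete.

```python
import difflib

# A few common request header names; illustrative only, not exhaustive
KNOWN_HEADERS = ["User-Agent", "Accept", "Accept-Language", "Referer", "Cookie"]


def suspicious_headers(headers):
    """Return {given_name: suggested_name} for names that look like typos."""
    known_lower = {name.lower(): name for name in KNOWN_HEADERS}
    suggestions = {}
    for name in headers:
        lowered = name.lower()
        if lowered in known_lower:
            continue  # exact match; header names are case-insensitive
        close = difflib.get_close_matches(lowered, list(known_lower), n=1, cutoff=0.8)
        if close:
            suggestions[name] = known_lower[close[0]]
    return suggestions


print(suspicious_headers({"User-Agents": "googleBoat"}))
print(suspicious_headers({"User-Agent": "Mozilla/5.0"}))
```

The first call flags the header from the question and suggests User-Agent; the second returns an empty dict because the name matches a known header exactly.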