没有获取链接以及文本数据,没有获取正在发生的事情,我使用python 3 beautifulsoup

时间:2019-01-07 12:13:54

标签: python beautifulsoup

我没有得到链接以及文本数据,没有得到正在发生的事情,

我使用python 3 beautifulsoup

from bs4 import BeautifulSoup
import requests
headers = {"User-Agents":"googleBoat"}
r = requests.get('https://www.asklaila.com/search/Delhi-NCR/industrial-area-phase-1/manufacturers/',headers=headers)
soup = BeautifulSoup(r.text,'lxml')

##link of each company
for links in soup.find_all('h2',class_='resultTitle'):
    link = links.find('a')
    print(link['href'])

##data of each company
name = soup.find('h1',class_='cardHeadTitle')
print(name)
nature = soup.find('h1',class_='cardHeadSubTitle')
print(nature)

data = soup.find('div',{"id":"ldpAdrsDetails"})
for phone in data.find_all('span',class_='tel')[0]:
    print(phone)
for mob in data.find_all('span',class_='tel')[1]:
    print(mob)
for address in data.find_all('span',class_='adr'):
    print(address)
for landmark in data.find_all('i',class_='glyphicon glyphicon-tower'):
    print(landmark)
for products in data.find_all('span',class_='cardElementLinks'):
    print(products)

2 个答案:

答案 0 :(得分:0)

您可以使用Selenium打开浏览器并获取数据。我并没有获得一切,但是很快就抓住了姓名,链接,电话...然后这应该使您可以继续获取所需的其他信息。

import bs4 
from selenium import webdriver 

url = 'https://www.asklaila.com/search/Delhi-NCR/industrial-area-phase-1/manufacturers/'

browser = webdriver.Chrome('C:\chromedriver_win32\chromedriver.exe')
browser.get(url)

html = browser.page_source
soup = bs4.BeautifulSoup(html,'html.parser')  

cards = soup.find_all('div',class_='col-xs-12 card')

for card in cards:

    link = card.find('h2',class_='resultTitle')

    href = link.find('a')['href']
    name = link.text.strip()
    nature = card.find('span',class_='resultSubTitle').text.strip()

    try:
        phone = card.find('label', {'class':"phonedisplay"}).text.strip()
        try:
            phone = phone.split(',')
            phone1 = phone[0].strip()
            mobile = phone[1].strip()
        except:
            phone1 = phone.strip()
            mobile = ''
    except:
        phone1 = ''
        mobile = ''

    print ('Name: '+name)
    print ('Link: '+href)
    print ('Phone: '+phone1)
    print ('Mobile: '+mobile+'\n')


browser.close()

输出:

Name: Hitech Packers
Link: https://www.asklaila.com/listing/Delhi-NCR/okhla-industrial-area-phase-1/hitech-packers/qy3r8ZG2/
Phone: 01126371381
Mobile: 09810235750

Name: Sharma Sanitary Goods Manufacturers
Link: https://www.asklaila.com/listing/Delhi-NCR/naraina-industrial-area-phase-1/sharma-sanitary-goods-manufacturers/1vdsLtBO/
Phone: 
Mobile: 

Name: Jyoti Apparels
Link: https://www.asklaila.com/listing/Delhi-NCR/okhla-industrial-area-phase-1/jyoti-apparels/174NGGZ8/
Phone: 
Mobile: 

Name: Modern Tools Manufacturers
Link: https://www.asklaila.com/listing/Delhi-NCR/mayapuri-industrial-area-phase-1/modern-tools-manufacturers/0ockglp2/
Phone: 
Mobile: 

Name: Karan Motors Private Limited
Link: https://www.asklaila.com/listing/Delhi-NCR/mayapuri-industrial-area-phase-1/karan-motors-private-limited/PW6GbMyj/
Phone: 01128117292
Mobile: 09311026538

Name: Seth Brothers Perfumers Private Limited
Link: https://www.asklaila.com/listing/Delhi-NCR/naraina-industrial-area-phase-1/seth-brothers-perfumers-private-limited/b13LLUpy/
Phone: 
Mobile: 

Name: MYK Laticrete India Private Limited
Link: https://www.asklaila.com/listing/Delhi-NCR/okhla-industrial-area-phase-1/myk-laticrete-india-private-limited/1qsYneHK/
Phone: 07941407461
Mobile: 09350621093

Name: Hindustan Switch Private Limited
Link: https://www.asklaila.com/listing/Delhi-NCR/mayapuri-industrial-area-phase-1/hindustan-switch-private-limited/0eCf6VL3/
Phone: 
Mobile: 

Name: Leo Industries
Link: https://www.asklaila.com/listing/Delhi-NCR/mayapuri-industrial-area-phase-1/leo-industries/17GFTPQf/
Phone: 01141833375
Mobile: 09873575646

Name: Benny Impex Private Limited
Link: https://www.asklaila.com/listing/Delhi-NCR/naraina-industrial-area-phase-1/benny-impex-private-limited/PTS4ClCb/
Phone: 
Mobile: 

Name: Atul Aluminium
Link: https://www.asklaila.com/listing/Delhi-NCR/mayapuri-industrial-area-phase-1/atul-aluminium/0WSE64VC/
Phone: 
Mobile: 

Name: Baldev Metals Private Limited
Link: https://www.asklaila.com/listing/Delhi-NCR/mayapuri-industrial-area-phase-1/baldev-metals-private-limited/0e036Prf/
Phone: 01128117423
Mobile: 09810058658

Name: FUCEN STI Apparel Automation Private Limited
Link: https://www.asklaila.com/listing/Delhi-NCR/okhla-industrial-area-phase-1/fucen-sti-apparel-automation-private-limited/0YtpzttB/
Phone: 01141076130
Mobile: 09310601501

Name: Premier Bags
Link: https://www.asklaila.com/listing/Delhi-NCR/mayapuri-industrial-area-phase-1/premier-bags/1t4vjzYD/
Phone: 01125265798
Mobile: 09811491300

Name: Kandhari Brothers Private Limited
Link: https://www.asklaila.com/listing/Delhi-NCR/mayapuri-industrial-area-phase-1/kandhari-brothers-private-limited/0fjgIaIY/
Phone: 01128116511
Mobile: 09811060054

Name: Neelkanth Stainless Steel Sinks
Link: https://www.asklaila.com/listing/Delhi-NCR/mayapuri-industrial-area-phase-1/neelkanth-stainless-steel-sinks/EvP1wxfR/
Phone: 01128116021
Mobile: 09818722322

Name: Dhawan Enterprises
Link: https://www.asklaila.com/listing/Delhi-NCR/okhla-industrial-area-phase-1/dhawan-enterprises/1ejpdIzC/
Phone: 
Mobile: 

Name: Eagle Flask Industries Limited
Link: https://www.asklaila.com/listing/Delhi-NCR/okhla-industrial-area-phase-1/eagle-flask-industries-limited/07bhTSXB/
Phone: 01141610691
Mobile: 09891505048

Name: Krishna Foundry And Workshop
Link: https://www.asklaila.com/listing/Delhi-NCR/mayapuri-industrial-area-phase-1/krishna-foundry-and-workshop/0FfQR4GQ/
Phone: 01128115143
Mobile: 09810044646

Name: WH Deeth Ballabgarh And Company
Link: https://www.asklaila.com/listing/Delhi-NCR/okhla-industrial-area-phase-1/wh-deeth-ballabgarh-and-company/4HKZfxBT/
Phone: 
Mobile: 

答案 1 :(得分:0)

Its getting 403 Forbidden because it set wrong headers, it User-Agent without the s, also you have several wrong selectors

headers = {"User-Agent":"Mozilla/5.0"}
r = requests.get('https://www.......', headers=headers)
soup = BeautifulSoup(r.text,'html.parser')

for card in soup.find_all('div', class_='col-md-6 col-lg-6 cardWrap'):
    ##data of each company
    name = card.find('h2', class_='resultTitle')
    if not name:
        continue
    nature = card.find('span', class_='resultSubTitle')
    phone = card.find('label', class_='phonedisplay')
    phone = re.sub(r'\s+,\s+', ', ', phone.text.strip()) if phone else "no phone"
    address = card.find('img', attrs={"title" : "Address"})
    products = card.find('div', class_='bottomSpaceMargin')
    link = card.find('a')
    company = '{} \n{} \n{} \n{} \n{} \n{}'.format(
        name.text.strip(),
        nature.text.strip(),
        phone,
        address.parent.text.strip(),
        products.text.strip(),
        link['href']
    )
    print(company)
    print('==========================')