Accessing hidden data on a page

Time: 2017-09-18 11:37:04

Tags: python web-scraping beautifulsoup

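  1. I need to visit the following site: http://mothoq.com/store/22
  2. Scroll down until I see the phone icon.
  3. Click it, then scrape the phone number.
  4. I can connect to the site and scrape all the data I need, except the phone number.

    I have tried using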

    soup.find_all('p',attrs={"align":"center"})
    

    My code is:

    import requests
    import pandas as pd
    from bs4 import BeautifulSoup
    
    
    records = []
    storeId = 22
    url = "http://mothoq.com/store/" + str(storeId)
    r = requests.get(url)
    content = r.text
    soup = BeautifulSoup(content, "html5lib")
    results = soup.find('div', attrs={'id': 'subtitle'})
    
    for storeData in results:
        storeName = soup.find('h1')
        url = soup.find('font').text
    
    contacts = soup.find_all('p', attrs={"class":"store_connect_details"})
    for storeContact in contacts:
        storePhone    = soup.find_all('p', attrs={"align":"center"})
        storeTwitter  = soup.find('a', attrs={"class":"connect_icon_twitter"})['href']
        storeFacebook = soup.find('a', attrs={"class":"connect_icon_facebook"})['href']
        storeLinkedin = soup.find('a', attrs={"class":"connect_icon_linkedin"})['href']
    
    print(storePhone)
    

    Thanks!

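1 answer:

Answer 0 (score: 0):

The phone number is already in the page source, inside a hidden div. Search for the div with id="store-telephone-form" and take its second <p> tag.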

    import requests
    import pandas as pd
    from bs4 import BeautifulSoup


    records = []
    storeId = 22
    url = "http://mothoq.com/store/" + str(storeId)
    r = requests.get(url)
    content = r.text
    soup = BeautifulSoup(content, "lxml")
    results = soup.find('div', attrs={'id': 'subtitle'})

    storeName = soup.find('h1')
    url = soup.find('font').text

    contacts = soup.find_all('p', attrs={"class":"store_connect_details"})

    # Default to None so print() below cannot raise NameError if a lookup fails.
    storePhone = storeTwitter = storeFacebook = storeLinkedin = None

    try:
        # The phone number is the text of the second <p> inside the hidden div.
        storePhone = soup.find('div', attrs={"id":"store-telephone-form"}).select('p')[1].text
        storeTwitter  = soup.find('a', attrs={"class":"connect_icon_twitter"}).get('href')
        storeFacebook = soup.find('a', attrs={"class":"connect_icon_facebook"}).get('href')
        storeLinkedin = soup.find('a', attrs={"class":"connect_icon_linkedin"}).get('href')
    except (AttributeError, IndexError):
        # find() returned None, or the div holds fewer than two <p> tags.
        pass

    print(storePhone)
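
The lookup can also be tried offline on a small HTML fragment. The following is only a sketch: the markup in SAMPLE_HTML is made up for illustration (only the id store-telephone-form and the connect_icon_* class names come from the page above), and the helpers second_paragraph_text and link_href are hypothetical names, not part of BeautifulSoup. Each field is looked up on its own, so one missing element does not prevent the others from being read.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the id "store-telephone-form" and the
# connect_icon_* class names are taken from the page discussed above.
SAMPLE_HTML = """
<div id="store-telephone-form" style="display:none">
  <p align="center">Phone</p>
  <p align="center">0123456789</p>
</div>
<a class="connect_icon_twitter" href="https://twitter.com/example">t</a>
"""


def second_paragraph_text(soup, div_id):
    """Return the stripped text of the second <p> inside the div, or None."""
    div = soup.find("div", attrs={"id": div_id})
    if div is None:
        return None
    paragraphs = div.find_all("p")
    if len(paragraphs) < 2:
        return None
    return paragraphs[1].get_text(strip=True)


def link_href(soup, css_class):
    """Return the href of the first <a> with the given class, or None."""
    link = soup.find("a", attrs={"class": css_class})
    return link.get("href") if link is not None else None


# html.parser ships with Python, so no extra parser is required here.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
phone = second_paragraph_text(soup, "store-telephone-form")
twitter = link_href(soup, "connect_icon_twitter")
facebook = link_href(soup, "connect_icon_facebook")  # absent in the sample
print(phone, twitter, facebook)
```

On the real page, the same helpers would be called on the soup built from r.text, provided the live markup still matches these assumptions.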