How do I scrape hidden data from a website?

Asked: 2020-03-07 22:13:07

Tags: python web-scraping beautifulsoup

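I want to scrape data from this website: https://www.myconstant.com/pro-lending. I am trying with BeautifulSoup, but I cannot get at the site's data. Can anyone here help? I only want to access the investment orders category on this site.

Here is my code sample: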

import requests
from bs4 import BeautifulSoup as bs

url = 'https://www.myconstant.com/pro-lending'
cookies = dict(cookie='OTZ=5322074_36_36__36_; CONSENT=YES+PK.en+202002; ANID=AHWqTUlkHkcsWxQOa8bj1HPw61uI1ASv41AZ-C2dcJszhllBcVsFoL-LRmQURs7t; OGPC=19016257-1:; SID=ugdIMfWxvjc2Zrz3TCDKjthu5lDnFoFH7QJ9zv5qaIM83RE9d1siIWqXAxi2Fbi7EYrlqA.; __Secure-3PSID=ugdIMfWxvjc2Zrz3TCDKjthu5lDnFoFH7QJ9zv5qaIM83RE9yClGzaYUGZtRSrUprQBH_g.; HSID=Ad0Mhd9c6QzutsaZC; SSID=Au6GMpM4y0DzAZYaB; APISID=Xdqm2aWUwlDspAy1/A98sORceYqZRYt41u; SAPISID=TmATibzalihSo7VH/A0VsoKWSycbne7-xj; __Secure-HSID=Ad0Mhd9c6QzutsaZC; __Secure-SSID=Au6GMpM4y0DzAZYaB; __Secure-APISID=Xdqm2aWUwlDspAy1/A98sORceYqZRYt41u; __Secure-3PAPISID=TmATibzalihSo7VH/A0VsoKWSycbne7-xj; NID=199=v7-O74g7gg1mrTP9c7Jj52S6f7pCpyv5iO_W6ggU_DP2gRyUI6u7drxi4_1U0uQn--mo_dIHfyvZ8KpkosDIjvQ_ci-o4hIF_f4J5zd2DS77fxHh40U3wcqnqutOmWnTJM8XJ-OqvwpdraYxX2eexsclXnj4y1nPflDESshiLPMe9KKfzSNr_3ZSPFv7Qt-FCMBYvZoTA-ILWEezeVyIjPwFkJlJwv5t8tNJtAQJin4f9X7Zl-ch0pDOlM-SgNF4IZhR6_gKemBtR0U; 1P_JAR=2020-03-07-21; arp_scroll_position=427.5; SIDCC=AJi4QfHeZ5xBrG_goWvc0Hw3-dSp0Fc5hMSShlvquJ_0dqPxOY3kL2VRgchD78plA1OdPDrc9kqH')

r = requests.get(url, cookies=cookies)

parser = bs(r.text, 'html.parser')
print(parser)

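1 answer:

Answer 0 (score: 1)

You need to use Selenium for this. The data appears to be rendered by JavaScript, which requests does not execute, so the page has to be loaded in a real browser: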

import time
from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.webdriver.chrome.options import Options


def get_browser():
    chrome_options = Options()
    chrome_options.add_argument("--disable-extensions")
    chrome_options.add_argument('--disable-notifications')
    chrome_options.add_argument('--incognito')
    driver = webdriver.Chrome(options=chrome_options)
    return driver


url = 'https://www.myconstant.com/pro-lending'

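driver = get_browser()
driver.get(url)

# Give the page's scripts time to run before reading the DOM.
time.sleep(10)

# page_source reflects the page as the browser currently holds it.
parser = bs(driver.page_source, "html.parser")
print(parser)
driver.quit()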
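
The fixed time.sleep(10) above either wastes time or is too short on a slow connection. A minimal sketch of a polling alternative follows, using only the standard library. The helper name wait_for_markup and the marker string are hypothetical and not part of the original answer; Selenium also ships its own WebDriverWait for waiting on element conditions.

```python
import time
from typing import Callable, Optional


def wait_for_markup(get_source: Callable[[], str], marker: str,
                    timeout: float = 10.0, poll: float = 0.5) -> Optional[str]:
    """Poll get_source() until marker appears in the returned HTML.

    Returns the HTML that contains the marker, or None if the timeout
    expires first. get_source is any zero-argument callable that returns
    the current page source (hypothetical helper, for illustration only).
    """
    deadline = time.monotonic() + timeout
    while True:
        html = get_source()
        if marker in html:
            return html
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

With the driver from the answer, this could be called as wait_for_markup(lambda: driver.page_source, "some text you expect", timeout=10), where the marker is whatever text or tag you know appears once the data has loaded.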

Selenium needs chromedriver to run. Make sure the driver is in the same path as the script, or specify the executable_path parameter inside the get_browser function as:

driver = webdriver.Chrome(executable_path='/path/to/chrome_driver', options=chrome_options)
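
To make the "same path as the script, or an explicit path" rule concrete, here is a small sketch that looks for the driver next to the script first and then on the system PATH. The function name find_chromedriver is hypothetical and not part of Selenium.

```python
import os
import shutil
from typing import Optional


def find_chromedriver(script_dir: str, name: str = "chromedriver") -> Optional[str]:
    """Return a path to the driver binary, or None if it cannot be found.

    Checks script_dir first (with and without a Windows .exe suffix),
    then falls back to searching the system PATH.
    """
    for candidate in (name, name + ".exe"):
        local = os.path.join(script_dir, candidate)
        if os.path.isfile(local):
            return local
    # shutil.which returns None when nothing on PATH matches.
    return shutil.which(name)
```

The returned path can be passed as executable_path in the call above. Note that in Selenium 4 the executable_path argument was deprecated in favour of a Service object, i.e. webdriver.Chrome(service=Service(path), options=chrome_options) with Service imported from selenium.webdriver.chrome.service.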