I have been using BeautifulSoup to scrape "https://www.huaweicloud.com/pricing.html#/ecs".
I want to extract the table data from that site, but I get nothing.
I am on Windows 10 with Python 3.7 and the latest versions of BeautifulSoup and Requests.
import requests
from bs4 import BeautifulSoup

url = 'https://www.huaweicloud.com/pricing.html#/ecs'
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')
soup.find_all('table')
After running soup.find_all('table'), it returns an empty list: []
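For what it's worth, the parsing step itself is probably not the problem: find_all('table') only returns matches when the markup it receives actually contains table elements. A minimal sketch with made-up inline HTML (both snippets are hypothetical, not from the Huawei Cloud page) illustrates the difference:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: one document with a table, one without
html_with_table = "<html><body><table><tr><td>ECS</td></tr></table></body></html>"
html_without_table = "<html><body><div>tables inserted by JavaScript later</div></body></html>"

soup_with = BeautifulSoup(html_with_table, 'html.parser')
soup_without = BeautifulSoup(html_without_table, 'html.parser')

print(len(soup_with.find_all('table')))     # 1
print(len(soup_without.find_all('table')))  # 0
```

If the raw response.text from requests likewise contains no table tags, the tables are being rendered client-side by JavaScript, which plain requests cannot execute.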
Answer 0 (score: 0)
I know this is not a direct answer to your question, but it may help you. Here is the code I came up with using Selenium and BeautifulSoup. You only need to specify the location of chromedriver and the script is ready to use.
import time

from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.huaweicloud.com/pricing.html#/ecs'
driver = webdriver.Chrome("location of chrome driver")
driver.get(str(url))

# Open the ECS pricing tab, then wait for the JavaScript-rendered tables to load
driver.find_element_by_id("calculator_tab0").click()
time.sleep(3)

# Parse the fully rendered page source, not the raw HTTP response
html_source = driver.page_source
soup = BeautifulSoup(html_source, features="lxml")
table_all = soup.findAll("table")

output_rows = []
for table in table_all[:2]:
    for table_row in table.findAll('tr'):
        thead = table_row.findAll('th')
        columns = table_row.findAll('td')
        _thead = []
        for th in thead:
            _thead.append(th.text)
        output_rows.append(_thead)
        _row = []
        for column in columns:
            _row.append(column.text)
        output_rows.append(_row)

# Drop the empty lists left over from rows that had no <th> or <td> cells
output_rows = [x for x in output_rows if x != []]
df = pd.DataFrame(output_rows)
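As a follow-up, the collected output_rows can be given proper column labels by treating the first scraped row (the header cells) as the DataFrame's columns. A minimal sketch with hypothetical sample rows (the real rows depend on the live page):

```python
import pandas as pd

# Hypothetical sample of what output_rows might look like after scraping
output_rows = [
    ['Instance Type', 'vCPUs', 'Memory'],
    ['s6.small.1', '1', '1 GiB'],
    ['s6.medium.2', '1', '2 GiB'],
]

# Use the first row as column names and the rest as data
df = pd.DataFrame(output_rows[1:], columns=output_rows[0])
print(df.shape)  # (2, 3)
```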