I have been using BeautifulSoup to scrape "https://www.huaweicloud.com/pricing.html#/ecs".
I want to extract the table data from that site, but I get nothing.
I am on Windows 10 with Python 3.7 and the latest versions of BeautifulSoup and Requests.
import requests
from bs4 import BeautifulSoup

url = 'https://www.huaweicloud.com/pricing.html#/ecs'
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')
soup.find_all('table')
After running soup.find_all('table'), it returns an empty list: []
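For what it's worth, the parsing step itself is probably not the problem: find_all('table') only returns matches when the markup it receives actually contains table elements. A minimal sketch with made-up inline HTML (both snippets are hypothetical, not from the Huawei Cloud page) illustrates the difference:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: one document with a table, one without
html_with_table = "<html><body><table><tr><td>ECS</td></tr></table></body></html>"
html_without_table = "<html><body><div>tables inserted by JavaScript later</div></body></html>"

soup_with = BeautifulSoup(html_with_table, 'html.parser')
soup_without = BeautifulSoup(html_without_table, 'html.parser')

print(len(soup_with.find_all('table')))     # 1
print(len(soup_without.find_all('table')))  # 0
```

If the raw response.text from requests likewise contains no table tags, the tables are being rendered client-side by JavaScript, which plain requests cannot execute.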
Answer 0 (score: 0)
I know this is not a direct answer to your question, but it may help you. Here is the code I came up with using Selenium and BeautifulSoup. You only need to specify the location of chromedriver and the script is ready to use.
import time

from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.huaweicloud.com/pricing.html#/ecs'
driver = webdriver.Chrome("location of chrome driver")
driver.get(str(url))

# Open the ECS pricing tab, then wait for the JavaScript-rendered tables to load
driver.find_element_by_id("calculator_tab0").click()
time.sleep(3)

# Parse the fully rendered page source, not the raw HTTP response
html_source = driver.page_source
soup = BeautifulSoup(html_source, features="lxml")
table_all = soup.findAll("table")

output_rows = []
for table in table_all[:2]:
    for table_row in table.findAll('tr'):
        thead = table_row.findAll('th')
        columns = table_row.findAll('td')
        _thead = []
        for th in thead:
            _thead.append(th.text)
        output_rows.append(_thead)
        _row = []
        for column in columns:
            _row.append(column.text)
        output_rows.append(_row)

# Drop the empty lists left over from rows that had no <th> or <td> cells
output_rows = [x for x in output_rows if x != []]
df = pd.DataFrame(output_rows)
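As a follow-up, the collected output_rows can be given proper column labels by treating the first scraped row (the header cells) as the DataFrame's columns. A minimal sketch with hypothetical sample rows (the real rows depend on the live page):

```python
import pandas as pd

# Hypothetical sample of what output_rows might look like after scraping
output_rows = [
    ['Instance Type', 'vCPUs', 'Memory'],
    ['s6.small.1', '1', '1 GiB'],
    ['s6.medium.2', '1', '2 GiB'],
]

# Use the first row as column names and the rest as data
df = pd.DataFrame(output_rows[1:], columns=output_rows[0])
print(df.shape)  # (2, 3)
```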