此代码显示表为空,但这不是因为网页填充表带有一些js代码。
所以我不知道该怎么解析。请告诉我如何将其解析。
import bs4 as bs
import urllib.request
source = urllib.request.urlopen('https://monerobenchmarks.info/').read()
soup = bs.BeautifulSoup(source,'lxml')
table = soup.find('table')
table_rows = table.find_all('tr')
for tr in table_rows:
td = tr.find_all('td')
row = [i.text for i in td]
print(row)
答案 0 :(得分:1)
BeautifulSoup无法等待Javascript完成,您所要求的是不可能的。 尤其是因为您使用的是urllib来获取页面(并且使用请求也是如此,这仅仅是因为BeautifulSoup缺少执行Javascript代码的引擎)。
您想要的是selenium。
答案 1 :(得分:1)
import requests
r = requests.get("https://monerobenchmarks.info/s/om.php?draw=1&columns%5B0%5D%5Bdata%5D=0&columns%5B0%5D%5Bname%5D=&columns%5B0%5D%5Bsearchable%5D=true&columns%5B0%5D%5Borderable%5D=true&columns%5B0%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B0%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B1%5D%5Bdata%5D=1&columns%5B1%5D%5Bname%5D=&columns%5B1%5D%5Bsearchable%5D=true&columns%5B1%5D%5Borderable%5D=true&columns%5B1%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B1%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B2%5D%5Bdata%5D=2&columns%5B2%5D%5Bname%5D=&columns%5B2%5D%5Bsearchable%5D=true&columns%5B2%5D%5Borderable%5D=true&columns%5B2%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B2%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B3%5D%5Bdata%5D=3&columns%5B3%5D%5Bname%5D=&columns%5B3%5D%5Bsearchable%5D=true&columns%5B3%5D%5Borderable%5D=true&columns%5B3%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B3%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B4%5D%5Bdata%5D=4&columns%5B4%5D%5Bname%5D=&columns%5B4%5D%5Bsearchable%5D=true&columns%5B4%5D%5Borderable%5D=true&columns%5B4%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B4%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B5%5D%5Bdata%5D=5&columns%5B5%5D%5Bname%5D=&columns%5B5%5D%5Bsearchable%5D=true&columns%5B5%5D%5Borderable%5D=true&columns%5B5%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B5%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B6%5D%5Bdata%5D=6&columns%5B6%5D%5Bname%5D=&columns%5B6%5D%5Bsearchable%5D=true&columns%5B6%5D%5Borderable%5D=true&columns%5B6%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B6%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B7%5D%5Bdata%5D=7&columns%5B7%5D%5Bname%5D=&columns%5B7%5D%5Bsearchable%5D=true&columns%5B7%5D%5Borderable%5D=true&columns%5B7%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B7%5D%5Bsearch%5D%5Bregex%5D=false&order%5B0%5D%5Bcolumn%5D=1&order%5B0%5D%5Bdir%5D=desc&start=0&length=10&search%5Bvalue%5D=&search%5Bregex%5D=false&_=1586421151150").json()
for item in r['data']:
item = item[:7]
item[0] = item[0].split("&")[0]
print(item)
输出:
['AMD EPYC 7742', '44000', '225 W', 'XMRig 5.3', 'N/A', 'WINDOWS 10 x64', 'Dec, 2019']
['AMD RYZEN THREADRIPPER 3990X', '43800', '280 W', 'XMRig 5.7.0', 'SOURCE : <a href="https://cryptomining-blog.com/11384-randomx-mining-performance-on-amd-ryzen-threadripper-3990x-processor-64c-128t/" target="_blank">CRYPTO MINING BLOG</a>', 'WINDOWS 10 x64', 'Feb, 2020']
['AMD EPYC 7742', '38732', '225 W', 'RandomX Benchmark Windows x64. SOURCE: <a href="https://redd.it/cqqt12" target="_blank">REDDIT</a>', 'N/A', 'WINDOWS 10 x64', 'Aug, 2019']
['AMD THREADRIPPER 3970X', '28900', '170 W', 'XMRig v5.5', 'SOURCE: <a href="https://redd.it/ei5ra6" target="_blank">REDDIT</a>', 'WINDOWS 10 x64', 'Dec, 2019']
['THREADRIPPER 3970X', '27703', '280 W', 'XMRIG 5.3.0', 'N/A', 'WINDOWS 10 x64', 'Dec, 2019']
['AMD EPYC 7502P', '25300', '200 W', 'XMRig 5.7.0', 'N/A', 'DEBIAN 9 x64', 'Mar, 2020']
['DUAL XEON PLATINUM 8136', '22500', '330 W', 'XMRIG 5.5.0', 'N/A', 'WINDOWS 10 x64', 'Feb, 2020']
['AMD THREADRIPPER 3960X', '20800', '130 W', 'XMRIG 5.5.1', 'cpu: ±3.1ghz, ppt: 130w vcore/soc offset: -0.05v, ram: 3666c15@1.4v, 4x8gb, huge pages, 48 threads, power at wall ~212w (ax1600i, r vii)', 'WINDOWS 10 x64', 'Jan, 2020']
['AMD THREADRIPPER 2990WX', '20057', '379 W', 'XMRig 5.5.3b', '--threads 32 --randomx-1gb-pages', 'UBUNTU 18.04 x64', 'Feb, 2020']
['RYZEN 9 3950X', '19776', '250 W', 'XMRIG 5.5.3', '3950X @ 4.35ghz 1.3312v , 2x16gb 3733cl14', 'WINDOWS 10 x64', 'Feb, 2020']
答案 2 :(得分:1)
要在bs4中使用硒,请尝试chrome或Firefox驱动程序
from bs4 import BeautifulSoup as Soup
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("https://monerobenchmarks.info/")
page = Soup(driver.page_source, features='html.parser')
table = page.find('table')
table_rows = table.find_all('tr')
for tr in table_rows:
td = tr.find_all('td')
row = [i.text for i in td]
print(row)