Scraping data from a table using Python

Time: 2019-02-24 03:44:29

Tags: python web-scraping beautifulsoup

I need to scrape the Value and TxFee of every transaction on the page https://etherscan.io/txs?p=1.

So far I've got all the rows, but I can't figure out how to get to the Value and TxFee columns. Any ideas?

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

url = 'https://etherscan.io/txs?p=1'
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
soup = BeautifulSoup(webpage, 'html.parser')
rows = soup.find_all('table')[0].find_all('tr')
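Rather than relying on fixed positions, one option is to read the header row and look the two columns up by name. This is a minimal sketch that runs on an inline HTML sample, so it needs no network access; `SAMPLE_HTML` and `extract_columns` are hypothetical names, and it assumes the header cells are `<th>` elements whose text is exactly `Value` and `TxFee`, which may not match Etherscan's live markup.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup; the real Etherscan table may differ.
SAMPLE_HTML = """
<table>
  <tr><th>TxHash</th><th>Block</th><th>Value</th><th>TxFee</th></tr>
  <tr><td>0xabc</td><td>7262600</td><td>0.25 Ether</td><td>0.00090989</td></tr>
  <tr><td>0xdef</td><td>7262601</td><td>0 Ether</td><td>0.00041913</td></tr>
</table>
"""

def extract_columns(html, wanted):
    """Return one dict per data row, keeping only the columns named in `wanted`."""
    table = BeautifulSoup(html, 'html.parser').find('table')
    rows = table.find_all('tr')
    # Assumption: the first row holds the <th> header labels
    headers = [th.get_text(strip=True) for th in rows[0].find_all('th')]
    # Map each wanted label to its column position (ValueError if a label is missing)
    positions = {name: headers.index(name) for name in wanted}
    result = []
    for row in rows[1:]:
        cells = [td.get_text(strip=True) for td in row.find_all('td')]
        result.append({name: cells[i] for name, i in positions.items()})
    return result

data = extract_columns(SAMPLE_HTML, ['Value', 'TxFee'])
print(data)
```

If a label is missing, `headers.index` raises `ValueError`, which surfaces a layout change immediately instead of silently returning the wrong column.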

2 answers:

Answer 0 (score: 0)

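You can collect the text of every td in each row, then use unpacking to drop the header row and keep the last two cells (Value and TxFee):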

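from bs4 import BeautifulSoup as soup
import requests

# Send a browser-like User-Agent, as the question does
d = soup(requests.get('https://etherscan.io/txs?p=1', headers={'User-Agent': 'Mozilla/5.0'}).text, 'html.parser')
# Text of every <td> in every row; unpacking discards the first (header) row
_, *_results = [[i.text for i in b.find_all('td')] for b in d.find('div', {'id':'ContentPlaceHolder1_mainrow'}).find_all('tr')]
# Value and TxFee are the last two cells of each row
final_data = [[value, txfee] for *_, value, txfee in _results]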
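The two unpacking steps in this answer can be seen in isolation on made-up data. In this sketch the rows are hypothetical lists of cell text; the first one stands in for the header row, which is assumed to use `<th>` cells and therefore yields no `<td>` text.

```python
# Hypothetical cell text, shaped like the list comprehension's result
rows = [
    [],  # header row: assumed to use <th>, so it yields no <td> text
    ['0xabc', '7262600', '5 secs ago', '0xfrom', '', '0xto', '0.25 Ether', '0.00090989'],
    ['0xdef', '7262601', '9 secs ago', '0xfrom', '', '0xto', '0 Ether', '0.00041913'],
]

# `_` takes the first row (the header); `*results` collects all remaining rows
_, *results = rows

# `*_` swallows every leading cell; the last two land in value and txfee
final_data = [[value, txfee] for *_, value, txfee in results]
print(final_data)
```

Dropping the header first matters: `*_, value, txfee` needs at least two items, so unpacking the empty header row would raise `ValueError`.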

Output:

[['0 Ether', '0.00041913'], ['0 Ether', '0.00013146'], ['0 Ether', '0.00042014'], ['0.25 Ether', '0.00090989'], ['0.00297 Ether', '0.00020952'], ['0.000777479938944 Ether', '0.000105'], ['0 Ether', '0.00031381'], ['0.122179723988007 Ether', '0.0001071'], ['0.1221839785183 Ether', '0.0001071'], ['0.122186397216105 Ether', '0.0001071'], ['0.122201225613423 Ether', '0.0001071'], ['0.122209252917227 Ether', '0.0001071'], ['0.122272308100109 Ether', '0.0001071'], ['0.122284473600606 Ether', '0.0001071'], ['0.122306899219972 Ether', '0.0001071'], ['0.122324534266537 Ether', '0.0001071'], ['0.122336430279177 Ether', '0.0001071'], ['0.122383190389565 Ether', '0.0001071'], ['0.12239364196575 Ether', '0.0001071'], ['0.122437707424275 Ether', '0.0001071'], ['0.122442756345904 Ether', '0.0001071'], ['0.122489465764433 Ether', '0.0001071'], ['0.122492136253689 Ether', '0.0001071'], ['0.122497792117018 Ether', '0.0001071'], ['0.122530953016955 Ether', '0.0001071'], ['0.12256084979765 Ether', '0.0001071'], ['0.122564659930108 Ether', '0.0001071'], ['0 Ether', '0.00052013'], ['0.0009 Ether', '0.0001533'], ['0 Ether', '0.00028854'], ['0 Ether', '0.00037411'], ['0.050174275293877 Ether', '0.00021'], ['0.050079687099593 Ether', '0.00021'], ['0.999929 Ether', '0.00021'], ['0 Ether', '0.00021553'], ['0 Ether', '0.00209731'], ['0.050060845339506 Ether', '0.00039512'], ['0.200193450518523 Ether', '0.00021'], ['0.7751 Ether', '0.00021'], ['0.238 Ether', '0.00021'], ['0.100284577873471 Ether', '0.00021'], ['0.025022910209086 Ether', '0.00021'], ['0.050015901904717 Ether', '0.00021'], ['0.200009974540924 Ether', '0.00021'], ['0.823 Ether', '0.00021'], ['0.100708587548462 Ether', '0.00021'], ['0.200115420355788 Ether', '0.00021'], ['0.100019490621652 Ether', '0.00021'], ['0.200054806640234 Ether', '0.00021'], ['0.20001722285649 Ether', '0.00021']]
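The scraped values are still strings. If you need numbers, a minimal sketch is to strip the unit word and parse with `Decimal`, which avoids binary floating-point rounding on amounts like `0.0001071`. It assumes every Value cell looks like `<number> Ether` (possibly with thousands separators) and every TxFee cell is a bare number; `to_decimals` is a hypothetical helper.

```python
from decimal import Decimal

def to_decimals(pair):
    """Turn ['0.25 Ether', '0.00090989'] into (Decimal('0.25'), Decimal('0.00090989'))."""
    value, txfee = pair
    # Assumption: the value cell is "<number> <unit>"; keep only the number
    amount = value.split()[0].replace(',', '')
    return Decimal(amount), Decimal(txfee.replace(',', ''))

sample = [['0 Ether', '0.00041913'], ['0.25 Ether', '0.00090989']]
parsed = [to_decimals(p) for p in sample]
# Sum of all fees, still exact
total_fees = sum(fee for _, fee in parsed)
print(parsed, total_fees)
```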

Answer 1 (score: 0)

# soup is the BeautifulSoup object from the question
rows = soup.find_all('table')[0].find_all('tr')
# The first tr is the header, so skip it
for row in rows[1:]:
    print(row.find_all('td')[7].text)  # the 8th column (index 7) is the TxFee
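The positional approach can also pick up the Value column in the same loop. This sketch runs on a hypothetical inline table that mirrors the layout the answer implies (TxFee at index 7, so Value is assumed to be at index 6), and it skips rows with too few `<td>` cells instead of raising `IndexError`. The column positions, `SAMPLE_HTML`, and the extra short row are assumptions, not verified against the live site.

```python
from bs4 import BeautifulSoup

# Hypothetical table: 8 cells per data row, plus one short row to show the guard
SAMPLE_HTML = """
<table>
  <tr><th>TxHash</th><th>Block</th><th>Age</th><th>From</th><th></th><th>To</th><th>Value</th><th>TxFee</th></tr>
  <tr><td>0xabc</td><td>7262600</td><td>5 secs ago</td><td>0xfrom</td><td>IN</td><td>0xto</td><td>0.25 Ether</td><td>0.00090989</td></tr>
  <tr><td colspan="8">Sponsored</td></tr>
</table>
"""

VALUE_COL, FEE_COL = 6, 7  # assumed column positions

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
rows = soup.find_all('table')[0].find_all('tr')
pairs = []
for row in rows[1:]:  # skip the header row
    cells = row.find_all('td')
    if len(cells) <= FEE_COL:  # not a transaction row; skip it
        continue
    pairs.append((cells[VALUE_COL].text, cells[FEE_COL].text))
print(pairs)
```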