Web scraping with Python: dynamic table data is not downloaded

Time: 2020-08-14 18:49:51

Tags: html python-3.x beautifulsoup urllib

I want to get the transaction times from the website https://explorer.flitsnode.app/address/FieXP1irJKvmWUiqV18AFdDZD8bgWvfRiC/ but when I request the HTML, I don't get the full page data.

I get everything except the contents of the table I need, "Address Transactions".

I have the CSS selector for the #txaddr table, but it only returns the header row (Timestamp, Block, Hash, ...).

Here is my code so far; I've added some comments to it.

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup  # pip install beautifulsoup4 html5lib


def NodeRewardTime(link):
    # Send a browser-like User-Agent; some sites reject urllib's default one
    req = Request(link, headers={'User-Agent': 'Mozilla/5.0'})
    webpage = urlopen(req).read()
    soup = BeautifulSoup(webpage, 'html5lib')
    all_results = soup.select("#txaddr")  # CSS selector for the entire table
    if not all_results:
        print("No data to show")
    for x in all_results:
        print(x.text)  # prints results


link = "https://explorer.flitsnode.app/address/FieXP1irJKvmWUiqV18AFdDZD8bgWvfRiC/"

NodeRewardTime(link)
input("End")

Output: TimestampBlockHashAmount(FLS)Balance(FLS)TX Type[End]
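One way to confirm that the rows are simply not in the served HTML is to count the rows in the table's header and body. The sketch below uses only the standard library's html.parser; the class name TableRowCounter and the sample markup are hypothetical (shaped like the #txaddr table described above), and it assumes the table is not nested inside another table.

```python
from html.parser import HTMLParser


class TableRowCounter(HTMLParser):
    """Count <tr> rows inside the <thead> and <tbody> of one table, found by id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.in_table = False
        self.section = None  # "thead" or "tbody" while inside one of them
        self.counts = {"thead": 0, "tbody": 0}

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == self.table_id:
            self.in_table = True
        elif self.in_table and tag in ("thead", "tbody"):
            self.section = tag
        elif self.in_table and tag == "tr" and self.section:
            self.counts[self.section] += 1

    def handle_endtag(self, tag):
        if tag in ("thead", "tbody"):
            self.section = None
        elif tag == "table":
            self.in_table = False


# Hypothetical markup: header present, body left empty for JavaScript to fill in later
html = """
<table id="txaddr">
  <thead><tr><th>Timestamp</th><th>Block</th><th>Hash</th></tr></thead>
  <tbody></tbody>
</table>
"""

counter = TableRowCounter("txaddr")
counter.feed(html)
print(counter.counts)  # {'thead': 1, 'tbody': 0}
```

Feeding it the real page's HTML instead of the sample would show whether the body rows are present in what urlopen receives.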


2 Answers:

Answer 0 (score: 1)

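If you inspect the page (for example in the Network tab of your browser's developer tools), you will see that the page HTML only contains the empty table; its rows are loaded separately, in JSON format, from the explorer's get_address_transactions endpoint.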
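Since the endpoint takes the address as a query parameter, its URL can be built for any address instead of being hard-coded. A minimal sketch: the base path and the single address parameter are taken from the URL in the answer's code, and the helper name transactions_url is hypothetical. The answer's URL uses an all-lowercase address; whether the endpoint requires that is not confirmed, so the sketch passes the address through unchanged.

```python
from urllib.parse import urlencode

# Base path taken from the answer's URL; "address" is the only query parameter it uses
BASE = "https://explorer.flitsnode.app/get_address_transactions"


def transactions_url(address):
    """Build the JSON endpoint URL for a given address (hypothetical helper)."""
    return BASE + "?" + urlencode({"address": address})


print(transactions_url("fiexp1irjkvmwuiqv18afddzd8bgwvfric"))
# https://explorer.flitsnode.app/get_address_transactions?address=fiexp1irjkvmwuiqv18afddzd8bgwvfric
```

urlencode also percent-encodes any characters that are not URL-safe, which plain string concatenation would not.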

The following prints the data as a table:


import json
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def NodeRewardTime(link):
    req = Request(link, headers={"User-Agent": "Mozilla/5.0"})
    webpage = urlopen(req).read()

    # The endpoint returns JSON, not HTML: take the body's text content and parse it as JSON
    soup = BeautifulSoup(webpage, "html5lib")
    json_data = json.loads(soup.text)

    # json_data["data"] is a list of rows; join each row's cells with " | ", one row per line
    return "\n".join(" | ".join(str(cell) for cell in row) for row in json_data["data"])


URL = "https://explorer.flitsnode.app/get_address_transactions?address=fiexp1irjkvmwuiqv18afddzd8bgwvfric"
print(NodeRewardTime(URL))
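Because the question only asks for the transaction times, each row can be reduced to its first column. This sketch works on a hypothetical response body with the {"data": [[...], ...]} shape the code above indexes into; the column order (timestamp first) is assumed from the table header shown in the question, the sample values are invented, and reward_times is a hypothetical helper name.

```python
import json

# Hypothetical response body; columns assumed to be
# Timestamp, Block, Hash, Amount(FLS), Balance(FLS), TX Type
sample_body = b'''{"data": [
  ["2020-08-14 10:00:00", "100", "aaa", "5", "105", "Reward"],
  ["2020-08-13 10:00:00", "90", "bbb", "5", "100", "Reward"]
]}'''


def reward_times(body):
    """Return only the first (timestamp) column of every row."""
    rows = json.loads(body)["data"]  # json.loads accepts bytes as well as str
    return [row[0] for row in rows]


print(reward_times(sample_body))  # ['2020-08-14 10:00:00', '2020-08-13 10:00:00']
```

If the real rows order their columns differently, only the row[0] index needs to change.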

Answer 1 (score: -1)

You have to take each whole row and clean it up in a loop, so that only the parts you need are shown in the output.
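A minimal sketch of that row-cleaning loop, assuming the rows have already been obtained as lists of cell strings that may still carry HTML links and stray whitespace. The header names, sample row, and clean_rows helper are hypothetical, and the regular expression is a crude tag stripper that is only adequate for simple markup like the sample.

```python
import re

# Hypothetical header and row; cells may contain HTML links and extra whitespace
header = ["Timestamp", "Block", "Hash", "Amount(FLS)", "Balance(FLS)", "TX Type"]
rows = [
    ["  2020-08-14 10:00:00 ", '<a href="/block/100">100</a>', "aaa", "5", "105", "Reward"],
]

TAG = re.compile(r"<[^>]+>")  # crude: removes anything that looks like a tag


def clean_rows(header, rows, wanted):
    """Loop over every row, strip tags and whitespace, keep only the wanted columns."""
    idx = [header.index(name) for name in wanted]
    cleaned = []
    for row in rows:
        cells = [TAG.sub("", cell).strip() for cell in row]
        cleaned.append([cells[i] for i in idx])
    return cleaned


print(clean_rows(header, rows, ["Timestamp", "Block"]))  # [['2020-08-14 10:00:00', '100']]
```

For markup more complex than simple links, parsing each cell with BeautifulSoup and calling get_text() is more robust than a regular expression.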