Web scraping: problem scraping a table

Posted: 2021-05-03 09:44:11

Tags: python web-scraping python-requests

I want to scrape the full contents of the main coin table at the URL below.

However, my code below doesn't seem to work:

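import requests
import pandas as pd

url = 'https://messari.io/screener/coinbase-ventures-portfolio-34D634C4'
html = requests.get(url).content
# read_html parses every <table> element in the HTML and returns a list of DataFrames
df_list = pd.read_html(html)
df = df_list[-1]
print(df)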

Where am I going wrong?

2 answers:

Answer 0 (score: 2)

You can get the data directly from the source the page renders it from, the JSON API it calls:

import requests
import pandas as pd

# JSON endpoint the page loads its data from
url = 'https://data.messari.io/api/v1/markets/prices-legacy'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36'}

jsonData = requests.get(url, headers=headers).json()
# Flatten the list of records under 'data' into a DataFrame
data = pd.json_normalize(jsonData['data'])

Output:

print(data)
                                        id  ... stakingEngagedPercent
0     1e31218a-e44e-4285-820c-8282ee222035  ...                   NaN
1     21c795f5-1bfd-40c3-858e-e9d7e820c6d0  ...                   NaN
2     7dc551ba-cfed-4437-a027-386044415e3e  ...                   NaN
3     97775be0-2608-4720-b7af-f85b24c7eb2d  ...                   NaN
4     51f8ea5e-f426-4f40-939a-db7e05495374  ...                   NaN
                                   ...  ...                   ...
1609  ff4f6990-5333-4e75-81cb-1342af9cc0a1  ...                   NaN
1610  ffae284d-cb73-44e5-8934-cb3658284e46  ...                   NaN
1611  ffaebc24-053e-428e-a84d-be836e4f8a3a  ...                   NaN
1612  ffc64018-c724-44ac-b3d0-00e33dff7615  ...                   NaN
1613  ffde2011-560a-458b-abaa-2b4f20f851a2  ...                   NaN

[1614 rows x 177 columns]
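The frame has 177 columns because pd.json_normalize flattens nested objects into dotted column names. The sketch below illustrates this with a small hypothetical payload shaped like {"data": [...]}; the field names (symbol, metrics, price_usd, staking) are assumptions for illustration, not the real Messari schema. With the real data, inspect data.columns first to find the actual names before selecting a subset.

```python
import pandas as pd

# Hypothetical payload shaped like the API response: a top-level "data" list
# of records, some containing nested objects. Field names are illustrative
# assumptions, not the real Messari schema.
payload = {
    "data": [
        {"id": "a1", "symbol": "BTC", "metrics": {"price_usd": 57000.0, "staking": None}},
        {"id": "b2", "symbol": "ETH", "metrics": {"price_usd": 3300.0, "staking": 0.05}},
    ]
}

# json_normalize flattens nested dicts into dotted column names,
# e.g. "metrics" -> "metrics.price_usd" and "metrics.staking".
df = pd.json_normalize(payload["data"])
print(list(df.columns))

# Keep only the columns of interest by their flattened names.
subset = df[["symbol", "metrics.price_usd"]]
print(subset)
```

On the real response, the same pattern applies: pick the flattened names you need from data.columns and index the frame with that list.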

Answer 1 (score: 0)

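The page is dynamic: the HTML you download does not contain the table. What you get is only the scripts the browser runs to render the page.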
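pd.read_html only sees <table> elements that are already present in the downloaded HTML, so a quick diagnostic is to count them before parsing. The sketch below uses only the standard library's html.parser; count_tables and the two sample documents are hypothetical illustrations, not the real messari.io markup. In practice you would pass it requests.get(url).text. If it returns 0, read_html has nothing to parse (it raises ValueError in that case), and you need the underlying data source or a browser-based renderer instead.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts opening <table> tags in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.tables = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "<TABLE>" also matches here.
        if tag == "table":
            self.tables += 1


def count_tables(html: str) -> int:
    """Return how many <table> elements appear in the raw HTML."""
    parser = TableCounter()
    parser.feed(html)
    parser.close()
    return parser.tables


# Hypothetical markup: a client-side rendered page typically ships an empty
# mount point plus scripts, while a static page contains the table itself.
spa_shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
static_page = "<html><body><table><tr><td>BTC</td></tr></table></body></html>"

print(count_tables(spa_shell))
print(count_tables(static_page))
```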