Hi, after I run the following code:
import requests
from bs4 import BeautifulSoup
page = requests.get('https://coinpaprika.com')
soup = BeautifulSoup(page.text, 'html.parser')
coin_list = soup.find('tbody')
coin_list_items = coin_list.find_all('a')
for coin_name in coin_list_items:
    names = coin_name.string
    links = 'https://coinpaprika.com' + coin_name.get('href')
    print(names)
    print(links)
the program prints:
None
https://coinpaprika.com/coin/btc-bitcoin/
Bitcoin
https://coinpaprika.com/coin/btc-bitcoin/
None
https://coinpaprika.com/coin/xrp-xrp/
XRP
https://coinpaprika.com/coin/xrp-xrp/
None
https://coinpaprika.com/coin/eth-ethereum/
Ethereum
https://coinpaprika.com/coin/eth-ethereum/
instead of:
Bitcoin
https://coinpaprika.com/coin/btc-bitcoin/
XRP
https://coinpaprika.com/coin/xrp-xrp/
Ethereum
https://coinpaprika.com/coin/eth-ethereum/
I understand the cause is that each coin has two links in the HTML:
<td class="table__fixed-cell">
<a href="/coin/btc-bitcoin/"><span class="coin-icon currency_images-0"></span></a>
</td>
<td class="table__fixed-cell">
<a href="/coin/btc-bitcoin/">Bitcoin</a>
<small>BTC</small>
</td>
But I still don't know how to print only the second one. Can anyone help me?
Answer 0 (score: 1)
Some of the links have no anchor text because they only wrap the icon image, so .string returns None for them:
<a href="/coin/btc-bitcoin/"><span class="coin-icon currency_images-0"></span></a>
Add a check that skips those links:
for coin_name in coin_list_items:
    names = coin_name.string
    if not names:
        # icon-only link, nothing to print
        continue
    links = 'https://coinpaprika.com' + coin_name.get('href')
    print(names)
    print(links)
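To try the check without hitting the live site, here is a minimal offline sketch. It assumes the table rows look like the HTML fragment quoted in the question; the wrapping `<table><tbody><tr>` markup, the `SAMPLE_HTML` and `BASE_URL` constants, and the `named_links` helper are made up for illustration and are not part of the original code.

```python
from bs4 import BeautifulSoup

# HTML copied from the question, wrapped in a minimal table (an assumption)
# so that soup.find('tbody') has something to find.
SAMPLE_HTML = """
<table><tbody><tr>
<td class="table__fixed-cell">
<a href="/coin/btc-bitcoin/"><span class="coin-icon currency_images-0"></span></a>
</td>
<td class="table__fixed-cell">
<a href="/coin/btc-bitcoin/">Bitcoin</a>
<small>BTC</small>
</td>
</tr></tbody></table>
"""

BASE_URL = 'https://coinpaprika.com'


def named_links(html):
    """Return (name, url) pairs, skipping anchors that have no text."""
    soup = BeautifulSoup(html, 'html.parser')
    pairs = []
    for anchor in soup.find('tbody').find_all('a'):
        name = anchor.string
        if not name:
            # Icon-only link: its only child is an empty <span>, so .string is None.
            continue
        pairs.append((str(name), BASE_URL + anchor.get('href')))
    return pairs


for name, url in named_links(SAMPLE_HTML):
    print(name)
    print(url)
```

Collecting the pairs in a list instead of printing inside the loop keeps each name together with its link, which is handy if you later want to write them to a file.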
Answer 1 (score: 1)
Just find only the tags that contain text.
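# string=True keeps only tags whose .string is not None
# (Beautiful Soup versions before 4.4.0 call this argument text=True)
coin_list_items = coin_list.find_all('a', string=True)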
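A small sketch of how that filter behaves, again on a hand-written fragment modelled on the HTML in the question rather than on the live page (the `html` string and the variable names are assumptions). `string=True` is the current spelling of the argument; older Beautiful Soup releases call it `text=True`.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for one table row from the question's HTML.
html = """
<table><tbody><tr>
<td><a href="/coin/btc-bitcoin/"><span class="coin-icon"></span></a></td>
<td><a href="/coin/btc-bitcoin/">Bitcoin</a> <small>BTC</small></td>
</tr></tbody></table>
"""

soup = BeautifulSoup(html, 'html.parser')
coin_list = soup.find('tbody')

# Without a filter, both the icon link and the name link are returned.
all_links = coin_list.find_all('a')

# With string=True, only anchors whose .string is not None are returned.
text_links = coin_list.find_all('a', string=True)

for link in text_links:
    print(link.string)
    print('https://coinpaprika.com' + link.get('href'))
```

With this approach the loop body from the question can stay exactly as it was, since the icon-only links never reach it.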