Question

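First off, I know very little about HTML, and I think that is my problem. I'm having trouble finding a specific coin name in my search. I'm not sure whether I should look up the name through the td tag it has, or whether there is a better way.

Before settling on this fallback, I cut out a specific slice of the text, but whenever the page is updated the names and prices move, so it is definitely not ideal, though it worked in the meantime. I've come back to try to find a way to look up a coin by its name rather than by where it is placed.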

import requests
from bs4 import BeautifulSoup as bs4

def loadPageCM():
    # Grabbing url with requests
    page = requests.get('https://www.coinmarketcap.com')

    # Sending page to Bs4 to parse info
    soup = bs4(page.text, 'html.parser')

    # despite the name, this holds every <table id="currencies"> element
    divs = soup.findAll('table', id='currencies')

    content = []
    # loops through each matching table
    for div in divs:
        rows = div.findAll('tr')
        for row in rows:
            # looping through all the rows in the table,
            # appending to content array and removing the ending portion
            content.append(row.text.replace('\n', '')[:-115])
    return content

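This is the original code I used. Sorry, I'm very new.

What I want to do now is find these coins by their name, from this tag:

<td class="no-wrap currency-name" data-sort="COIN">

If there is a better way, I'm open to any suggestions. Apologies again if the question doesn't make sense; any improvements to the question or to my code would be appreciated. Thank you for your time.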

Answer 1

You are on the right track. Since you know the attributes of the tag you want, pass them as attrs to soup.find_all() to get the tags.

TL;DR:

import requests
from bs4 import BeautifulSoup

# Grabbing url with requests
page = requests.get('https://www.coinmarketcap.com')

# Sending page to Bs4 to parse info
soup = BeautifulSoup(page.text, 'html.parser')

# every <td> whose class attribute is "no-wrap currency-name"
tds = soup.find_all('td', attrs={'class': 'no-wrap currency-name'})

for td in tds:
    print(td['data-sort'])   # change to get whichever attributes you want

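Explanation: soup.find_all('td', attrs={'class': 'no-wrap currency-name'}) returns all 100 name cells (one per row) from the page.

Then, for each td (row), we access the attribute we need. For example, for the first row, <td class="no-wrap currency-name" data-sort="Bitcoin">, td.attrs shows all available attributes: {'class': ['no-wrap', 'currency-name'], 'data-sort': 'Bitcoin'}. So, to get just the coin's name, use td['data-sort'], which gives the name Bitcoin.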
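A minimal offline sketch of that lookup, run against a small inline table rather than the live page, so it does not depend on the site's current markup. The SAMPLE_HTML string and its market-cap cells are made up for illustration; only the no-wrap currency-name class and the data-sort attribute come from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page; the structure is an assumption.
SAMPLE_HTML = """
<table id="currencies">
  <tr>
    <td class="no-wrap currency-name" data-sort="Bitcoin">Bitcoin</td>
    <td class="no-wrap market-cap" data-sort="100">$100</td>
  </tr>
  <tr>
    <td class="no-wrap currency-name" data-sort="Ethereum">Ethereum</td>
    <td class="no-wrap market-cap" data-sort="50">$50</td>
  </tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# Match on the full class string, exactly as in the answer above.
tds = soup.find_all('td', attrs={'class': 'no-wrap currency-name'})

# Tag attributes are read dictionary-style.
names = [td['data-sort'] for td in tds]
first_attrs = tds[0].attrs

print(names)
print(first_attrs)
```

Note that class comes back as a list because it is a multi-valued attribute, while data-sort is a plain string.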

If you want more information from the row, such as Market Cap, Price, or Volume, do the same with the row's other td elements and access their attributes dictionary-style.
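One way to collect the rest of a row, sketched under assumptions: start from each name cell, climb to its parent tr with find_parent, and key the row's cell texts by coin name, so that a lookup no longer depends on position. The sample markup and the market-cap class are hypothetical; the real page's other cells may look different.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page; the structure is an assumption.
SAMPLE_HTML = """
<table id="currencies">
  <tr>
    <td class="no-wrap currency-name" data-sort="Bitcoin">Bitcoin</td>
    <td class="no-wrap market-cap" data-sort="100">$100</td>
  </tr>
  <tr>
    <td class="no-wrap currency-name" data-sort="Ethereum">Ethereum</td>
    <td class="no-wrap market-cap" data-sort="50">$50</td>
  </tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

coins = {}
for name_td in soup.find_all('td', attrs={'class': 'no-wrap currency-name'}):
    # The enclosing <tr> holds every cell that belongs to this coin.
    row = name_td.find_parent('tr')
    coins[name_td['data-sort']] = [
        cell.get_text(strip=True) for cell in row.find_all('td')
    ]

print(coins.get('Ethereum'))
```

Using coins.get(name) returns None for a coin that is not on the page instead of raising KeyError.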

Hope that helps.

Answer 2

You can use an attribute = value selector to target a cell by its data-sort value (for example, Bitcoin):

soup.select_one("[data-sort='Bitcoin']")

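Say you want to isolate that row so that you can get all of its associated values. With bs4 4.7.1 you can use :has to isolate the row that contains the data-sort value above: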

row = soup.select_one("tr:has([data-sort='Bitcoin'])")

A full example of that last part, looking at the values for a specific coin:

from bs4 import BeautifulSoup as bs
import requests
import re

r = requests.get('https://coinmarketcap.com/')
soup = bs(r.content, 'lxml')
row = soup.select_one("tr:has([data-sort='Bitcoin'])")
print([re.sub(r'\n+', ' ', item.text.strip()) for item in row.select('td')])
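select_one returns None when nothing matches, so a small guard avoids an AttributeError for a coin that is not listed. The sketch below wraps the same :has selector in a helper and runs it on inline sample markup; find_row, SAMPLE_HTML, and the market-cap class are hypothetical names introduced for illustration, and it uses html.parser so that lxml is not required.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page; the structure is an assumption.
SAMPLE_HTML = """
<table id="currencies">
  <tr>
    <td class="no-wrap currency-name" data-sort="Bitcoin">
      BTC
      Bitcoin
    </td>
    <td class="no-wrap market-cap" data-sort="100">$100</td>
  </tr>
</table>
"""


def find_row(soup, coin_name):
    """Return the cell texts of the row for coin_name, or None if absent."""
    # Assumes coin_name contains no quote characters, since it is
    # interpolated directly into the CSS selector.
    row = soup.select_one(f"tr:has([data-sort='{coin_name}'])")
    if row is None:
        return None
    # Collapse runs of whitespace inside each cell into single spaces.
    return [re.sub(r'\s+', ' ', td.get_text().strip()) for td in row.select('td')]


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
bitcoin = find_row(soup, 'Bitcoin')
missing = find_row(soup, 'Dogecoin')

print(bitcoin)
print(missing)
```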
