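I want to use BeautifulSoup to scrape all the tables on https://www.investing.com/indices/indices-futures
and then extract the titles in the Index column together with the links those titles point to.
I only need the contents of the first column.
For example: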
title href
Dow Jones /indices/us-30-futures
S&P 500 /indices/us-spx-500-futures
...
Mini DAX /indices/mini-dax-futures
...
VSTOXX Mini /indices/vstoxx-mini
I am using the following code:
url = "https://www.investing.com/indices/indices-futures"
req = requests.get(url, headers=urlheader)
soup = BeautifulSoup(req.content, "lxml")
table = soup.find('div', id="cross_rates_container")
for a in table.find_all('a', href=True):
    print(a['title'], a['href'])
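I can see the contents of the table variable, but I cannot seem to access the title (which contains the index name) or the href (which contains the link).
What is going wrong, and how can I get the entries of all the tables at once?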
Answer 0 (score: 1)
You can iterate over the <td> elements and get the <a> links directly inside them.
For example:
import requests
from bs4 import BeautifulSoup
url = 'https://www.investing.com/indices/indices-futures'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:76.0) Gecko/20100101 Firefox/76.0'}
soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')
print('{:<30} {}'.format('Title', 'URL'))
for a in soup.select('td.plusIconTd > a'):
    print('{:<30} {}'.format(a.text, 'https://www.investing.com' + a['href']))
Prints:
Title URL
Dow Jones https://www.investing.com/indices/us-30-futures
S&P 500 https://www.investing.com/indices/us-spx-500-futures
Nasdaq https://www.investing.com/indices/nq-100-futures
SmallCap 2000 https://www.investing.com/indices/smallcap-2000-futures
S&P 500 VIX https://www.investing.com/indices/us-spx-vix-futures
DAX https://www.investing.com/indices/germany-30-futures
CAC 40 https://www.investing.com/indices/france-40-futures
FTSE 100 https://www.investing.com/indices/uk-100-futures
Euro Stoxx 50 https://www.investing.com/indices/eu-stocks-50-futures
FTSE MIB https://www.investing.com/indices/italy-40-futures
SMI https://www.investing.com/indices/switzerland-20-futures
IBEX 35 https://www.investing.com/indices/spain-35-futures
ATX https://www.investing.com/indices/austria-20-futures
WIG20 https://www.investing.com/indices/poland-20-futures
AEX https://www.investing.com/indices/netherlands-25-futures
BUX https://www.investing.com/indices/hungary-14-futures
RTS https://www.investing.com/indices/rts-cash-settled-futures
... and so on.
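If you would rather collect the results than print them, something along these lines might work. This is only a sketch: the HTML fragment is hand-written for illustration and merely mimics the td.plusIconTd class used above (the real page markup may differ), and first_column_links, BASE_URL and SAMPLE_HTML are hypothetical names. It also uses urljoin from the standard library instead of string concatenation to build the absolute URLs.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.investing.com"

# Hypothetical fragment written by hand; it only imitates the class name
# used in the selector above and is not copied from the real page.
SAMPLE_HTML = """
<table>
  <tr>
    <td class="plusIconTd"><a href="/indices/us-30-futures">Dow Jones</a></td>
    <td><a href="/other">link outside the first column</a></td>
  </tr>
  <tr>
    <td class="plusIconTd"><a href="/indices/us-spx-500-futures">S&amp;P 500</a></td>
    <td>another cell</td>
  </tr>
</table>
"""


def first_column_links(html, base_url=BASE_URL):
    """Return (title, absolute_url) pairs for links directly inside td.plusIconTd."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(strip=True), urljoin(base_url, a["href"]))
        for a in soup.select("td.plusIconTd > a[href]")
    ]


for title, link in first_column_links(SAMPLE_HTML):
    print("{:<30} {}".format(title, link))
```

On the live site you would pass requests.get(url, headers=headers).content as the html argument, as in the code above.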
Edit: screenshot showing the <td> elements (image not included here).
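As for why the loop in the question could not read the titles: one plausible cause (an assumption, since the page's HTML is not shown here) is that some <a> tags inside the container have no title attribute. In BeautifulSoup, a['title'] raises KeyError when the attribute is missing, while a.get('title') returns None. The fragment below is hypothetical and only illustrates that difference, falling back to the link text when there is no title.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first link has a title attribute, the second does not.
SAMPLE_HTML = """
<div id="cross_rates_container">
  <a href="/indices/us-30-futures" title="Dow Jones Futures">Dow Jones</a>
  <a href="/indices/us-spx-500-futures">S&amp;P 500</a>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
container = soup.find("div", id="cross_rates_container")

rows = []
for a in container.find_all("a", href=True):
    # a["title"] would raise KeyError on the second link;
    # a.get("title") returns None, so fall back to the visible text.
    title = a.get("title") or a.get_text(strip=True)
    rows.append((title, a["href"]))

print(rows)
```

If soup.find returns None on the real page, the container was not in the HTML that requests received, which would be a separate problem from the missing attribute.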