Scraping all tables from a web page with Python and BeautifulSoup (bs4)

Time: 2020-06-05 19:55:35

Tags: python web-scraping beautifulsoup datatables

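I want to use BeautifulSoup to get all the tables at this link, https://www.investing.com/indices/indices-futures, and then get the titles in the index column along with the links those titles point to.

I only want the contents of the first column.

For example: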

title        href
Dow Jones    /indices/us-30-futures
S&P 500      /indices/us-spx-500-futures
...
Mini DAX     /indices/mini-dax-futures
...
VSTOXX Mini  /indices/vstoxx-mini 


I am using the following code:

import requests
from bs4 import BeautifulSoup

# urlheader: dict of request headers, defined elsewhere (not shown)
url = "https://www.investing.com/indices/indices-futures"
req = requests.get(url, headers=urlheader)
soup = BeautifulSoup(req.content, "lxml")
table = soup.find('div', id="cross_rates_container")
for a in table.find_all('a', href=True):
    print(a['title'], a['href'])

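I can see the table variable, but I can't seem to access the title (which contains the index name) or the href (which contains the link).

What is going wrong, and how can I get the entries of all the tables at once?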

1 Answer:

Answer 0 (score: 1)

You can iterate over the <td> elements and take the <a> link directly beneath each one.

For example:

import requests
from bs4 import BeautifulSoup


url = 'https://www.investing.com/indices/indices-futures'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:76.0) Gecko/20100101 Firefox/76.0'}
soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')

print('{:<30} {}'.format('Title', 'URL'))
for a in soup.select('td.plusIconTd > a'):
    print('{:<30} {}'.format(a.text, 'https://www.investing.com' + a['href']))

Prints:

Title                          URL
Dow Jones                      https://www.investing.com/indices/us-30-futures
S&P 500                        https://www.investing.com/indices/us-spx-500-futures
Nasdaq                         https://www.investing.com/indices/nq-100-futures
SmallCap 2000                  https://www.investing.com/indices/smallcap-2000-futures
S&P 500 VIX                    https://www.investing.com/indices/us-spx-vix-futures
DAX                            https://www.investing.com/indices/germany-30-futures
CAC 40                         https://www.investing.com/indices/france-40-futures
FTSE 100                       https://www.investing.com/indices/uk-100-futures
Euro Stoxx 50                  https://www.investing.com/indices/eu-stocks-50-futures
FTSE MIB                       https://www.investing.com/indices/italy-40-futures
SMI                            https://www.investing.com/indices/switzerland-20-futures
IBEX 35                        https://www.investing.com/indices/spain-35-futures
ATX                            https://www.investing.com/indices/austria-20-futures
WIG20                          https://www.investing.com/indices/poland-20-futures
AEX                            https://www.investing.com/indices/netherlands-25-futures
BUX                            https://www.investing.com/indices/hungary-14-futures
RTS                            https://www.investing.com/indices/rts-cash-settled-futures

... and so on.
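
One plausible reason the code in the question fails is that `a['title']` raises a `KeyError` as soon as it meets an `<a>` tag without a `title` attribute, whereas `a.get('title')` returns `None` instead. The sketch below illustrates this on a small, hypothetical HTML fragment (the real page markup may differ) and returns the relative href the question asked for alongside the absolute URL. The names `extract_links`, `SAMPLE_HTML` and `BASE_URL` are made up for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = 'https://www.investing.com'

# Hypothetical miniature of the page; the real markup may differ.
SAMPLE_HTML = """
<div id="cross_rates_container">
  <table>
    <tr>
      <td class="plusIconTd"><a href="/indices/us-30-futures" title="Dow Jones">Dow Jones</a></td>
      <td><a href="/other">link outside the name column</a></td>
    </tr>
    <tr>
      <td class="plusIconTd"><a href="/indices/us-spx-500-futures">S&amp;P 500</a></td>
    </tr>
  </table>
</div>
"""


def extract_links(html, base_url=BASE_URL):
    """Return (title, relative_href, absolute_url) for each name-column link."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    # Same selector as in the answer, restricted to anchors that carry an href.
    for a in soup.select('td.plusIconTd > a[href]'):
        # .get() returns None when the attribute is missing (a['title'] would
        # raise KeyError), so fall back to the visible link text.
        title = a.get('title') or a.get_text(strip=True)
        rows.append((title, a['href'], urljoin(base_url, a['href'])))
    return rows


if __name__ == '__main__':
    for title, href, full_url in extract_links(SAMPLE_HTML):
        print('{:<30} {}'.format(title, href))
```

To run this against the live page, pass `requests.get(url, headers=headers).content` to `extract_links` in place of `SAMPLE_HTML`.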

Edit: screenshot showing the <td> elements (image not reproduced here).
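
If the entries need to stay grouped by table rather than flattened into one list, a rough sketch is to walk every `<table>` and keep the first link in each row. This assumes the tables are not nested and that the first link in a row is the index name; neither has been verified against the real page. The sample HTML and the name `first_link_per_row` are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two tables; the real page may be structured differently.
SAMPLE_HTML = """
<table id="t1">
  <tr><th>Index</th><th>Last</th></tr>
  <tr><td class="flag"></td><td><a href="/indices/us-30-futures">Dow Jones</a></td><td>n/a</td></tr>
</table>
<table id="t2">
  <tr><td><a href="/indices/mini-dax-futures">Mini DAX</a></td><td><a href="/x">chart</a></td></tr>
</table>
"""


def first_link_per_row(html):
    """Return one list per <table>, holding (text, href) of each row's first link."""
    soup = BeautifulSoup(html, 'html.parser')
    tables = []
    for table in soup.find_all('table'):
        entries = []
        for row in table.find_all('tr'):
            # find() returns the first matching descendant, or None (header rows).
            a = row.find('a', href=True)
            if a is not None:
                entries.append((a.get_text(strip=True), a['href']))
        tables.append(entries)
    return tables


if __name__ == '__main__':
    for number, entries in enumerate(first_link_per_row(SAMPLE_HTML), start=1):
        print('Table', number, entries)
```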