I am trying to scrape a website that has multiple tables in different sections of the same page.
import requests
from bs4 import BeautifulSoup
url = "https://www.predictit.org/Contract/5367/Will-Donald-Trump-be-president-at-year-end-2018#prices"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data,"html.parser")
table_body = soup.find('tbody')
rows = table_body.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [x.text.strip() for x in cols]
    print(cols)
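This page has multiple sections, and each section has a different table. I am trying to scrape the price data in the "#prices" section. I have specified that section in the URL, but BeautifulSoup still defaults to the table in the first section, "#data". Is there any way to navigate to the section I want?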
Answer 0 (score: 1)
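The "#prices" fragment is handled only by the browser and is never sent to the server, so requests receives the same HTML whichever section you name. In this case, you need to send a request to the GetPriceListAjax URL in the code below to get the prices you want to parse. You can find that URL with your browser's devtools.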
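To see why adding "#prices" to the URL changes nothing, here is a minimal sketch using only the standard library. It splits the question's URL into the part an HTTP client actually requests and the fragment that stays on the client side. The variable names are illustrative, not part of the original code.

```python
from urllib.parse import urldefrag

# URL from the question, including the "#prices" fragment.
page_url = "https://www.predictit.org/Contract/5367/Will-Donald-Trump-be-president-at-year-end-2018#prices"

# urldefrag separates the requestable URL from the fragment.
requested_url, fragment = urldefrag(page_url)

print(requested_url)  # the part that is sent to the server
print(fragment)       # the part that only the browser uses
```

Because the server never sees the fragment, choosing a section has to be done either by selecting the right element in the returned HTML or, as here, by requesting the separate URL that serves the data.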
import requests
from bs4 import BeautifulSoup
url = "https://www.predictit.org/PrivateData/GetPriceListAjax?contractId=5367"
res = requests.get(url)
soup = BeautifulSoup(res.text,"html.parser")
for row in soup.select('table tr')[1:]:
    cols = [x.text.strip() for x in row.select('td')]
    print(cols)
Output:
['Price', 'Shares', '', 'Price', 'Shares']
['81¢', '289', '', '80¢', '2192']
['82¢', '7936', '', '79¢', '5478']
['83¢', '12800', '', '78¢', '6189']
['84¢', '8846', '', '77¢', '6167']
['85¢', '7726', '', '76¢', '2334']
['86¢', '7247', '', '75¢', '3268']
['87¢', '5562', '', '74¢', '2425']
['88¢', '4988', '', '73¢', '1390']
['89¢', '2889', '', '72¢', '3836']
['90¢', '4143', '', '71¢', '944']
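If you need numbers rather than strings, the rows above could be converted along these lines. This is only a sketch: parse_price_row is a hypothetical helper, and it assumes the five-column layout shown in the output (price, shares, an empty spacer, price, shares). The keys are named "left" and "right" because the output does not say what each side represents.

```python
def parse_price_row(cols):
    """Convert one scraped row, e.g. ['81¢', '289', '', '80¢', '2192'].

    Hypothetical helper: assumes exactly five columns, with an empty
    spacer in the middle, as in the output shown above.
    """
    left_price, left_shares, _spacer, right_price, right_shares = cols
    return {
        "left_price_cents": int(left_price.rstrip("¢")),
        "left_shares": int(left_shares),
        "right_price_cents": int(right_price.rstrip("¢")),
        "right_shares": int(right_shares),
    }


# Sample rows copied from the output above; the first one is the header.
rows = [
    ['Price', 'Shares', '', 'Price', 'Shares'],
    ['81¢', '289', '', '80¢', '2192'],
    ['82¢', '7936', '', '79¢', '5478'],
]

# Skip the header row, then convert each data row.
parsed = [parse_price_row(r) for r in rows[1:]]
print(parsed[0])
```

A row with a different number of columns or a non-numeric cell will raise a ValueError, which is usually preferable to silently producing wrong numbers if the site's layout changes.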