I wrote a Python program with BeautifulSoup that is supposed to find a specific value on a site, but the program cannot seem to find that value.
import bs4
from urllib.request import urlopen as ureq
from bs4 import BeautifulSoup as soup
my_url = 'http://www.calcalist.co.il/stocks/home/0,7340,L-4135-22212222,00.html?quote=%D7%93%D7%95%D7%9C%D7%A8'
uclient = ureq(my_url)
page_html = uclient.read()
uclient.close()
page_soup = soup(page_html, "html.parser")
value = page_soup.find("td",{"class":"RightBlack"})
print(value)
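The value I am trying to find is the exchange rate of the US dollar in Israeli currency, but for some reason the line that should retrieve it:
value = page_soup.find("td",{"class":"RightBlack"})
cannot find it (it prints None).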
Answer 0 (score: 2)
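Note that the element you want is inside an iframe, which means its content comes from a separate request, not the one you made. You can iterate over all the iframes on the page, request each one, and print the price as soon as iframe_soup.find("td",{"class":"RightBlack"}) returns a match.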
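To illustrate why the original find() comes back empty, here is a minimal sketch that uses only the standard library so it runs without network access. The HTML snippet, the frame path /stocks/quote_frame.html, and the names FrameAndCellCollector and inspect_page are all hypothetical, made up for the example; the real page's markup may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical stand-in for the outer page: the price cell is NOT here,
# only an <iframe> pointing at the document that contains it.
OUTER_HTML = """
<html><body>
  <table><tr><td class="Title">Quote</td></tr></table>
  <iframe id="StockQuoteIFrame" src="/stocks/quote_frame.html"></iframe>
</body></html>
"""


class FrameAndCellCollector(HTMLParser):
    """Record every iframe src and the class attribute of every <td>."""

    def __init__(self):
        super().__init__()
        self.iframe_srcs = []
        self.td_classes = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "iframe" and attrs.get("src"):
            self.iframe_srcs.append(attrs["src"])
        elif tag == "td":
            self.td_classes.append(attrs.get("class"))


def inspect_page(html, base_url):
    """Return the <td> classes found and the absolute iframe URLs."""
    parser = FrameAndCellCollector()
    parser.feed(html)
    # urljoin resolves relative src values against the page URL.
    frame_urls = [urljoin(base_url, src) for src in parser.iframe_srcs]
    return parser.td_classes, frame_urls


td_classes, frame_urls = inspect_page(
    OUTER_HTML, "http://www.calcalist.co.il/stocks/home/page.html"
)
print("RightBlack" in td_classes)  # the cell is absent from the outer page
print(frame_urls)                  # each of these needs its own request
```

Resolving the src with urljoin rather than string concatenation is a design choice of this sketch: it also copes with src values that are already absolute.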
I recommend wrapping the requests in a try/except block, because it is easy to hit a bad URL while doing this:
from urllib.request import urlopen as ureq
from bs4 import BeautifulSoup as soup
my_url = 'http://www.calcalist.co.il/stocks/home/0,7340,L-4135-22212222,00.html?quote=%D7%93%D7%95%D7%9C%D7%A8'
uclient = ureq(my_url)
page_html = uclient.read()
uclient.close()
page_soup = soup(page_html, "html.parser")
iframesList = page_soup.find_all('iframe')
i = 1
for iframe in iframesList:
    print(i, 'out of', len(iframesList), '...')
    try:
        # Each iframe is a separate document, so it needs its own request
        uclient = ureq("http://www.calcalist.co.il" + iframe.attrs['src'])
        iframe_soup = soup(uclient.read(), "html.parser")
        uclient.close()
        price = iframe_soup.find("td", {"class": "RightBlack"})
        if price:
            print(price)
            break
    except Exception:
        # Missing src attribute, bad URL, network error, ...
        print("something went wrong")
    i += 1
Running the code prints:
1 out of 8 ...
2 out of 8 ...
3 out of 8 ...
4 out of 8 ...
5 out of 8 ...
<td class="RightBlack">3.5630</td>
So now we have what we wanted:
>>> price
<td class="RightBlack">3.5630</td>
>>> price.text
'3.5630'
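Since price.text is a string, a natural next step is converting it to a number. The following is a small sketch, not part of the original answer; the helper name parse_rate and the 100-dollar amount are hypothetical, and it assumes the site uses a dot as the decimal separator, as in the output above.

```python
from decimal import Decimal, InvalidOperation


def parse_rate(text):
    """Return the rate as a Decimal, or None if the text is not numeric."""
    # Strip surrounding whitespace and any thousands separators.
    cleaned = text.strip().replace(",", "")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


rate = parse_rate("3.5630")          # the value scraped above
amount_ils = Decimal("100") * rate   # 100 US dollars in Israeli currency
print(rate, amount_ils)
```

Decimal is used here instead of float so the quoted rate keeps its exact decimal digits.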
Selenium
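This is only a suggestion: to have both the requests and the JavaScript handled for you, you can drive a real browser with Selenium. I am using ChromeDriver, but you could also use PhantomJS for headless browsing (Selenium has since dropped PhantomJS support; headless Chrome covers the same use case). Inspecting the frame element shows that its ID is "StockQuoteIFrame", so we switch into it with .switch_to.frame (spelled .switch_to_frame in older Selenium releases), and then we can easily find price:
from selenium import webdriver
from selenium.webdriver.common.by import By
url = 'http://www.calcalist.co.il/stocks/home/0,7340,L-4135-22212222,00.html?quote=%D7%93%D7%95%D7%9C%D7%A8'
browser = webdriver.Chrome()
browser.get(url)
# Selenium 4 spelling; older releases used switch_to_frame / find_element_by_id
browser.switch_to.frame(browser.find_element(By.ID, "StockQuoteIFrame"))
price = browser.find_element(By.CLASS_NAME, "RightBlack").text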
The output is, of course, the same as with the first option:
>>> price
'3.5630'