AttributeError when scraping
import urllib2
from bs4 import BeautifulSoup
quote_page ='https://www.bloomberg.com/quote/SPX:IND'
page = urllib2.urlopen(quote_page)
soup = BeautifulSoup(page,'html.parser')
name_box = soup.find('h1', attires ={'class': 'name'})
name = name_box.text.strip()
print name
Traceback (most recent call last):
  File "word1.py", line 11, in <module>
    name = name_box.text.strip()
AttributeError: 'NoneType' object has no attribute 'text'
Viveks-MacBook-Pro:py vivek $
Answer 0 (score: 1)
Do this:
print(name_box)
and you will get:
None
Traceback (most recent call last):
File "C:/Users/devsurya/python/demo programs/b4s.py", line 13, in <module>
name = name_box.text.strip()
AttributeError: 'NoneType' object has no attribute 'text'
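And when you do this:
print(soup)  # prints the following message, wrapped in odd-looking HTML and CSS
We've detected unusual activity from your computer network
In other words, Bloomberg served a bot-check page instead of the quote page, so the HTML you parsed has no matching <h1> and find() returns None.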
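Because find()/select_one() return None on a miss, it can help to check for that before touching .text and to report whether you were given the bot-check page. A minimal sketch, assuming the HTML has already been fetched into a string; extract_name is a hypothetical helper, the two HTML snippets are made up, and the "unusual activity" test relies only on the message quoted above:

```python
from bs4 import BeautifulSoup


def extract_name(html, selector='h1'):
    """Hypothetical helper: return the stripped text of the first element
    matching `selector`, or raise a descriptive error instead of the
    AttributeError you get from calling .text on None."""
    soup = BeautifulSoup(html, 'html.parser')
    box = soup.select_one(selector)  # None when nothing matches
    if box is not None:
        return box.text.strip()
    # Assumption: the block page contains the message quoted above
    if 'unusual activity' in soup.get_text().lower():
        raise RuntimeError('Got a bot-check page instead of the quote page')
    raise LookupError(f'No element matches {selector!r}')


# Made-up pages for illustration only
blocked = "<html><body><p>We've detected unusual activity from your computer network</p></body></html>"
quote = '<html><body><h1 class="companyName__99a4824b"> Example Index </h1></body></html>'

print(extract_name(quote))  # Example Index
try:
    extract_name(blocked)
except RuntimeError as err:
    print(err)  # Got a bot-check page instead of the quote page
```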
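Also, soup.find('h1', attires={'class': 'name'})
should be soup.find('h1', {'class': 'companyName__99a4824b'}): attires is a typo for attrs, and the class on the page is companyName__99a4824b rather than name.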
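Why the typo alone is enough to get None: find() takes attrs as its second parameter, and any keyword argument it does not recognise is treated as a filter on an HTML attribute of that name. So attires={...} asks for an <h1> carrying an attribute literally called attires. A small offline check against made-up HTML:

```python
from bs4 import BeautifulSoup

# Made-up HTML that really does contain <h1 class="name">
html = '<html><body><h1 class="name"> Example Index </h1></body></html>'
soup = BeautifulSoup(html, 'html.parser')

# Typo: 'attires' becomes a filter on an attribute named "attires",
# which this tag does not have, so nothing matches
typo = soup.find('h1', attires={'class': 'name'})

# Correct: use the 'attrs' keyword, or pass the dict as the 2nd positional argument
by_keyword = soup.find('h1', attrs={'class': 'name'})
by_position = soup.find('h1', {'class': 'name'})

print(typo)                       # None
print(by_keyword.text.strip())    # Example Index
print(by_position is by_keyword)  # True (same Tag object from the tree)
```

soup.find('h1', class_='name') is a third equivalent spelling; the trailing underscore avoids clashing with Python's class keyword.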
Answer 1 (score: 0)
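Assuming you want the company name, I would go with requests and a couple of headers (you will need to test whether this keeps working over time). I use a CSS attribute=value selector to get the right element, with the starts-with operator ^ in case the value is dynamic; that is, I assume the constant starting string is companyName. This makes it more robust for other requests.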
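import requests
from bs4 import BeautifulSoup as bs
quote_page = 'https://www.bloomberg.com/quote/SPX:IND'
# Browser-like headers; without them the site may return its bot-check page
page = requests.get(quote_page, headers={'User-Agent': 'Mozilla/5.0', 'accept-language': 'en-US,en;q=0.9'})
soup = bs(page.content, 'lxml')  # the 'lxml' parser needs the lxml package installed
# First element whose class attribute starts with "companyName" (the suffix may be generated)
name_box = soup.select_one('[class^=companyName]')
name = name_box.text.strip()
print(name)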
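The starts-with selector can be tried offline against made-up markup whose class carries a generated suffix like the one in the previous answer. A sketch, assuming only that the class begins with companyName; the HTML below is invented, and select()/select_one() are BeautifulSoup's CSS-selector methods (backed by the soupsieve package):

```python
from bs4 import BeautifulSoup

# Made-up markup: the hashed suffix could change between site deployments
html = '''
<div class="header">
  <h1 class="companyName__99a4824b"> Example Index </h1>
  <span class="companyNameCaption">not the title</span>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')

# An exact class selector fails as soon as the suffix differs
exact = soup.select_one('h1.companyName__deadbeef')

# Prefix match on the class attribute; adding the tag name narrows it to the <h1>
prefix = soup.select_one('h1[class^=companyName]')

print(exact)                # None
print(prefix.text.strip())  # Example Index

# Without the tag name, every element whose class shares the prefix matches;
# select_one() simply returns the first one in document order
print(len(soup.select('[class^=companyName]')))  # 2
```

If the page ever gains another element whose class also starts with companyName, qualifying the selector with the tag name (as above) keeps select_one() from picking the wrong one.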