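I am trying to scrape the Yellow Pages Australia site. I searched for all pizza restaurants in Australia. Now I want to get each restaurant's email address, which is the value of data-email (an attribute of the anchor tag). My code is below. I used getAttribute() on the anchor tag, but it always gives me this error: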
TypeError: 'NoneType' object is not callable
Here is my code:
import csv
from bs4 import BeautifulSoup
import requests
from selenium import webdriver
from selenium.webdriver.common.by import By
url = "https://www.yellowpages.com.au/search/listings?clue=Pizza+Restaurants&locationClue=Sydney+CBD%2C+NSW&lat=&lon="
driver=webdriver.Chrome(executable_path="/usr/local/share/chromedriver")
driver.get(url)
pageSource=driver.page_source
bsObj=BeautifulSoup(pageSource,'lxml')
items=bsObj.find('div',{'class':'flow-layout outside-gap-large inside-gap inside-gap-large vertical'}).findAll('div',class_='cell in-area-cell find-show-more-trial middle-cell')
for item in items:
    print(item.find('a',class_='contact contact-main contact-email ').getAttribute("data-email"))
Answer 0 (score: 0)
Tag.getAttribute does not exist. You need Tag[<attrname>] (if you are sure the tag has this attribute) or Tag.get(<attrname>[, default=None]) (if you are not).
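A minimal sketch of the two access styles. The HTML fragment and the email address are made up for illustration, and the standard-library html.parser backend is assumed so that no extra parser is required:

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modelled on the listing's email link.
html = (
    '<a class="contact contact-email" data-email="info@example.com">Email</a>'
    '<a class="plain">No email</a>'
)
soup = BeautifulSoup(html, "html.parser")
with_email, without_email = soup.find_all("a")

# Subscript access: use it when the attribute is known to exist.
email = with_email["data-email"]

# .get() returns None (or the supplied default) when the attribute is missing.
missing = without_email.get("data-email")
fallback = without_email.get("data-email", "n/a")

# Subscript access on a missing attribute raises KeyError.
try:
    without_email["data-email"]
    raised = False
except KeyError:
    raised = True

print(email, missing, fallback, raised)
```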
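Note that with most Python objects you would get an AttributeError, but BeautifulSoup relies heavily on the __getattr__ hook: an unknown attribute name is treated as a search for a child tag with that name, so it returns None instead of raising AttributeError, which is rather confusing. Calling that None is what raises the TypeError in the question.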
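The error from the question can be reproduced on a tiny document. This is only a sketch with a made-up anchor tag; it shows that looking up getAttribute on a tag yields None, and that calling None produces the TypeError:

```python
from bs4 import BeautifulSoup

# Hypothetical one-tag document.
soup = BeautifulSoup('<a data-email="info@example.com">Email</a>', "html.parser")
link = soup.find("a")

# Unknown attribute names are treated as a search for a child tag of that
# name, so this behaves like link.find("getAttribute") and yields None.
method = link.getAttribute

# Calling None is what produces the error message from the question.
try:
    method("data-email")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(method, error_message)
```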
That said, item.find() can return None, so you really should test the result of item.find() before calling .get() on it.
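The check described above could look like the following sketch. The HTML is a hypothetical stand-in for the page source (the real page's markup may differ), and the class names are simplified:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for driver.page_source.
html = """
<div class="cell"><a class="contact contact-main contact-email" data-email="a@example.com">Email</a></div>
<div class="cell"><span>No email link in this listing</span></div>
<div class="cell"><a class="contact contact-main contact-email" data-email="b@example.com">Email</a></div>
"""
soup = BeautifulSoup(html, "html.parser")

emails = []
for item in soup.find_all("div", class_="cell"):
    link = item.find("a", class_="contact-email")
    if link is None:
        # This listing has no email link, so skip it.
        continue
    email = link.get("data-email")
    if email:
        emails.append(email)

print(emails)
```

Matching on the single class contact-email, rather than the full class string with its trailing space, avoids depending on the exact order and spacing of the class attribute.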
Answer 1 (score: 0)
You can also try something like this: https://github.com/n0str/beautifulsoup-none-catcher
So it becomes:
from maybe import Maybe
bsObj=BeautifulSoup(pageSource,'lxml')
items=Maybe(bsObj).find('div',{'class':'flow-layout outside-gap-large inside-gap inside-gap-large vertical'}).find_all('div', {'class': 'cell in-area-cell find-show-more-trial middle-cell'})
print('\n'.join(filter(lambda x: x, [Maybe(item).find('a', {'class': 'contact-email'}).get("data-email").resolve() for item in items.resolve()])))
Output:
[..]@crust.com.au
[..]@madinitalia.com
<...>
[..]@ventuno.com.au
Just wrap the soup with Maybe(soup) and call .resolve() at the end.
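To illustrate the general idea behind such a wrapper, here is a toy None-absorbing class. It is a hypothetical sketch written for this explanation, not the linked library's implementation, and it is exercised on plain dictionaries rather than on a soup:

```python
class Maybe:
    """Toy None-absorbing wrapper (illustrative only, not the linked library)."""

    def __init__(self, value):
        self._value = value

    def __getattr__(self, name):
        # Only reached for names that Maybe itself does not define.
        if self._value is None:
            return Maybe(None)
        return Maybe(getattr(self._value, name, None))

    def __call__(self, *args, **kwargs):
        # Calling a wrapped None keeps propagating None instead of raising.
        if self._value is None:
            return Maybe(None)
        return Maybe(self._value(*args, **kwargs))

    def resolve(self):
        # Unwrap the final value, which may be None.
        return self._value


found = Maybe({"a": {"b": 1}}).get("a").get("b").resolve()
absent = Maybe({"a": None}).get("a").get("b").resolve()
print(found, absent)
```

Each step in the chain returns another wrapper, so a None anywhere along the way simply flows through to resolve() instead of raising an error.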