Running:
from bs4 import BeautifulSoup
import requests
source = requests.get('https://www.yellowpages.com/search?search_terms=bestbuy+10956&geo_location_terms=10956').text
soup = BeautifulSoup(source, 'lxml')
for article in soup.find_all("div", {"class": "result"}):
    info_primary = article.find("div", {"class": "info-section info-primary"}).text
    print(info_primary)
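When Yellow Pages has a rating for the store, the output picks up some noise (numeric characters). The rating is stored in an "a" tag if it exists; otherwise there is no "a" tag and the markup goes straight to the "p" tag. I want to get the text from the "p" tag.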
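To make the structure described above concrete, here is a minimal sketch run against a small, hypothetical HTML fragment (made up for illustration, not the real Yellow Pages markup; the class names other than "result" and "info-section info-primary", and all contents, are assumptions). One result has a rating inside an "a" tag and the other does not; calling find("p") on the info-primary div skips the optional "a" tag, so the "p" text comes back the same way in both cases.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first result has a rating in an <a> tag,
# the second goes straight to the <p> tag. Not the real site's HTML.
HTML = """
<div class="result">
  <div class="info-section info-primary">
    <a class="rating">4.5 (12)</a>
    <p class="adr"><span>1 Example St</span> <span>Sample Town, NY</span></p>
  </div>
</div>
<div class="result">
  <div class="info-section info-primary">
    <p class="adr"><span>2 Example Ave</span> <span>Sample Town, NY</span></p>
  </div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
addresses = []
for article in soup.find_all("div", {"class": "result"}):
    info = article.find("div", {"class": "info-section info-primary"})
    # find("p") searches descendants, so the optional <a> rating is not included
    p = info.find("p")
    # Join the pieces of text with a space and drop whitespace-only strings
    addresses.append(p.get_text(" ", strip=True))

print(addresses)
```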
Running:
info_primary = article.find("div", {"class": "info-section info-primary"}).p.text
gives:
AttributeError: 'NoneType' object has no attribute 'text'
Running:
info_primary = article.find("div", {"class": "info-section info-primary"}).p
runs, and I can see the nested text, but I can't get it returned.
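One plausible cause of the AttributeError above (an assumption, since the actual listings were not inspected) is that at least one "result" div has no "p" inside its info-primary div. In BeautifulSoup, tag.p is shorthand for tag.find("p") and returns None when nothing matches, so chaining .text onto it fails. A minimal reproduction with a made-up fragment:

```python
from bs4 import BeautifulSoup

# Hypothetical result block with no <p> tag at all
soup = BeautifulSoup('<div class="info-section info-primary"><h2>Store</h2></div>', "html.parser")
info = soup.find("div", {"class": "info-section info-primary"})

p_tag = info.p  # attribute-style access returns None when no <p> exists
print(p_tag)

error_raised = False
try:
    p_tag.text  # same as None.text
except AttributeError:
    # AttributeError: 'NoneType' object has no attribute 'text'
    error_raised = True

print(error_raised)
```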
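Looking further, the store's phone number, which I also want, is outside the "p" tag. Maybe accessing the "span" tag properly through a different class description would help?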
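For the phone number, one option is to search by class instead of by position. The sketch below uses a made-up fragment in which the phone number sits in its own element outside the "p"; the class name "phone" is a hypothetical placeholder, so check the real page's markup for the actual class before relying on it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the phone number is outside the <p>.
# The class name "phone" is an assumption, not taken from the real page.
HTML = """
<div class="info-section info-primary">
  <p class="adr"><span>1 Example St</span></p>
  <div class="phone">(555) 010-0000</div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
info = soup.find("div", {"class": "info-section info-primary"})

# class_ (with the trailing underscore) avoids clashing with Python's class keyword
phone_tag = info.find(class_="phone")
phone = phone_tag.get_text(strip=True) if phone_tag is not None else None
print(phone)
```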
Thoughts? Thanks!
Fair warning: I'm new to Python.
Answer (score: 1):
Two things. First, you also need to find the <p> tag to get its text: .find('p'). Second, if there is no p tag and you try to get its text, an AttributeError is raised; you can simply ignore it and move on to the next result, which might have a p tag (you could also check whether .find('p') is None first; same effect):
from bs4 import BeautifulSoup
import requests
source = requests.get('https://www.yellowpages.com/search?search_terms=bestbuy+10956&geo_location_terms=10956').text
soup = BeautifulSoup(source, 'lxml')
for article in soup.find_all("div", {"class": "result"}):
    try:
        info_primary = article.find("div", {"class": "info-section info-primary"}).find('p').text
    except AttributeError:
        continue  # If there's no <p> (raises AttributeError), just continue to the next loop iteration
    print(info_primary)
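The "check whether it is None first" alternative mentioned above could look like the following sketch. It runs on a small hypothetical fragment (not the real page) in which the second result has no "p", and it skips that result with explicit None checks instead of catching AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second result has no <p> inside info-primary
HTML = """
<div class="result"><div class="info-section info-primary"><p><span>Store A</span></p></div></div>
<div class="result"><div class="info-section info-primary"><h2>No address</h2></div></div>
"""

soup = BeautifulSoup(HTML, "html.parser")
results = []
for article in soup.find_all("div", {"class": "result"}):
    info = article.find("div", {"class": "info-section info-primary"})
    # Explicit None checks instead of try/except AttributeError
    p = info.find("p") if info is not None else None
    if p is None:
        continue  # no <p>: skip this result, as the except branch does
    results.append(p.text)

print(results)
```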
The reason you can see the p tag but not its text is that the text isn't directly in the p tag; it is inside span tags. You could do this inside the loop:
try:
    info_primary = article.find("div", {"class": "info-section info-primary"}).p.span.text
except AttributeError:
    continue  # If there's no <p> (raises AttributeError), just continue to the next loop iteration
but that would only give you the text of the first span. Instead, to get the text of all the spans, you could do:
from bs4 import BeautifulSoup
import requests
source = requests.get('https://www.yellowpages.com/search?search_terms=bestbuy+10956&geo_location_terms=10956').text
soup = BeautifulSoup(source, 'lxml')
for article in soup.find_all("div", {"class": "result"}):
    try:
        span_data = article.find("div", {"class": "info-section info-primary"}).p.find_all('span')
        # Join the text of every <span> inside the <p>, separated by spaces
        info_primary = ' '.join(span.text for span in span_data)
    except AttributeError:
        continue  # If there's no <p> (raises AttributeError), just continue to the next loop iteration
    print(info_primary)
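A related option, shown here as a sketch on a made-up "p" tag rather than the real page, is BeautifulSoup's get_text with a separator. Joining only the span texts drops anything that sits directly in the "p" between spans (here, a comma), while get_text(" ", strip=True) collects every non-blank string under the tag. Which one you want depends on the actual markup.

```python
from bs4 import BeautifulSoup

# Hypothetical <p> whose text lives mostly in <span> children, plus a bare comma
HTML = '<p class="adr"><span>1 Example St</span> <span>Sample Town</span>, <span>NY</span></p>'
p = BeautifulSoup(HTML, "html.parser").p

# Only the <span> texts, joined with spaces (as in the loop above)
from_spans = " ".join(span.text for span in p.find_all("span"))

# Every string under <p>, stripped, blank ones dropped, joined with spaces
from_get_text = p.get_text(" ", strip=True)

print(from_spans)
print(from_get_text)
```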