I can see the text, but I can't get it back from the soup with .text

Posted: 2018-04-17 02:27:35

Tags: python web-scraping beautifulsoup python-requests

Running:

from bs4 import BeautifulSoup
import requests

source = requests.get('https://www.yellowpages.com/search?search_terms=bestbuy+10956&geo_location_terms=10956').text

soup = BeautifulSoup(source, 'lxml')

for article in soup.find_all("div", {"class": "result"}):

    info_primary = article.find("div", {"class": "info-section info-primary"}).text

    print(info_primary)

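When Yellowpages has a rating for a store, this produces some noise (numeric) characters. The rating is stored in an "a" tag when one exists; otherwise there is no "a" tag and the markup goes straight to the "p" tag. I want to get the text from the "p" tag.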

Running:

info_primary = article.find("div", {"class": "info-section info-primary"}).p.text

gives:

AttributeError: 'NoneType' object has no attribute 'text'

Running:

info_primary = article.find("div", {"class": "info-section info-primary"}).p
works: I can see the nested text in the output, but I can't return it.

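Looking further, the store's phone number, which I also want, is outside the "p" tag. Maybe accessing the "span" tags properly through their different class names would help?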

Ideas? Thanks!

As a heads-up, I'm new to Python.

1 answer:

Answer 0 (score: 1)

Two things. First, you also need to find the <p> tag in order to get its text.

Second, if there is no p tag and you try to get its text, an AttributeError is raised. You can simply ignore it and move on to the next result, which may have a p. (You could also check first whether .find('p') is None; the effect is the same.)

from bs4 import BeautifulSoup
import requests

source = requests.get('https://www.yellowpages.com/search?search_terms=bestbuy+10956&geo_location_terms=10956').text

soup = BeautifulSoup(source, 'lxml')

for article in soup.find_all("div", {"class": "result"}):
    try:
        info_primary = article.find("div", {"class": "info-section info-primary"}).find('p').text
    except AttributeError:
        continue  # If there's no <p> (raises AttributeError), just continue to the next loop iteration
    print(info_primary)
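For the "check for None first" variant mentioned in the parenthesis, a minimal offline sketch could look like this. The HTML string and the `primary_texts` helper are hypothetical, made up only to imitate one result that has a `<p>` and one that does not; the real Yellowpages markup may differ. Since `find()` returns `None` when nothing matches, the code tests for that instead of catching `AttributeError`, and it uses the built-in `html.parser` so no request or `lxml` install is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating two search results: one with a <p>, one without.
# This is NOT the real Yellowpages HTML; it only mimics the structure described above.
HTML = """
<div class="result">
  <div class="info-section info-primary">
    <a class="rating">4</a>
    <p><span>Best Buy</span><span>123 Main St</span></p>
  </div>
</div>
<div class="result">
  <div class="info-section info-primary">
    <h2>No paragraph here</h2>
  </div>
</div>
"""


def primary_texts(html):
    """Return the <p> text of each result, skipping results that have no <p>."""
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    for article in soup.find_all("div", {"class": "result"}):
        section = article.find("div", {"class": "info-section info-primary"})
        # find() returns None when nothing matches, so test for None before using .text
        p = section.find("p") if section is not None else None
        if p is None:
            continue  # no <p> in this result: skip it
        texts.append(p.text)
    return texts


print(primary_texts(HTML))
```

Note that `.text` concatenates all descendant strings with no separator, so the two spans in the sample come out as `Best Buy123 Main St`.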

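The reason you can see the p tag but not its text is that the text is not directly inside the p tag; it is inside the span tags within it.

You can do

for article in soup.find_all("div", {"class": "result"}):
    try:
        info_primary = article.find("div", {"class": "info-section info-primary"}).p.span.text
    except AttributeError:
        continue  # If there's no <p> (raises AttributeError), just continue to the next loop iteration
    print(info_primary)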
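To see why `.p.span.text` stops at the first span, here is a small sketch on a made-up `<p>` (the class names and address are hypothetical placeholders, not taken from the real page). Attribute access such as `p.span` is shorthand for `p.find("span")`, which returns only the first match, while `find_all("span")` returns every matching tag.

```python
from bs4 import BeautifulSoup

# Hypothetical <p> holding several <span>s (not the real Yellowpages markup).
HTML = '<p><span class="street">123 Main St</span><span class="city">Example City</span></p>'

p = BeautifulSoup(HTML, "html.parser").p

first_only = p.span.text  # p.span is the same as p.find("span"): the FIRST span only
all_spans = [span.text for span in p.find_all("span")]  # every span inside the <p>

print(first_only)
print(all_spans)
```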

But that only gives you the text of the first span. To get the text of all the spans instead, you can do:

from bs4 import BeautifulSoup
import requests

source = requests.get('https://www.yellowpages.com/search?search_terms=bestbuy+10956&geo_location_terms=10956').text

soup = BeautifulSoup(source, 'lxml')

for article in soup.find_all("div", {"class": "result"}):
    try:
        span_data = article.find("div", {"class": "info-section info-primary"}).p.find_all('span')
        info_primary = ''
        for span in span_data:
            info_primary += ' ' + span.text  # collect the text of every span
    except AttributeError:
        continue  # If there's no <p> (raises AttributeError), just continue to the next loop iteration
    print(info_primary)
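The loop that collects every span builds the string with `+=` and leaves a leading space. A slightly more idiomatic sketch, assuming the same structure (a `<p>` containing `<span>`s; the HTML and the `span_text` helper below are hypothetical stand-ins), joins the span texts with `str.join`:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for one result's primary section (not the real page markup).
HTML = """
<div class="info-section info-primary">
  <p><span>Best Buy</span> <span>123 Main St</span><span>Example City</span></p>
</div>
"""


def span_text(html):
    """Return the text of every <span> in the section's <p>, separated by single spaces."""
    soup = BeautifulSoup(html, "html.parser")
    p = soup.find("div", {"class": "info-section info-primary"}).find("p")
    # get_text(strip=True) trims whitespace inside each span; join adds exactly one space between them
    return " ".join(span.get_text(strip=True) for span in p.find_all("span"))


print(span_text(HTML))
```

`p.get_text(" ", strip=True)` gives a similar result in a single call, but it also picks up any text inside the `<p>` that is not wrapped in a span.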
