Question

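I'm new to Python and am trying to learn web scraping by following a tutorial available online. The print command does not work the way it does in the video. Here is the entire code.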

import requests
from bs4 import BeautifulSoup

url = "http://www.yellowpages.com/search?search_terms=coffee&geo_location_terms=Los+Angeles%2C+CA"

r = requests.get(url)

soup = BeautifulSoup(r.content)

g_data = soup.find_all("div", {"class": "info"})
for item in g_data:
    print (item.text)

for item in g_data:
    print (item.contents[0].text)
    print (item.contents[1].text)

#Print text elements (**The command below does not work!!!!**)
for item in g_data:
    print (item.contents.find_all("a", {"class": "business-name"}).text)

Answer 1

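Parsing nested HTML with BeautifulSoup takes some practice, but once you understand how it works, it is quite neat.

There are a number of small flaws preventing your code from working. I won't pretend to cover all of them, but we can start with a step-by-step example that will hopefully give you a better understanding.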

For example, you can't do this:

item.contents.find_all("a")

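because item.contents is not a BeautifulSoup object. It is a plain Python list of the child nodes BeautifulSoup found inside item. To keep searching within item, you have to call find_all on the object itself. So you could do this: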

for item in g_data:
    print(item.find_all("a", {"class": "business-name"}).text)

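But that is still not correct, for two reasons:

The result of find_all is a list of objects, and that list has no text attribute.
The text therefore has to be read from each element of the list, not from the list as a whole. Each element is a tag, which exposes text as well as a contents attribute.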
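To make the two failure modes concrete, here is a minimal sketch that reproduces them offline. The HTML snippet is a made-up stand-in for one search result (the real page's markup is not shown in the question), and "html.parser" is chosen only so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one search result on the page.
html = '<div class="info"><h3><a class="business-name" href="#">Cafe One</a></h3></div>'
soup = BeautifulSoup(html, "html.parser")
item = soup.find("div", {"class": "info"})

# item.contents is a plain list of child nodes, so it has no find_all.
contents_is_list = isinstance(item.contents, list)
try:
    item.contents.find_all("a")
    contents_error = None
except AttributeError as exc:
    contents_error = exc

# find_all on the tag itself works, but it returns a list-like result,
# and that result has no text attribute either.
links = item.find_all("a", {"class": "business-name"})
try:
    links.text
    result_error = None
except AttributeError as exc:
    result_error = exc

print("contents is a list:", contents_is_list)
print("item.contents.find_all raised:", type(contents_error).__name__)
print("links.text raised:", type(result_error).__name__)
```

Both attempts end in an AttributeError, which is why the search has to start from the tag and the text has to be read from each element of the result.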

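The contents attribute returns the list of children found inside the tag (strings and nested tags). So you have to do something like:

for item in g_data:
    links = item.find_all("a", {"class": "business-name"})
    links_contents = [ link.contents[0] for link in links ]
    print("\n".join(links_contents))

If the rest is correct (I'm not sure), the code above will print the text of each business-name link, one per line.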
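If you want to try the corrected loop without hitting the site, something like the following may help. It is only a sketch: SAMPLE_HTML and the business_names helper are invented for illustration, and the real page may be structured differently. It also swaps contents[0] for get_text(), which is not what the loop above does but copes with links that wrap other tags.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the live page's structure may differ.
SAMPLE_HTML = """
<div class="info"><h3><a class="business-name" href="/a">Cafe One</a></h3></div>
<div class="info"><h3><a class="business-name" href="/b">Bean <b>Bar</b></a></h3></div>
"""


def business_names(html):
    """Return the visible text of every business-name link inside an "info" div."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for item in soup.find_all("div", {"class": "info"}):
        for link in item.find_all("a", {"class": "business-name"}):
            # get_text() joins all nested strings, so it still works when the
            # link wraps other tags; contents[0] would give only the first child.
            names.append(link.get_text(" ", strip=True))
    return names


names = business_names(SAMPLE_HTML)
print("\n".join(names))
```

With the sample above, the second link contains a nested tag, so contents[0] alone would return only the leading "Bean " string while get_text() returns the whole label.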

Answer 2

import requests
from bs4 import BeautifulSoup

url = "http://www.yellowpages.com/search?search_terms=coffee&geo_location_terms=Los+Angeles%2C+CA"

r = requests.get(url)

soup = BeautifulSoup(r.content)

g_data = soup.find_all("div", {"class": "info"})
for item in g_data:
    print (item.text)

for item in g_data:
    # find_all returns a list, so index into it before reading .text
    print(item.contents[0].find_all("a", {"class": "business-name"})[0].text)
    print(item.contents[1].find_all("span", {"itemprop": "StreetAddress"})[1].text)
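One caveat with positional indexes such as contents[0] or [1]: they depend on the exact order of child nodes (whitespace between tags counts as a child), and they raise IndexError when an entry lacks a field. A possible alternative, sketched here under the assumption that the class and itemprop values are the same as in the code above, is to call find() on the item and guard against None. SAMPLE_HTML and text_or_none are hypothetical names used only for this illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the second entry deliberately lacks an address.
SAMPLE_HTML = """
<div class="info">
  <div><a class="business-name" href="/a">Cafe One</a></div>
  <div><span itemprop="StreetAddress">1 Main St</span></div>
</div>
<div class="info">
  <div><a class="business-name" href="/b">Bean Bar</a></div>
</div>
"""


def text_or_none(tag):
    """Return the stripped text of a tag, or None when find() matched nothing."""
    return tag.get_text(strip=True) if tag is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = []
for item in soup.find_all("div", {"class": "info"}):
    # find() searches all descendants and returns the first match or None,
    # so no positional index into contents is needed.
    name = item.find("a", {"class": "business-name"})
    street = item.find("span", {"itemprop": "StreetAddress"})
    rows.append((text_or_none(name), text_or_none(street)))

for name, street in rows:
    print(name, "|", street)
```

Entries with a missing field come back as None instead of stopping the loop with an exception.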

Python: printing text elements does not work
