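I am trying to scrape a web page for a Udemy course (video 334 of The Modern Python 3 Bootcamp).
The page I am looking at contains quotes, and each quote has an author, an a href and the quote text. I need to put all of these into a list.
.find_all returns nothing at all. .select works, but then I cannot "find" what I need afterwards, because of this error: AttributeError: 'list' object has no attribute 'find' (why?)
Please see my code below, with comments marking what works and what does not: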
url = "http://quotes.toscrape.com"
url_next = "/page/1"
ori_url = requests.get(f"{url}{url_next}").text
every_thang = []
soup = BeautifulSoup(ori_url, "html.parser")
#all_the_quotes = soup.select(".quote") # this actually works, but cant use .find on it later
all_the_quotes2 = soup.find_all(".quote")
for q in all_the_quotes2:
    every_thang.append({
        "text": all_the_quotes2.find(".text").get_text(),
        "author": all_the_quotes2.find(".author").get_text(),
        "linky": all_the_quotes2.find("a")["href"]
    })
#for q in all_the_quotes: # gives error trying to use find
#    every_thang.append({
#        "text": all_the_quotes.find(".text").get_text(),
#        "author": all_the_quotes.find(".author").get_text(),
#        "linky": all_the_quotes.find("a")["href"]
#    })
print(all_the_quotes2)
Answer 0 (score: 2)
The correct way to use find_all (also spelled findAll) is:
all_the_quotes2 = soup.find_all("div", {"class": "quote"})
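To illustrate why the original call came back empty, here is a minimal sketch. The HTML string is a made-up stand-in for one quote block on the page, so it runs without a network request. It assumes that a bare string passed to find_all is matched against tag names, which is why ".quote" finds nothing, and it shows that the returned collection itself has no find method while each element in it does.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, shaped like a single quote block on the page.
html = """
<div class="quote">
  <span class="text">Sample quote</span>
  <small class="author">Sample Author</small>
  <a href="/author/Sample-Author">(about)</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A bare string is treated as a tag name, so this looks for tags named ".quote".
by_name = soup.find_all(".quote")

# Filtering on the class attribute matches the div.
by_class = soup.find_all("div", {"class": "quote"})

print(len(by_name))   # 0
print(len(by_class))  # 1

# The result is a list-like collection: it has no .find of its own,
# but every element inside it is a Tag that does.
print(hasattr(by_class, "find"))                  # False
print(by_class[0].find("a")["href"])              # /author/Sample-Author
```

This is also the likely reason for the AttributeError in the question: inside the loop, find has to be called on the loop variable q, not on the whole result list.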
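Answer 1 (score: 1)
.select() and .find_all() have different interfaces. select() accepts CSS selectors (list of all CSS selectors that BeautifulSoup 4.7.1+ supports), whereas find_all() does not; it takes bs4 filters instead (list of bs4 filters).
To select all tags with the class "quote", you can use soup.find_all(class_="quote"):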
from pprint import pprint

import requests
from bs4 import BeautifulSoup

url = "http://quotes.toscrape.com"
url_next = "/page/1"
ori_url = requests.get(f"{url}{url_next}").text
soup = BeautifulSoup(ori_url, "html.parser")

# find_all filters on the class attribute here, not on a CSS selector
all_the_quotes2 = soup.find_all(class_="quote")

every_thang = []
for q in all_the_quotes2:
    # call find on each individual quote tag (q), not on the result list
    every_thang.append({
        "text": q.find(class_="text").get_text(),
        "author": q.find(class_="author").get_text(),
        "linky": q.find("a")["href"]
    })

pprint(every_thang)
Prints:
[{'author': 'Albert Einstein',
'linky': '/author/Albert-Einstein',
'text': '“The world as we have created it is a process of our thinking. It '
'cannot be changed without changing our thinking.”'},
{'author': 'J.K. Rowling',
'linky': '/author/J-K-Rowling',
'text': '“It is our choices, Harry, that show what we truly are, far more '
'than our abilities.”'},
{'author': 'Albert Einstein',
'linky': '/author/Albert-Einstein',
'text': '“There are only two ways to live your life. One is as though '
'nothing is a miracle. The other is as though everything is a '
'miracle.”'},
{'author': 'Jane Austen',
'linky': '/author/Jane-Austen',
'text': '“The person, be it gentleman or lady, who has not pleasure in a '
'good novel, must be intolerably stupid.”'},
{'author': 'Marilyn Monroe',
'linky': '/author/Marilyn-Monroe',
'text': "“Imperfection is beauty, madness is genius and it's better to be "
'absolutely ridiculous than absolutely boring.”'},
{'author': 'Albert Einstein',
'linky': '/author/Albert-Einstein',
'text': '“Try not to become a man of success. Rather become a man of '
'value.”'},
{'author': 'André Gide',
'linky': '/author/Andre-Gide',
'text': '“It is better to be hated for what you are than to be loved for '
'what you are not.”'},
{'author': 'Thomas A. Edison',
'linky': '/author/Thomas-A-Edison',
'text': "“I have not failed. I've just found 10,000 ways that won't work.”"},
{'author': 'Eleanor Roosevelt',
'linky': '/author/Eleanor-Roosevelt',
'text': '“A woman is like a tea bag; you never know how strong it is until '
"it's in hot water.”"},
{'author': 'Steve Martin',
'linky': '/author/Steve-Martin',
'text': '“A day without sunshine is like, you know, night.”'}]
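Since the question mentions that select(".quote") did work, the same loop can also be written with CSS selectors throughout. The sketch below is one possible variant, not the only way to do it. It uses made-up HTML in place of the live page, and relies on select_one, which returns the first element matching a CSS selector.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the downloaded page.
html = """
<div class="quote">
  <span class="text">First sample quote</span>
  <small class="author">Author One</small>
  <a href="/author/Author-One">(about)</a>
</div>
<div class="quote">
  <span class="text">Second sample quote</span>
  <small class="author">Author Two</small>
  <a href="/author/Author-Two">(about)</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

quotes = []
for q in soup.select(".quote"):
    # q is a single Tag, so selector-based lookups work on it directly.
    quotes.append({
        "text": q.select_one(".text").get_text(),
        "author": q.select_one(".author").get_text(),
        "linky": q.select_one("a")["href"],
    })

print(quotes)
```

The point is to stay within one interface: CSS selectors go to select and select_one, while keyword filters such as class_="quote" go to find and find_all.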