I want to get all the text inside every <p> tag that belongs to each element of news1.
import requests
from bs4 import BeautifulSoup
r1 = requests.get("http://www.metalinjection.net/shocking-revelations/machine-heads-robb-flynn-addresses-controversial-photo-from-his-past-in-the-wake-of-charlottesville")
data1 = r1.text
soup1 = BeautifulSoup(data1, "lxml")
news1 = soup1.find_all("div", {"class": "article-detail"})
for x in news1:
    print(x.find("p").text)
This returns the text of the first <p> and nothing else. When I call find_all instead, it raises the following error:
AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
So I collected the results in a list, but I still get the same error:
text1 = []
for x in news1:
    text1.append(x.find_all("p").text)
print(text1)
Answer 0 (score: 1)
The error I get when running your code is AttributeError: 'ResultSet' object has no attribute 'text', which makes sense, because a bs4 ResultSet is basically a list of Tag elements. You can get each 'p' tag by looping over that iterable.
text1 = []
for x in news1:
    for i in x.find_all("p"):
        text1.append(i.text)
Or as a one-liner using a list comprehension:
text1 = [i.text for x in news1 for i in x.find_all("p")]
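
If you want to try this without hitting the site, here is a minimal offline sketch of the same idea. The HTML string is a made-up stand-in for the real page, and the names html, soup, news and texts are only for illustration. It also uses the built-in "html.parser" rather than "lxml", purely so that nothing beyond bs4 needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page, so the sketch runs offline.
html = """
<div class="article-detail">
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
</div>
<div class="article-detail">
  <p>Third paragraph.</p>
</div>
"""

# "html.parser" ships with Python; the question's code uses "lxml" instead.
soup = BeautifulSoup(html, "html.parser")
news = soup.find_all("div", {"class": "article-detail"})

# find_all() returns a ResultSet, which is a list subclass, so .text has to
# be read from each Tag inside it rather than from the ResultSet itself.
print(isinstance(news, list))  # True

# Loop over each div, then over each <p> inside it, and collect the text.
texts = [p.text for div in news for p in div.find_all("p")]
print(texts)  # ['First paragraph.', 'Second paragraph.', 'Third paragraph.']
```

The nested comprehension flattens the paragraphs from all matching divs into one list. If you would rather keep them grouped per div, a list of lists such as [[p.text for p in div.find_all("p")] for div in news] should work the same way.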