Question

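OK, let's try this again. I'm scraping a web page that is served as XML. I'm collecting what I need, but for one of the entries the text can't be extracted (the variable called "item" in the code below). I get the following error: "item = items.find("image:title").text AttributeError: 'NoneType' object has no attribute 'text'". I just want to get the text of 'item'.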

Here is my code:

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64)  AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36'}

url = 'https://www.kith.com/sitemap_products_1.xml'

r = requests.get(url=url, headers=headers)

soup = BeautifulSoup(r.text, 'html.parser')

for items in soup.find_all("url"):
    item = items.find("image:title").text
    url = items.find("loc").text
    if item is not None:
        print(item, url)

Answer 1

Basically, on this line:

item = items.find("image:title").text

items.find("image:title") returns None (probably because find cannot locate the tag you are looking for inside items). Since None has no attribute text, (None).text raises AttributeError: 'NoneType' object has no attribute 'text'.

If you want to fix the error, you can do the following:

item = items.find("image:title")
if item:
    title = item.text     # you can use other variable name if you want to.
else:
    print("there is no image:title in items")

Answer 2

The first lookup returns None, which is why you get this error. Check that the result is not None before reading its text.

for items in soup.find_all("url"):
    # find() returns None when the entry has no <image:title> tag
    getTitle = items.find('image:title')
    if getTitle is not None:
        item = getTitle.text
        url = items.find("loc").text
        print(item, url)
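
For reference, here is a minimal, self-contained sketch of the check both answers describe. It does not hit the real sitemap: SAMPLE_XML is a made-up fragment and extract_titles is a hypothetical helper name, both introduced only for illustration. It assumes, as in the question, that the document is parsed with 'html.parser' and that some url entries simply have no image:title tag.

```python
from bs4 import BeautifulSoup

# Made-up sitemap fragment (an assumption, not the real kith.com data):
# the first <url> entry has no <image:title>, the second one does.
SAMPLE_XML = """
<urlset>
  <url>
    <loc>https://example.com/</loc>
  </url>
  <url>
    <loc>https://example.com/products/a</loc>
    <image:image>
      <image:title>Product A</image:title>
    </image:image>
  </url>
</urlset>
"""


def extract_titles(xml_text):
    """Return (title, url) pairs, skipping entries that lack either tag."""
    soup = BeautifulSoup(xml_text, "html.parser")
    results = []
    for entry in soup.find_all("url"):
        title_tag = entry.find("image:title")
        loc_tag = entry.find("loc")
        # find() returns None when the tag is missing, so test the tag
        # itself before touching .text
        if title_tag is None or loc_tag is None:
            continue
        results.append((title_tag.text, loc_tag.text))
    return results


pairs = extract_titles(SAMPLE_XML)
for title, link in pairs:
    print(title, link)
```

Collecting the pairs in a list instead of printing inside the loop also avoids reassigning url, which the original script uses for the sitemap address.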
