AttributeError: 'str' object has no attribute 'get', or at best the script returns None

Time: 2020-01-23 15:25:19

Tags: python python-3.x beautifulsoup

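I'm trying to get the URL of one of the books at: https://www.bookdepository.com/search?searchTerm=painted+house+grisham&search=Find+book

I'm adapting the code below from a script I wrote for another website, but it produces the error in the title.

I don't know which part of the code below to modify. At best the script returns None, which tells me the soup has gone tasteless (it isn't matching anything). Thanks for your help.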

def get_detail_data(soup):
    """Get info from each product page."""

    # title
    if extension == 'com':
        if site == 'bookdepository':
            try:
                title = soup.select_one('h1[itemprop="name"]')
                # for div in title.select('div'):
                #     div.extract()
                # title = title.get_text(strip=True).replace(';', ' ')
            except:
                title = ''
# ...code continues

def get_index_data(soup):
    """Get product link from index page (not pagination link)."""

    if extension == 'com':
        try:
            # links = soup.find_all('a', class_='s-item__link')
            # links = soup.find_all('h3', class_='title')
            # links = soup.find_all('a', href=True)[0]['href']
            links = soup.find("a").get("href")
            # links = soup.find_all('a', class_='s-item__link')
            # print(links)
            # links = soup.select('.title a')
            # for a in links:
            #     links = links.get_text(strip=True).replace(';', ' ')
        except:
            links = []

    elif #...code continues

    res_url = [item.get('href') for item in links]

    return res_url

==== Update

In get_index_data(soup), I replaced links = soup.find_all('div', {'class': 'item-info'}).find_all("a", href=True) with links = soup.find("a").get("href")

Now, when I hover over links in the res_url line, the editor tells me: Local variable 'links' might be referenced before assignment

I don't know where to go from there.

==== Update

After some cleanup, I'm back to the same error: AttributeError: 'str' object has no attribute 'get'. It is raised at the res_url assignment, with the links name highlighted.

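==== Update

I used to define the URL as a string, url = ''. But now I've moved to a dictionary of URLs: urls = {'url1': 'blah', 'url2': 'blah'}

So the question now is how to replace the .get call, without using urls = [item.get('href') for item in links], so that I retrieve the URL chosen by the user.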

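==== Update

In def get_index_data(soup), I came up with this line: links = [k for k, v in urls.items() if v == urls[site]]

The urls in urls.items() is highlighted.

But my urls dictionary was in the main() function at the bottom of the script. I moved it to the top of the script. Nothing changed. So I can't use the links list comprehension above to retrieve a URL from the urls dictionary.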

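2 Answers:

Answer 0: (score: 0)

When your try fails, it gives you links = []. You then iterate over an empty list, so you get nothing back. You may also need find_all(), because find() returns only the first element it finds (in this case an a tag), and if that a tag has no href you get nothing.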
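To make the find() versus find_all() difference concrete, here is a minimal sketch. The HTML snippet is made up for illustration and is not the real page source. It shows that find("a").get("href") gives back a plain string, and that looping over a string yields single characters, which appears to be where the 'str' object has no attribute 'get' message comes from.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a search results page (an assumption for illustration).
html = """
<div class="item-info"><h3><a href="/book-one/1">Book one</a></h3></div>
<div class="item-info"><h3><a href="/book-two/2">Book two</a></h3></div>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns only the first matching tag; .get("href") then yields a plain str.
first_href = soup.find("a").get("href")

# Iterating over a str yields one-character strings, and str has no .get method,
# so the comprehension from the question raises AttributeError.
error_message = ""
try:
    broken = [item.get("href") for item in first_href]
except AttributeError as exc:
    error_message = str(exc)

# find_all() returns a list of Tag objects, and each Tag supports .get().
all_hrefs = [a.get("href") for a in soup.find_all("a", href=True)]

print(first_href)
print(error_message)
print(all_hrefs)
```

Catching AttributeError explicitly, rather than using a bare except, also keeps this kind of mistake visible instead of silently turning it into an empty list.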

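You will need to do some filtering, because the page source contains 262 hrefs. I can help you get started, but you will need to provide more details to get further help:

Take a look at the code below to help you work out what you need: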

import requests
from bs4 import BeautifulSoup

url = 'https://www.bookdepository.com/search?searchTerm=painted+house+grisham&search=Find+book'
response = requests.get(url)

soup = BeautifulSoup(response.text, 'html.parser')
links = soup.find_all("a", href=True)
for each in links:
    print(each.get('href'))
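One possible way to do the filtering mentioned above is to pass a compiled regular expression as the href filter. This is only a sketch: the HTML is made up, and the pattern assumes that product links look like "/<title-slug>/<13-digit ISBN>", which matches the product hrefs shown elsewhere on this page but may not hold for every link on the site.

```python
import re
from bs4 import BeautifulSoup

# Made-up HTML (an assumption): two product links mixed with navigation links.
html = """
<a href="/help">Help</a>
<a href="/Painted-House-John-Grisham/9780440237228?ref=grid-view">A Painted House</a>
<a href="/category/2/Art-Photography">Art</a>
<a href="/Brethren-John-Grisham/9780091896492?ref=grid-view">The Brethren</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Assumed pattern: "/<slug>/<13 digits>", optionally followed by a query string.
product_href = re.compile(r"^/[^/]+/\d{13}(\?|$)")

# Beautiful Soup accepts a compiled regex as an attribute filter.
product_links = [a["href"] for a in soup.find_all("a", href=product_href)]
print(product_links)
```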

Answer 1: (score: 0)

If you have Beautiful Soup 4.7.1 or later, you can use the following CSS selector to get all the product links.

import requests
from bs4 import BeautifulSoup
url = 'https://www.bookdepository.com/search?searchTerm=painted+house+grisham&search=Find+book'
res = requests.get(url).text
soup = BeautifulSoup(res, 'html.parser')
links = [item['href'] for item in soup.select("div.item-info > h3 > a[href]")]
print(links)
print(len(links))

Output

['/Painted-House-John-Grisham/9780440237228?ref=grid-view&qid=1579799743707&sr=1-1', '/Painted-House-John-Grisham/9780099537021?ref=grid-view&qid=1579799743707&sr=1-2', '/Painted-House-John-Grisham/9780385337939?ref=grid-view&qid=1579799743707&sr=1-3', '/Painted-House-John-Grisham/9780345532046?ref=grid-view&qid=1579799743707&sr=1-4', '/Painted-House-John-Grisham/9780385501200?ref=grid-view&qid=1579799743707&sr=1-5', '/La-granja-Painted-House-John-Grisham/9788499080826?ref=grid-view&qid=1579799743707&sr=1-6', '/Painted-House-John-Grisham/9781613834909?ref=grid-view&qid=1579799743707&sr=1-7', '/Painted-House-John-Grisham/9780099416159?ref=grid-view&qid=1579799743707&sr=1-8', '/X100-Corks-Painted-House-Pallet-John-Grisham/9780099440895?ref=grid-view&qid=1579799743707&sr=1-9', '/Painted-House-Header-W-H-Smith-Only-John-Grisham/9780099442790?ref=grid-view&qid=1579799743707&sr=1-10', '/Painted-House-John-Grisham/9780553527728?ref=grid-view&qid=1579799743707&sr=1-11', '/Painted-House-John-Grisham/9780385501217?ref=grid-view&qid=1579799743707&sr=1-12', '/Painted-House-27c-Hc-Aud-LP-Mix-Flr-John-Grisham/9780385502399?ref=grid-view&qid=1579799743707&sr=1-13', '/Painted-House-12-Copy-Slimline-Floor-Display-John-Grisham/9780385501910?ref=grid-view&qid=1579799743707&sr=1-14', '/18c-MM-Solo-Flr-Display-Painted-House-John-Grisham/9780440803812?ref=grid-view&qid=1579799743707&sr=1-15', '/X18-Painted-House-Dumpbin-John-Grisham/9780712689618?ref=grid-view&qid=1579799743707&sr=1-16', '/RC-527-Painted-House-X6-Counterp-John-Grisham/9781856865081?ref=grid-view&qid=1579799743707&sr=1-17', '/Painted-House-John-Grisham/9781439568279?ref=grid-view&qid=1579799743707&sr=1-18', '/La-Casa-Dipinta-Painted-House-John-Grisham/9788804505518?ref=grid-view&qid=1579799743707&sr=1-19', '/Painted-House-John-Grisham/9780099586098?ref=grid-view&qid=1579799743707&sr=1-20', '/Painted-House-Complete-Unabridged-John-Grisham/9780754054634?ref=grid-view&qid=1579799743707&sr=1-21', 
'/Die-Farm-5-Audio-CDs-Painted-House-5-Audio-CDs-dtsch-Version-John-Grisham/9783898308144?ref=grid-view&qid=1579799743707&sr=1-22', '/Brethren-John-Grisham/9780091896492?ref=grid-view&qid=1579799743707&sr=1-23', '/Painted-House-John-Grisham/9780553712742?ref=grid-view&qid=1579799743707&sr=1-24', '/Painted-House-John-Grisham/9780440295983?ref=grid-view&qid=1579799743707&sr=1-25', '/18c-Solo-Painted-House-TV-Tie-Floor-Display-with-Riser-John-Grisham/9780440805311?ref=grid-view&qid=1579799743707&sr=1-26', '/19-Copy-John-Grisham-Prepack-Incl-2-Tr-EA-Brethren-Chamber-King-Torts-Painted-House-Partner-Street-Lawyer-Rainmaker-John-Grisham/9780385395939?ref=grid-view&qid=1579799743707&sr=1-27', '/X18-Painted-House-Dumpbin-Export-John-Grisham/9780712689700?ref=grid-view&qid=1579799743707&sr=1-28', '/Painted-House-Complete-Unabridged-John-Grisham/9780754007272?ref=grid-view&qid=1579799743707&sr=1-29', '/Painted-House-John-Grisham/9780736689434?ref=grid-view&qid=1579799743707&sr=1-30']
30

Or, if you would rather use find_all() on the item-info class and then look up the link with find_next('a'):

import requests
from bs4 import BeautifulSoup
url = 'https://www.bookdepository.com/search?searchTerm=painted+house+grisham&search=Find+book'
res = requests.get(url).text
soup = BeautifulSoup(res, 'html.parser')
linksall = [item.find_next('a', href=True)['href'] for item in soup.find_all("div", class_="item-info")]
print(linksall)
print(len(linksall))

Output

['/Painted-House-John-Grisham/9780440237228?ref=grid-view&qid=1579799743707&sr=1-1', '/Painted-House-John-Grisham/9780099537021?ref=grid-view&qid=1579799743707&sr=1-2', '/Painted-House-John-Grisham/9780385337939?ref=grid-view&qid=1579799743707&sr=1-3', '/Painted-House-John-Grisham/9780345532046?ref=grid-view&qid=1579799743707&sr=1-4', '/Painted-House-John-Grisham/9780385501200?ref=grid-view&qid=1579799743707&sr=1-5', '/La-granja-Painted-House-John-Grisham/9788499080826?ref=grid-view&qid=1579799743707&sr=1-6', '/Painted-House-John-Grisham/9781613834909?ref=grid-view&qid=1579799743707&sr=1-7', '/Painted-House-John-Grisham/9780099416159?ref=grid-view&qid=1579799743707&sr=1-8', '/X100-Corks-Painted-House-Pallet-John-Grisham/9780099440895?ref=grid-view&qid=1579799743707&sr=1-9', '/Painted-House-Header-W-H-Smith-Only-John-Grisham/9780099442790?ref=grid-view&qid=1579799743707&sr=1-10', '/Painted-House-John-Grisham/9780553527728?ref=grid-view&qid=1579799743707&sr=1-11', '/Painted-House-John-Grisham/9780385501217?ref=grid-view&qid=1579799743707&sr=1-12', '/Painted-House-27c-Hc-Aud-LP-Mix-Flr-John-Grisham/9780385502399?ref=grid-view&qid=1579799743707&sr=1-13', '/Painted-House-12-Copy-Slimline-Floor-Display-John-Grisham/9780385501910?ref=grid-view&qid=1579799743707&sr=1-14', '/18c-MM-Solo-Flr-Display-Painted-House-John-Grisham/9780440803812?ref=grid-view&qid=1579799743707&sr=1-15', '/X18-Painted-House-Dumpbin-John-Grisham/9780712689618?ref=grid-view&qid=1579799743707&sr=1-16', '/RC-527-Painted-House-X6-Counterp-John-Grisham/9781856865081?ref=grid-view&qid=1579799743707&sr=1-17', '/Painted-House-John-Grisham/9781439568279?ref=grid-view&qid=1579799743707&sr=1-18', '/La-Casa-Dipinta-Painted-House-John-Grisham/9788804505518?ref=grid-view&qid=1579799743707&sr=1-19', '/Painted-House-John-Grisham/9780099586098?ref=grid-view&qid=1579799743707&sr=1-20', '/Painted-House-Complete-Unabridged-John-Grisham/9780754054634?ref=grid-view&qid=1579799743707&sr=1-21', 
'/Die-Farm-5-Audio-CDs-Painted-House-5-Audio-CDs-dtsch-Version-John-Grisham/9783898308144?ref=grid-view&qid=1579799743707&sr=1-22', '/Brethren-John-Grisham/9780091896492?ref=grid-view&qid=1579799743707&sr=1-23', '/Painted-House-John-Grisham/9780553712742?ref=grid-view&qid=1579799743707&sr=1-24', '/Painted-House-John-Grisham/9780440295983?ref=grid-view&qid=1579799743707&sr=1-25', '/18c-Solo-Painted-House-TV-Tie-Floor-Display-with-Riser-John-Grisham/9780440805311?ref=grid-view&qid=1579799743707&sr=1-26', '/19-Copy-John-Grisham-Prepack-Incl-2-Tr-EA-Brethren-Chamber-King-Torts-Painted-House-Partner-Street-Lawyer-Rainmaker-John-Grisham/9780385395939?ref=grid-view&qid=1579799743707&sr=1-27', '/X18-Painted-House-Dumpbin-Export-John-Grisham/9780712689700?ref=grid-view&qid=1579799743707&sr=1-28', '/Painted-House-Complete-Unabridged-John-Grisham/9780754007272?ref=grid-view&qid=1579799743707&sr=1-29', '/Painted-House-John-Grisham/9780736689434?ref=grid-view&qid=1579799743707&sr=1-30']
30
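The hrefs in the output are relative paths. If full URLs are needed for the next request, one option is urljoin from the standard library. This is a small sketch that reuses two of the hrefs printed above, not something the scraped page requires.

```python
from urllib.parse import urljoin

# The search URL used in the snippets above serves as the base.
base_url = "https://www.bookdepository.com/search?searchTerm=painted+house+grisham&search=Find+book"

# Two of the relative hrefs from the output above.
links = [
    "/Painted-House-John-Grisham/9780440237228?ref=grid-view&qid=1579799743707&sr=1-1",
    "/Painted-House-John-Grisham/9780099537021?ref=grid-view&qid=1579799743707&sr=1-2",
]

# urljoin keeps the scheme and host of the base and swaps in the new path and query.
full_urls = [urljoin(base_url, href) for href in links]
for full_url in full_urls:
    print(full_url)
```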

Hope this is what you are after.