Question

When I scrape, everything usually goes fine, but sometimes, when I scrape a lot of pages, I get

AttributeError: 'NoneType' object has no attribute 'h1'. Here is my code:

    for index, link in enumerate(all_links):
        self.driver.execute_script("window.open('" + link + "');")
        print(link)
        sleep(9)
        self.driver.switch_to.window(self.driver.window_handles[1])
        final_soup = BeautifulSoup(self.driver.page_source, 'lxml')
        image = final_soup.find('div', attrs={'class': 'someClass_1'})
        filename = 'image_' + str(index) + '.png'
        title = final_soup.find('div', attrs={'class': 'someClass_2'})
        sleep(1)
        origin_title = title.h1.getText()   # here is the problem
        print(origin_title)

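This error only shows up occasionally. The strange thing is that I inspected the HTML of the specific link that failed, and everything looks the same as for the other links, so I don't know why I get the error. The text in the h1 tag is there.

I tried increasing the sleep, but nothing changed. The other thing I could do is add a try/except: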

    try:
        origin_title = title.h1.getText()   # here is the problem
        print(origin_title)
    except AttributeError:
        pass

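But my concern is that I don't want to just pass when the text in the h1 tag isn't found. The text is there, so I should be able to get it somehow.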

Answer 1

Replace this line of your code

    final_soup.find('div', attrs={'class': 'someClass_2)

with this line

    final_soup.find('div', {'class': 'someClass_2'})

You were missing the closing "'}".
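
For what it's worth, `find` takes the attribute dictionary as its second positional parameter, so `attrs={...}` and a bare dict behave the same. The error in the question means `find` returned `None` for that particular page snapshot. Here is a minimal sketch of a guard that makes that case explicit instead of raising. The helper name `extract_title` is hypothetical (it is not from the question), and it assumes `soup` is a BeautifulSoup object:

```python
def extract_title(soup, class_name):
    """Return the <h1> text inside the first matching <div>, or None.

    `soup` is assumed to be a BeautifulSoup object (anything with a
    compatible .find() works); `extract_title` is an illustrative name.
    """
    container = soup.find('div', attrs={'class': class_name})
    # find() returns None when no <div> with that class is in the parsed HTML
    if container is None:
        return None
    # container.h1 is None when the <div> exists but has no <h1> inside it
    heading = container.h1
    if heading is None:
        return None
    return heading.get_text(strip=True)
```

Returning `None` lets the caller decide whether to retry, log the link, or skip it, rather than silently passing.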

Answer 2

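This code did what I needed. It seems the error was probably caused by the internet connection dropping unexpectedly or the server not responding.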

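    i = 1  # attempt counter, must be initialised before the loop
    while True:
        try:
            title = final_soup.find('div', attrs={'class': 'someclass'})
            sleep(1)
            origin_title = title.h1.getText()
            print(origin_title)
        except Exception as ex:
            # title was None (or another error occurred): report and try again
            print('number of try', i)
            sleep(1)
            i += 1
            continue
        break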
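
One caveat with an unbounded `while True` retry: a parsed soup is a static snapshot, so a retry can only succeed if the page source is read again on each attempt, and the loop never ends if the element truly is missing. A possible bounded variant is sketched below. The names `retry` and `fetch` are hypothetical, and `fetch` is assumed to be any callable that returns the value or `None`:

```python
import time


def retry(fetch, attempts=5, delay=1.0):
    """Call fetch() until it returns a non-None value or attempts run out.

    Returns the first non-None result, or None if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        result = fetch()
        if result is not None:
            return result
        print('number of try', attempt)
        if attempt < attempts:
            time.sleep(delay)  # give the page time to finish loading
    return None


# Hypothetical usage with the names from the question (not executed here):
#
# def fetch_title():
#     # re-read page_source so each attempt sees a fresh snapshot
#     soup = BeautifulSoup(self.driver.page_source, 'lxml')
#     div = soup.find('div', attrs={'class': 'someClass_2'})
#     if div is None or div.h1 is None:
#         return None
#     return div.h1.getText()
#
# origin_title = retry(fetch_title)
```

If `retry` returns `None` after all attempts, the link can be logged for a later pass instead of blocking the whole crawl.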

AttributeError in the getText method
