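I have a script that loops over multiple pages. It works most of the time, but I get TypeError: 'NoneType' object is not subscriptable for the link even when the element exists. I added an if/else statement that lets the script run, but it leaves the link field blank on one or two records that should have a link. Here is my working script, with the if/else statements. Any suggestions on how to make it work without the if/else statements?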
from urllib.request import urlopen
from bs4 import BeautifulSoup as soup
import json

base_url = "https://www.doabooks.org/"
books = []
n = 5

for i in range(1, n+1):
    if (i == 1):
        # handle first page
        response = urlopen(base_url)
    response = urlopen(base_url + "doab?func=browse&page=" + str(i) + "&queryField=A&uiLanguage=en")
    page_html = response.read()
    response.close()

    #html parsing
    page_soup = soup(page_html, "html.parser")

    #grabs info for each textbook
    containers = page_soup.findAll("div",{"class":"data"})
    for container in containers:
        item = {}
        item['type'] = "Open Access Book"
        item['title'] = container.span.text.strip()
        item['author'] = container.a.text
        if container.find('a', {'itemprop' : 'url'}):
            item['link'] = "https://www.doabooks.org" + container.find('a', {'itemprop' : 'url'})['href']
        else:
            item['link'] = ''
        item['source'] = "Directory of Open Access Books"
        if container.find("a",{"itemprop":"about"}):
            item['subject'] = container.find("a",{"itemprop":"about"}).text
        else:
            item['subject'] = ''
        item['base_url'] = "https://www.doabooks.org/"
        books.append(item) # add the item to the list

with open("./json/doab-test.json", "w") as writeJSON:
    json.dump(books, writeJSON, ensure_ascii=False)
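For reference, the TypeError in the question comes from subscripting the result of find() when it returned None. The snippet below is a minimal sketch of that mechanism only; the HTML string is a made-up sample, not markup copied from doabooks.org.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: a "data" container whose anchor has no itemprop="url".
html = '<div class="data"><span>Title</span><a href="/author">Author</a></div>'
container = BeautifulSoup(html, "html.parser").find("div", {"class": "data"})

# find() returns None when nothing matches the filter.
match = container.find("a", {"itemprop": "url"})
print(match)

# Subscripting None is what raises the TypeError from the question.
error_message = ""
try:
    match["href"]
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

So the message means the parsed tree handed to find() did not contain a matching anchor for that container, whether or not the element is visible in the page source.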
Answer 0 (score: 0)
I think this may be a parser issue (I'm not sure). However, I was able to get the data from the URL:
import requests
from bs4 import BeautifulSoup as soup
x=requests.get("https://www.doabooks.org/doab?func=browse&page=2&queryField=A&uiLanguage=en")
print(soup(x.content).find_all("div",{"class":"data"})[5].find_all("a",{"itemprop":"url"}))
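I noticed that removing the "html.parser" argument makes your script work fine, i.e. just don't pass a second argument when creating page_soup. Without that argument, BeautifulSoup picks the best parser that is installed (lxml if it is available) and emits a warning naming the one it used.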
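One way to check the parser hypothesis is to parse the same markup with each installed parser and compare how many link anchors each one finds. This is only a rough sketch: count_links and the sample markup are hypothetical, and on the real page you would pass the downloaded page_html instead of the sample string.

```python
from bs4 import BeautifulSoup, FeatureNotFound

def count_links(markup, parser):
    """Count <a itemprop="url"> tags found when parsing with the given parser."""
    parsed = BeautifulSoup(markup, parser)
    return len(parsed.find_all("a", {"itemprop": "url"}))

# Hypothetical sample markup; replace with the real page_html to diagnose.
sample = (
    '<div class="data"><span>Book A</span>'
    '<a itemprop="url" href="/book-a">Book A</a></div>'
    '<div class="data"><span>Book B</span></div>'
)

counts = {}
for name in ("html.parser", "lxml", "html5lib"):
    try:
        counts[name] = count_links(sample, name)
    except FeatureNotFound:
        # This parser is not installed in the current environment.
        counts[name] = None

print(counts)
```

If the counts differ on the real page, the parsers are building different trees from the same HTML, and naming the parser that gives the complete result (for example "lxml", if installed) is more predictable than omitting the argument.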