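I've been trying to build a web scraper to help me keep up with articles published in my industry.
I'm at my wits' end, because whenever I try to run the code through Flask I keep getting this error:
TypeError: The view function did not return a valid response. The function either returned None or ended without a return statement.
Here is the code that produces the error:
Document 1 is blogscraper.py, which contains: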
import requests
from bs4 import BeautifulSoup


def blog_parser(url) -> 'html':
    import requests
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
    result = requests.get(url, headers=headers)
    return result.content


def html(url) -> 'html':
    website = blog_parser(url)
    html = BeautifulSoup(website, 'html.parser')
    return html


def site_articles(url, element, unique_element) -> 'html':
    sitehtml = html(url)
    article_data = sitehtml.find_all(element, unique_element)
    return article_data


def skillsoft_titles(list_item):
    skillsoft_articles = site_articles('https://www.skillsoft.com/blog', "h1", {"class": "entry-title"})
    entries = skillsoft_articles[list_item]
    title = entries.find('a').get_text()
    return title


def skillsoft_link(list_item):
    skillsoft_articles = site_articles('https://www.skillsoft.com/blog', "h1", {"class": "entry-title"})
    entries = skillsoft_articles[list_item]
    link = entries.find('a').get('href')
    return link


def skillsoft_description(list_item):
    skillsoft_articles = site_articles('https://www.skillsoft.com/blog', "div", {"class": "entry-content"})
    entries = skillsoft_articles[list_item]
    description = entries.select_one("div p:nth-of-type(2)").text
    return description


def opensesame_titles(list_item) -> str:
    opensesame_articles = site_articles('https://www.opensesame.com/site/blog/', "div", {"class": "blog-post-right"})
    entries = opensesame_articles[list_item]
    title = entries.find('a').get_text()
    return title


def opensesame_link(list_item) -> str:
    opensesame_articles = site_articles('https://www.opensesame.com/site/blog/', "div", {"class": "blog-post-right"})
    entries = opensesame_articles[list_item]
    link = entries.find('a').get('href')
    return link


def opensesame_description(list_item):
    opensesame_articles = site_articles('https://www.opensesame.com/site/blog/', "section", {"class": "entry-content"})
    entries = opensesame_articles[list_item]
    description = entries.find('p').text
    return description


def cornerstone_titles(list_item) -> str:
    cornerstone_articles = site_articles('https://www.cornerstoneondemand.com/rework', "h2", {"class": "text-blue"})
    entries = cornerstone_articles[list_item]
    title = entries.find('a').get_text()
    return title


def cornerstone_link(list_item) -> str:
    cornerstone_articles = site_articles('https://www.cornerstoneondemand.com/rework', "h2", {"class": "text-blue"})
    entries = cornerstone_articles[list_item]
    link = entries.find('a').get('href')
    return link


def cornerstone_description(list_item) -> str:
    cornerstone_articles = site_articles('https://www.cornerstoneondemand.com/rework', "div", {"class": "col3-teaser-cont"})
    entries = cornerstone_articles[list_item]
    description = entries.find('p').text
    return description


def print_values(list_item, title_func, link_func, desc_func):
    return (print('Title:', title_func(list_item), '\n' 'Link:', link_func(list_item), '\n' 'Description:', desc_func(list_item)))
On its own this works fine: in PyCharm it returns exactly what I want.
Document 2 is my Flask file, and its code is:
import blogscraper
from flask import Flask

app = Flask(__name__)

skillsoft_titles = blogscraper.skillsoft_titles
skillsoft_link = blogscraper.skillsoft_link
skillsoft_description = blogscraper.skillsoft_description


@app.route('/', methods = ['GET'])
def skillsoft():
    output = blogscraper.print_values(1, skillsoft_titles, skillsoft_link, skillsoft_description)
    return output


skillsoft()
app.debug = True
app.run()
app.run(debug = True)
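This produces the error. For some reason it returns None, and after Googling without finding anything, it makes no sense to me. Any help is greatly appreciated!
Answer 0 (score: 1)
Your print_values function returns the return value of print, which happens to be None: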
def print_values(list_item, title_func, link_func, desc_func):
    return (print('Title:', title_func(list_item), '\n' 'Link:', link_func(list_item), '\n' 'Description:', desc_func(list_item)))
You need to change this function so that it returns the value you actually want to send back.
Something like this:
def print_values(list_item, title_func, link_func, desc_func):
    return 'Title:' + title_func(list_item) + '\n' + 'Link:' + link_func(list_item)
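A minimal sketch of why Flask receives None, assuming three hypothetical stub functions (fake_title, fake_link, fake_description) that stand in for the real scrapers so the example runs without network access. print() writes its arguments to standard output and always returns None, so returning its result hands None to Flask.

```python
# Hypothetical stubs standing in for skillsoft_titles / skillsoft_link / skillsoft_description
def fake_title(list_item):
    return "Example title"

def fake_link(list_item):
    return "https://example.com/post"

def fake_description(list_item):
    return "Example description"

def print_values(list_item, title_func, link_func, desc_func):
    # print() writes to stdout and then evaluates to None, so None is returned
    return print('Title:', title_func(list_item), '\n' 'Link:', link_func(list_item),
                 '\n' 'Description:', desc_func(list_item))

result = print_values(1, fake_title, fake_link, fake_description)
print(repr(result))  # None
```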
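A sketch of this fix running against the same kind of hypothetical stubs (fake_title, fake_link, fake_description are illustrative, not part of the question). The function now returns a str, which Flask accepts as a response body. Note that this version ignores desc_func.

```python
# Hypothetical stubs standing in for the real scraper functions
def fake_title(list_item):
    return "Example title"

def fake_link(list_item):
    return "https://example.com/post"

def fake_description(list_item):
    return "Example description"

def print_values(list_item, title_func, link_func, desc_func):
    # Build and return a string instead of printing it
    return 'Title:' + title_func(list_item) + '\n' + 'Link:' + link_func(list_item)

result = print_values(1, fake_title, fake_link, fake_description)
print(type(result).__name__)  # str
```

Two things to keep in mind with this approach: Flask serves a returned str as HTML by default, so the browser collapses the '\n' into a space, and `+` raises a TypeError if any part is None (for example when `.get('href')` finds no href attribute on the `<a>` tag).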
Answer 1 (score: 0)
Your print_values function returns the result of print(...), which is None. That is what Flask is complaining about.
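If you simply remove the print call, you will return a tuple instead:
'Title:', title_func(list_item), ...
is a tuple, because several comma-separated values form a tuple in Python.
If a Flask view function returns a tuple, Flask assumes it is a tuple with specific elements, such as (response, status); see "About Responses" in the Flask documentation.
Your function should return, for example, a string, like this: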
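A short illustration of the tuple point, using a hypothetical stub fake_title in place of the real scraper. Flask's documentation describes returned tuples as (body, status, headers), (body, status), or (body, headers), so a tuple built from scraped strings does not match what Flask expects; the question's version would have six elements.

```python
def fake_title(list_item):
    # Hypothetical stand-in for skillsoft_titles (no network access)
    return "Example title"

# Dropping print() but keeping the commas: Python builds a tuple here
value = 'Title:', fake_title(1)
print(type(value).__name__)  # tuple
print(value)  # ('Title:', 'Example title')
```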
def print_values(list_item, title_func, link_func, desc_func):
    value = ''.join(('Title: ', title_func(list_item), '<br>\n',
                     'Link: <a href="', link_func(list_item), '">',
                     link_func(list_item), '</a><br>\n',
                     'Description: ', desc_func(list_item)))
    # print (value)
    return value
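A variant of this answer's approach with two assumed changes that are not part of the original answer: scraped text is passed through the standard library's html.escape so characters such as & or < cannot break the markup, and link_func is called only once and reused, since every call to the scraper functions in the question downloads the page again. The stub functions are hypothetical.

```python
from html import escape

# Hypothetical stubs standing in for the real scraper functions
def fake_title(list_item):
    return "Tips & tricks"

def fake_link(list_item):
    return "https://example.com/post?id=1"

def fake_description(list_item):
    return "A short <b>summary</b>"

def print_values(list_item, title_func, link_func, desc_func):
    # Call each scraper once; escape() also escapes quotes by default (quote=True)
    title = escape(title_func(list_item))
    link = escape(link_func(list_item))
    desc = escape(desc_func(list_item))
    return ''.join(('Title: ', title, '<br>\n',
                    'Link: <a href="', link, '">', link, '</a><br>\n',
                    'Description: ', desc))

result = print_values(1, fake_title, fake_link, fake_description)
print(result)
```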
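A list should also work, depending on what you are trying to achieve.
Note that printing all the answers to the console is not a good solution, so you may want to remove the print statements once you understand how everything fits together...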
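A sketch of the list idea, assuming Flask 2.2 or newer, where a list returned from a view is converted to a JSON response the same way a dict is (older versions reject a list). collect_values and the fake_* stubs are hypothetical names used for illustration; json.dumps is used here only to show the list is JSON-serializable.

```python
import json

# Hypothetical stubs standing in for the real scraper functions
def fake_title(i):
    return f"Title {i}"

def fake_link(i):
    return f"https://example.com/post/{i}"

def fake_description(i):
    return f"Description {i}"

def collect_values(items, title_func, link_func, desc_func):
    """Return one dict per article index (hypothetical helper)."""
    return [
        {"title": title_func(i), "link": link_func(i), "description": desc_func(i)}
        for i in items
    ]

articles = collect_values(range(2), fake_title, fake_link, fake_description)
print(json.dumps(articles))
```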
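If you still want console output while debugging, one option (an assumption, not something the answer suggests) is the standard logging module, which keeps diagnostics separate from the value the view returns. build_summary and the stubs below are hypothetical.

```python
import logging

# Send DEBUG-level messages to stderr via the root handler
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("blogscraper")

# Hypothetical stubs standing in for the real scraper functions
def fake_title(list_item):
    return "Example title"

def fake_link(list_item):
    return "https://example.com/post"

def build_summary(list_item, title_func, link_func):
    title = title_func(list_item)
    link = link_func(list_item)
    # Diagnostics go through logging instead of print(); the string is returned to the caller
    logger.debug("item %d: title=%r link=%r", list_item, title, link)
    return 'Title: ' + title + '<br>\nLink: ' + link

summary = build_summary(1, fake_title, fake_link)
```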