Using requests_html in a Flask application

Asked: 2018-06-03 07:43:50

Tags: multithreading python-3.x flask screen-scraping

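I am trying to call the html.render() method from the Python module requests_html inside a Flask application. However, whenever my application code calls the function, I get this error: RuntimeError: There is no current event loop in thread 'Thread-1'.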
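For context, the message itself comes from asyncio: asyncio.get_event_loop() raises it when called from a thread that has no event loop set, which is typically the case for a worker thread handling a request. The sketch below reproduces only that mechanism with the standard library and shows one common way to give a thread its own loop. The helper names (current_loop_or_error, run_in_thread, with_fresh_loop) are hypothetical, and this has not been checked against requests_html; render() may run into other main-thread restrictions even with a loop in place.

```python
import asyncio
import threading


def current_loop_or_error():
    """Return this thread's event loop, or the RuntimeError text if none is set."""
    try:
        return asyncio.get_event_loop()
    except RuntimeError as exc:
        return str(exc)


def run_in_thread(func):
    """Run func in a new (non-main) thread and return whatever it returned."""
    result = {}

    def target():
        result["value"] = func()

    worker = threading.Thread(target=target)
    worker.start()
    worker.join()
    return result["value"]


def with_fresh_loop(func):
    """Hypothetical helper: set a new event loop for the calling thread while func runs."""
    def wrapper(*args, **kwargs):
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        try:
            return func(*args, **kwargs)
        finally:
            # Unset and close the loop so nothing leaks between calls.
            asyncio.set_event_loop(None)
            loop.close()
    return wrapper
```

Calling run_in_thread(current_loop_or_error) returns the error text seen above, while wrapping the same function with with_fresh_loop returns a loop object instead.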

Here is the function that uses html.render():

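import bs4
from urllib.parse import urljoin
from requests_html import HTMLSession

# privacy_regex and rank_url are not shown in the question.

def extractor(url):
    session = HTMLSession()
    r = session.get(url)
    soup = bs4.BeautifulSoup(r.text)
    found = soup.find_all("a", href=privacy_regex)
    if found:
        # Fast path: the privacy link is present in the static HTML.
        print("Using Default Web Scraping bs4+regex")
        found = [tag['href'] for tag in found]
        uri = sorted(found, key=rank_url)[-1]
        return urljoin(url, uri)
    else:
        # Fallback: render the page's JavaScript, then search the links again.
        print('Using HTML Rendering')
        r.html.render()  # the RuntimeError is raised here
        links = r.html.absolute_links
        privacy_links = [x for x in links if privacy_regex.search(x)]
        uri = sorted(privacy_links, key=rank_url)[-1]
        return urljoin(url, uri)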
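One thing worth noting about the selection step: sorted(...)[-1] raises IndexError when no link matches, which can happen in the rendering branch. A minimal sketch of the same ranking with a guard follows. Since privacy_regex and rank_url are not shown, the versions here are hypothetical stand-ins, and pick_privacy_link is an illustrative name rather than part of the original code.

```python
import re
from urllib.parse import urljoin

# Hypothetical stand-in for the privacy_regex that is not shown in the question.
privacy_regex = re.compile(r"privacy", re.IGNORECASE)


def rank_url(href):
    # Assumed ranking: prefer links mentioning "policy", then shorter links.
    return ("policy" in href.lower(), -len(href))


def pick_privacy_link(base_url, hrefs):
    """Return the best-ranked privacy link as an absolute URL, or None if none match."""
    candidates = [h for h in hrefs if privacy_regex.search(h)]
    if not candidates:
        return None  # avoids IndexError on an empty list
    best = sorted(candidates, key=rank_url)[-1]
    return urljoin(base_url, best)
```

The caller can then decide what to do when the result is None, for example returning an error response from the route.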

Here is my application code:

@app.route('/api', methods=['POST', 'GET'])
def text_output():
    url = request.form['url_text']
    print(url)
    text, domain = url_input_parser(url)
    print(text, domain)

Any help is appreciated! Thank you very much!

0 answers:

No answers yet.