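Printing web crawler output

Time: 2015-09-11 16:12:52

Tags: python python-2.7 flask web-crawler

I want to use Flask to build a web interface for a web crawler I wrote in Python. I can't get the results to print, so I would like to display them as a list. How can I print the results of the for loop on the login.html page?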

from flask import Flask, render_template, redirect, url_for, request, jsonify
from bs4 import BeautifulSoup, SoupStrainer
import urllib2
import re

app = Flask(__name__)

@app.route('/login', methods=['GET', 'POST'])
def login():
    url = "example.com"

    url_list = ["example.com/1", "example.com/2"]
    found_list = []

    if request.method == 'POST':
        if request.form['inpur_url'] != 'example.com':
            error = 'Invalid Credentials. Please try again.'
        else:
            for line1 in url_list:
                #print "Crawled" " " + line1
                try:
                    html_page = urllib2.urlopen(line1)
                    soup = BeautifulSoup(html_page)
                    link = soup.findAll(href=True)
                except urllib2.HTTPError:
                    pass
                for link1 in link:
                    url1 = link1.get("href")
                    if url in url1:
                        found_list.append(url)
                return jsonify(found_list)

    #return render_template('login.html', error=error)
    return jsonify(found_list)     

if __name__ == '__main__':
    app.run(debug=True)

1 Answer:

Answer 0 (score: 0)

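I'm not sure about your overall design decisions, but I'm fairly certain that the logic in your function does not actually find the links in a web page. Fetching each page with urllib2 and parsing it with BeautifulSoup will give you a list of links.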
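What follows is not the answerer's code, only a minimal sketch of the same idea under a few assumptions. To stay runnable without extra packages it swaps BeautifulSoup for the standard library's HTML parser; with BeautifulSoup, `soup.find_all(href=True)` would play the role of the hypothetical `LinkCollector` class. The names `LinkCollector`, `extract_links`, `fetch` and `crawl` are illustrative and do not come from the original post. Compared with the question's loop, the sketch appends the matching href rather than the search string, skips a page that fails instead of reusing the previous page's links, and returns only after every page has been visited.

```python
try:  # Python 3
    from html.parser import HTMLParser
    from urllib.request import urlopen
    from urllib.error import HTTPError
except ImportError:  # Python 2.7, as used in the question
    from HTMLParser import HTMLParser
    from urllib2 import urlopen, HTTPError


class LinkCollector(HTMLParser):
    """Collects the href value of every tag that carries one."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)


def extract_links(html, needle):
    """Return every href in `html` that contains the substring `needle`."""
    parser = LinkCollector()
    parser.feed(html)
    return [href for href in parser.hrefs if needle in href]


def fetch(url):
    """Download a page and return its body as text (assumes UTF-8)."""
    return urlopen(url).read().decode("utf-8", "replace")


def crawl(url_list, needle, fetch_page=fetch):
    """Collect matching links from every page in `url_list`."""
    found = []
    for page_url in url_list:
        try:
            html = fetch_page(page_url)
        except HTTPError:
            continue  # skip this page rather than reuse stale results
        found.extend(extract_links(html, needle))
    return found  # returned once, after the whole loop has finished
```

It could be called as `crawl(["http://example.com/1", "http://example.com/2"], "example.com")`, which returns a plain Python list. To show that list in login.html, one option is to pass it to the template, for example `render_template('login.html', links=found_list)`, and iterate over it there with a Jinja2 `{% for link in links %}` block; `links` is an assumed variable name.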