Parsing a list or dict from BeautifulSoup into Flask

Time: 2016-04-27 06:32:14

Tags: python flask beautifulsoup

I am new to Python, Flask and BeautifulSoup. So here's the deal: I'm using BeautifulSoup to scrape some data from the web.

    from bs4 import BeautifulSoup
    import requests

    # PageURL's configure
    mainpage = 'http://www.myauto.ge/'
    pageurl = 'http://www.myauto.ge/?action=search&page='
    pagenum = 0

    # Looping Pages. Seems Wrong but doing its job?
    for x in range(0, 2):
        pagenum += 1
        r = requests.get(pageurl + str(pagenum))
        soup = BeautifulSoup(r.content, 'html.parser')

        for cars in soup.find_all('div', {'class': 'car-info-wrapper'}):

            cname = cars.find("div", {"class": "car-name-wrapper"}).find('a').get_text()
            cyear = cars.find("p", {"class": "cr-levy car-year"}).get_text()
            ceng = cars.find("div", {"class": "cr-det-in cr-engine"}).p.get_text()
            cengroad = cars.find("div", {"class": "cr-det-in cr-road"}).p.get_text()
            # clink = cars.find('a').get('href')

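When I print cname, cyear, ceng and cengroad, it works exactly the way I want. But now I want to do this in Flask. Instead of creating a database in sqlite3, I want it to simply scrape the data and pass it to index.html.

Here is my app.py Flask code.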

# Import
from flask import Flask, render_template
import requests
from bs4 import BeautifulSoup


app = Flask(__name__)

# mainpage = 'http://www.myauto.ge/'
pageurl = 'http://www.myauto.ge/?action=search&page='
# pagenum = 0



# Our index
@app.route('/')
@app.route('/index')
def index():
    # for x in range(0, 2):
    #     pagenum += 1
    r = requests.get(pageurl)
    soup = BeautifulSoup(r.content, 'html.parser')

    data = []
    for cars in soup.find_all('div', {'class': 'car-info-wrapper'}):
        cname = cars.find("div", {"class": "car-name-wrapper"}).find('a').get_text()

        data.append(cname)

    datayear = []
    for cars in soup.find_all('div', {'class': 'car-info-wrapper'}):
        cyear = cars.find("p", {"class": "cr-levy car-year"}).get_text()
        datayear.append(cyear)


    return render_template("index.html", data=data,datayear=datayear)




if __name__ == '__main__':
    app.run(debug=True)

Here is my index.html:

{% extends "base.html" %}
{% block body %}
<table class="table">
    <thead>
      <tr>
        <th>Car</th>
        <th>Year</th>
        <th>Engine</th>
        <th>Road so far</th>
      </tr>
    </thead>
    <tbody>

   <tr>
     <td> {{ data }} </td>
     <td> {{ datayear }} </td>
   </tr>


    </tbody>
  </table>

{% endblock %}

And this is what I get: each list is printed whole inside a single cell.

If I try

   {% for x in data %}
   <tr>
     <td> {{ x }} </td>
     <td>         </td>
   </tr>
   {% endfor %}

I get what I want, but only for the car name.

So how do I do the same for the car year?

   <tr>
   {% for x in data %}
     <td> {{ carname }} </td>
     <td> {{ caryear }} </td>
   </tr>

Or should I do something like this and then split the list?

data = []
for cars in soup.find_all('div', {'class': 'car-info-wrapper'}):
    cname = cars.find("div", {"class": "car-name-wrapper"}).find('a').get_text()
    cyear = cars.find("p", {"class": "cr-levy car-year"}).get_text()

    data.append(cname)
    data.append(cyear)

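Or should I try it without lists and dicts? I just don't want to use a db.

Thanks for reading.

2 answers:

Answer 0: (score: 0)

You're almost there, but instead of looping twice in your view code, loop just once, build a dictionary of each car's data, and append it to a list, like this: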

data = []
for cars in soup.find_all('div', {'class': 'car-info-wrapper'}):
    car_info = {} # Start with an empty dictionary for each car.

    car_info['name'] = cars.find("div", {"class": "car-name-wrapper"}).find('a').get_text()
    car_info['year'] = cars.find("p", {"class": "cr-levy car-year"}).get_text()
    car_info['engine'] = cars.find("div", {"class": "cr-det-in cr-engine"}).p.get_text()
    car_info['mileage'] = cars.find("div", {"class": "cr-det-in cr-road"}).p.get_text()

    data.append(car_info)

return render_template("index.html", data=data)

Then in your template:

{% extends "base.html" %}
{% block body %}
<table class="table">
    <thead>
      <tr>
        <th>Car</th>
        <th>Year</th>
        <th>Engine</th>
        <th>Road so far</th>
      </tr>
    </thead>
    <tbody>

   {% for car_info in data %}
   <tr>
     <td> {{ car_info['name'] }} </td>
     <td> {{ car_info['year'] }} </td>
     <td> {{ car_info['engine'] }} </td>
     <td> {{ car_info['mileage'] }} </td>
   </tr>
   {% endfor %}

    </tbody>
  </table>

{% endblock %}
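
If the two parallel lists from the question are already at hand, they could also be zipped into the same list-of-dicts shape. This is only a minimal sketch, assuming both lists hold one entry per car in the same order; the helper name `pair_cars` and the sample values are made up for illustration.

```python
def pair_cars(names, years):
    """Combine two parallel lists into one list of dicts, one dict per car."""
    # zip() stops at the shorter list, so extra entries in the longer
    # list are silently dropped.
    return [{'name': name, 'year': year} for name, year in zip(names, years)]


# Hypothetical sample values standing in for the scraped text.
data = pair_cars(['Car A', 'Car B'], ['2008', '2012'])
print(data)
```

The resulting `data` can be passed to `render_template` and read in the template with `car_info['name']` and `car_info['year']`, exactly as above.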

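Answer 1: (score: 0)

Your second approach should work, but you need to group the data together somehow.

For example, try putting the values in a tuple instead of appending them on two separate lines.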

data.append((cname, cyear))

Then in the template you can pull the values out.

{% for x in data %}
<tr>
  <td> {{ x[0] }} </td>
  <td> {{ x[1] }} </td>
</tr>
{% endfor %}
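
If the names and years have already been appended alternately to one flat list, as in the question's last snippet, they can still be regrouped into pairs afterwards. A rough sketch, assuming the list strictly alternates name, year, name, year; the helper name `to_pairs` and the sample values are invented for illustration.

```python
def to_pairs(flat):
    """Regroup [name, year, name, year, ...] into [(name, year), ...]."""
    # Even indexes hold the names, odd indexes hold the years.
    return list(zip(flat[0::2], flat[1::2]))


flat = ['Car A', '2008', 'Car B', '2012']  # hypothetical sample values
data = to_pairs(flat)
print(data)
```

Jinja's `for` loop can also unpack each tuple, for example `{% for cname, cyear in data %}`, which avoids the numeric indexes.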

Using a dict or an object class for each car would be the better, more verbose approach, but this should work for this simple example.
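
As a sketch of the object-class variant, assuming the same four fields the scraper collects; the class name `Car` and the sample values are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Car:
    """One scraped listing; every field is kept as the raw text from the page."""
    name: str
    year: str
    engine: str
    mileage: str


# Hypothetical sample values standing in for the scraped text.
data = [Car('Car A', '2008', '2.0', '150000 km')]
print(data[0].name, data[0].year)
```

In the template each field can then be read as `{{ car.name }}`, since Jinja resolves dot notation against object attributes as well as dict keys.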