Question

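I wrote a script in a .py file to scrape data from the website https://www.dataquest.io/blog/ (for learning purposes only), and it works well. It scrapes all the data I need.

I brought the script into a Django app called ScrapperApp. I want to store all the scraped data in the database.

After running the server and executing the script, the web page loads normally, but the database shows only a single instance of the data scraped from the website.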

scrape.py

from bs4 import BeautifulSoup
import requests

src = 'https://www.dataquest.io/blog/'

source = requests.get(src).text
soup = BeautifulSoup(source, 'lxml')

total = 0
for article in soup.find('section', class_='bSe right').find_all('article'):
     try:
          headline = article.find('h2', class_='entry-title').text
          print('Heading : ', headline)

          summary = article.find('p').text
          print('Summary: ',summary)

          category = article.find('div', class_='category').find('span').text
          print(category)

          blog_link = article.find('div', class_='awr').find('a', class_='psb')['href']
          print(blog_link)
          total += 1
     except Exception as e:
          # Skip articles that are missing one of the expected elements.
          pass

     print()


print(total)

models.py

from django.db import models

class BlogCards(models.Model):
    headline = models.CharField(max_length=500, blank=True)
    category = models.CharField(max_length=300, blank=True)
    summary = models.TextField(blank=True)
    link = models.URLField(blank=True)
    total_blogs = models.IntegerField()

    def __str__(self):
        return self.headline

The Django ScrapperApp's

views.py

from django.shortcuts import render
from bs4 import BeautifulSoup
import requests
from .models import BlogCards
import logging


def ScrapperView(request):
    src = 'https://www.dataquest.io/blog/'

    source = requests.get(src).text
    soup = BeautifulSoup(source, 'lxml')

    logging.basicConfig(filename='blogcards.log', level=logging.INFO)

    total_blogs = 0

    for article in soup.find('section', class_='bSe right').find_all('article'):
        blog_card = BlogCards()

        headline = article.find('h2', class_='entry-title').text
        blog_card.headline = headline

        summary = article.find('p').text
        blog_card.summary = summary

        category = article.find('div', class_='category').find('span').text
        blog_card.category = category

        blog_link = article.find('div', class_='awr').find('a', class_='psb')['href']
        blog_card.link = blog_link

        total_blogs += 1
        blog_card.total_blogs = total_blogs

        # save() has to run inside the loop; called after the loop,
        # it would store only one article.
        blog_card.save()

    return render(request, 'scrapper/scrapper.html')

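The database shows only one saved instance of the scraped data, that is, the headline, category, summary, and link of the first blog post only. The rest of the information is not saved to the database.

I want all the scraped data to be written to the database. I know I am missing something, because I am new to Django, and I cannot work out how to set it up.

Thanks

How to save all the scraped data to the database one by one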
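
One possible way to get one row per article is sketched below, under some assumptions. The helper names `parse_article` and `store_cards`, the `SAMPLE_HTML` markup, and the example.com links are hypothetical and not part of the original script. The sketch uses the built-in `html.parser` instead of `lxml` so that it runs without extra dependencies, and it takes the save step as a callable so that it can be tried without a database. An article that lacks one of the expected elements is skipped rather than aborting the whole loop.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that imitates the structure the scraper expects.
SAMPLE_HTML = """
<section class="bSe right">
  <article>
    <div class="awr">
      <div class="category"><span>Tutorials</span></div>
      <h2 class="entry-title">First post</h2>
      <p>Summary one</p>
      <a class="psb" href="https://example.com/first">Read more</a>
    </div>
  </article>
  <article>
    <div class="awr"><h2 class="entry-title">Incomplete post</h2></div>
  </article>
  <article>
    <div class="awr">
      <div class="category"><span>News</span></div>
      <h2 class="entry-title">Second post</h2>
      <p>Summary two</p>
      <a class="psb" href="https://example.com/second">Read more</a>
    </div>
  </article>
</section>
"""


def parse_article(article):
    """Return the fields of one <article> as a dict, or None if a piece is missing."""
    try:
        return {
            "headline": article.find("h2", class_="entry-title").text.strip(),
            "summary": article.find("p").text.strip(),
            "category": article.find("div", class_="category").find("span").text.strip(),
            "link": article.find("div", class_="awr").find("a", class_="psb")["href"],
        }
    except (AttributeError, TypeError, KeyError):
        # find() returned None somewhere, or the <a> tag has no href attribute.
        return None


def store_cards(html, save):
    """Call save(fields) once per parsable article and return the number stored."""
    soup = BeautifulSoup(html, "html.parser")
    section = soup.find("section", class_="bSe right")
    if section is None:
        return 0
    stored = 0
    for article in section.find_all("article"):
        fields = parse_article(article)
        if fields is None:
            continue  # skip this article, keep processing the rest
        stored += 1
        # The save happens inside the loop, once per article.
        save({**fields, "total_blogs": stored})
    return stored


saved = []
count = store_cards(SAMPLE_HTML, saved.append)
print(count, [card["headline"] for card in saved])
```

Inside the Django view, the callable passed as `save` could be something like `lambda fields: BlogCards.objects.create(**fields)`, since the dictionary keys match the field names of the `BlogCards` model above.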

0 answers: