Question

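First of all, I'm a beginner, new to both Python and Django. I've just tried scraping some data and storing it in my database, and it sort of works: it scrapes and stores only one object, not multiple objects. Right now I'm developing locally and trying to figure this out. I have a view like this: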

def practice(request):
    world = get_world_too()

    for entry in world:
        post = Post()
        post.title = entry['text']
        post.image_url = entry['src']
        post.save()

        template = "blog/post/noindex.html"
        context = {
        }
        return render(request, template, context)

Here is the function:

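import requests
from bs4 import BeautifulSoup


def get_world_too():
    url = 'http://www.example.org'
    # `headers` is not defined in this snippet; it is assumed to exist at module level
    html = requests.get(url, headers=headers)
    soup = BeautifulSoup(html.text, 'html5lib')

    # Keep only the first nine <section class="box"> elements
    titles = soup.find_all('section', 'box')[:9]
    entries = [{'href': url + box.a.get('href'),
                'src': box.img.get('src'),
                'text': box.strong.a.text,
                } for box in titles]
    return entries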

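It only scrapes and stores more objects if I refresh the page. But my function is limited to 9, so I thought this would store at least 9 objects in my database. In the view I have a loop: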

for entry in world:
    post = Post()
    post.title = entry['text']
    post.image_url = entry['src']
    post.save()

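So shouldn't the loop get all nine objects? Also, I know this isn't the professional way to do it. As I said, I'm just practicing. Eventually I'd like to set this up as a Heroku cron job that runs a few times a day. But for now, how do I scrape multiple objects in one go and save them to my database?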

Answer 1

It only goes through the loop once because of this line:

return render(request, template, context)

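The return exits the entire function on the first pass (during the first iteration of the loop). If you want to go through every iteration and only then return, move the return out of the loop, like this: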

def practice(request):
    world = get_world_too()

    for entry in world:
        post = Post()
        post.title = entry['text']
        post.image_url = entry['src']
        post.save()

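    # Not in the loop anymore: runs once, after every entry has been saved.
    # Defining template and context here also avoids a NameError when world is empty.
    template = "blog/post/noindex.html"
    context = {}
    return render(request, template, context)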
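
The same behaviour can be reproduced without Django. The following is only a minimal sketch in plain Python: `save_first_only`, `save_all`, and the `saved` list are hypothetical stand-ins for the view and the database table, not part of the original code.

```python
def save_first_only(entries):
    """Mimics the original view: the return sits inside the loop body."""
    saved = []  # stand-in for the database table
    for entry in entries:
        saved.append(entry["text"])
        return saved  # exits the whole function during the first iteration
    return saved  # only reached when entries is empty


def save_all(entries):
    """Mimics the fixed view: the return runs after the loop has finished."""
    saved = []
    for entry in entries:
        saved.append(entry["text"])
    return saved


# Nine fake scraped entries, matching the [:9] slice in get_world_too()
entries = [{"text": f"title {i}"} for i in range(9)]
print(len(save_first_only(entries)))  # 1
print(len(save_all(entries)))  # 9
```

This also explains why refreshing the page appeared to help: each request ran the view again and saved one more (first) entry before returning.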
