Question

I'm currently learning Python, and my first goal is to scrape every article on every page of my website.

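I want to find all the articles on the first page and scrape the title of each one; once that is done, I want to move on to the next page and do the same thing.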

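So far I can scrape all the links on the first page, but the multiprocessing code keeps fetching the first page's links indefinitely. I don't know how to scrape the titles of all the links concurrently while also following the links on every page of the site.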

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from lxml import html
import requests
import concurrent.futures

def get_informations(link):
    # Fetch one article page and print its title
    page = requests.get('http://myWebsite.com/' + link)
    tree = html.fromstring(page.text)
    titre = tree.xpath('//*[@id="infosSpectacle"]/ul/li[1]/h2/text()')
    print(titre)

if __name__ == '__main__':
    i = 0
    # Reuse one pool of worker processes for every listing page
    with concurrent.futures.ProcessPoolExecutor(10) as executor:
        while True:
            page = requests.get('http://myWebsite.com/Articles?Page=' + str(i))
            tree = html.fromstring(page.text)
            # De-duplicate the links found on this listing page
            links = set(tree.xpath("//a/@href"))

            futures = [executor.submit(get_informations, link) for link in links]
            concurrent.futures.wait(futures)
            i += 1
            #How can I go to the second page with the Process ??
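
For reference, the flow described in the question (collect the links on one listing page, fetch their titles concurrently, then advance to the next page) could be sketched roughly as below. This is only a sketch under assumptions: `crawl`, `fetch_links`, and `fetch_title` are hypothetical names, the two callables stand in for the requests/lxml code, and the loop is assumed to stop when a page yields no links that have not been seen before. A `ThreadPoolExecutor` is used here instead of `ProcessPoolExecutor` on the assumption that the work is mostly waiting on the network.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl(fetch_links, fetch_title, max_workers=10):
    """Walk listing pages 0, 1, 2, ... and collect one title per link.

    fetch_links(page_number) -> iterable of article links on that page
    fetch_title(link)        -> whatever was scraped from that article
    Both callables are hypothetical stand-ins for the requests/lxml code.
    """
    results = {}
    page = 0
    # A single pool is shared by all pages.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        while True:
            # Keep only links that were not scraped on an earlier page.
            new_links = sorted(l for l in set(fetch_links(page)) if l not in results)
            if not new_links:
                # Assumed stop condition: nothing new on this page.
                break
            # map() calls fetch_title concurrently and preserves input order.
            for link, title in zip(new_links, executor.map(fetch_title, new_links)):
                results[link] = title
            page += 1
    return results


if __name__ == "__main__":
    # Fake in-memory "site" so the sketch runs without network access.
    fake_site = {0: ["a", "b"], 1: ["c"]}
    print(crawl(lambda p: fake_site.get(p, []), lambda link: link.upper()))
```

In real use, `fetch_links` would wrap the `requests.get(...)` and `tree.xpath("//a/@href")` calls for a listing page, and `fetch_title` would wrap the body of `get_informations`, returning the title instead of printing it.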

Thanks for your help.

Python processes and multiprocessing

0 answers: