Python - pagination loop

Time: 2019-02-06 02:20:21

Tags: python loops pagination

I have a problem. In my code I have:

r = session.get("https://xxxxxxx.com/online/GIRL")
print (r.status_code)
print (r.cookies)
soups = BeautifulSoup(r.content, 'html5lib')


def getPeopleLinks(page):
    # Collect the href of every <a> on the parsed page that points to a profile.
    links = []
    for link in page.find_all('a'):
        url = link.get('href')
        if url and 'profile/' in url:
            links.append(url)
    return links

How can I get a list of all profiles from all available pages (e.g. 1, 2, 3, 4, 5, 6, and so on) and put them into links[]?

The page's HTML is:

<div class="pages"><span>1</span>
<a href="online/GIRL/2">2</a>
<a href="online/GIRL/3">3</a>
<a href="online/GIRL/4">4</a>
<a accesskey="x" href="online/GIRL/2">Next</a></div>

Thanks!
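One possible starting point, as a sketch only: read the numbered links out of the `div.pages` block shown above and turn them into absolute URLs. The names `SITE_ROOT` and `get_page_urls` are hypothetical and not part of the original code. The sketch assumes the relative hrefs (`online/GIRL/2`) resolve against the site root; if they resolved against the listing URL instead, `urljoin` would yield `/online/online/GIRL/2`, so this needs checking against the real site. It uses the built-in `html.parser` rather than `html5lib` only to keep the example free of extra dependencies.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Placeholder host taken from the question; assumed to be the base for relative hrefs.
SITE_ROOT = "https://xxxxxxx.com/"


def get_page_urls(soup, site_root=SITE_ROOT):
    """Return absolute URLs of the pages linked from the div.pages block."""
    urls = []
    pages = soup.find("div", class_="pages")
    if pages is None:
        return urls
    for a in pages.find_all("a", href=True):
        url = urljoin(site_root, a["href"])
        # The "Next" link repeats one of the numbered links, so skip duplicates.
        if url not in urls:
            urls.append(url)
    return urls


# The pagination markup from the question, used here as test input.
html = """<div class="pages"><span>1</span>
<a href="online/GIRL/2">2</a>
<a href="online/GIRL/3">3</a>
<a href="online/GIRL/4">4</a>
<a accesskey="x" href="online/GIRL/2">Next</a></div>"""

page_urls = get_page_urls(BeautifulSoup(html, "html.parser"))
print(page_urls)
```

Note that a pagination block like this often lists only a window of nearby pages, so the links found on page 1 may not cover every page; following the "Next" link page by page is usually more robust.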

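Added: Thanks for the answers. The other paginated pages have the same HTML as the main page, so I only need to read all users from all of the paginated pages (2, 3, 4, 5, etc.). For my main page everything works; I only need to add all users from all paginated pages to links[].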

import urllib.request
from time import sleep

import requests
from bs4 import BeautifulSoup

login_data = {
    'login': 'xxxxx',
    'pass': 'xxxx',
    'back_url': ''
}

def getPeopleLinks(page):
    # Collect the href of every <a> on the parsed page that points to a profile.
    links = []
    for link in page.find_all('a'):
        url = link.get('href')
        if url and 'profile/' in url:
            links.append(url)
    return links

with requests.Session() as session:
    url = "https://xxxxx.com/login/?form_login=1"

    # `headers` is not shown in the question; it is assumed to be defined earlier.
    post = session.post(url, data=login_data, headers=headers)
    print(post.status_code)
    print(post.cookies)

    # Fetch the first listing page while the logged-in session is still open.
    r = session.get("https://xxxx.com/online/Girls")
    print(r.status_code)
    print(r.cookies)

soups = BeautifulSoup(r.content, 'html5lib')
x = getPeopleLinks(soups)
print(x)
for path in x:
    sleep(3)
    url = 'http://www.xxxx.com' + path
    page = urllib.request.urlopen(url)
    print(url)

The output is: http://www.xxxx.com/profile/nickname

Everything works, but only for:

https://xxxx.com/online/Girls

I need to read all users from all of the paginated pages.

Thanks :)
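A minimal sketch of one way the whole listing could be walked, assuming the "Next" link always carries `accesskey="x"` as in the HTML above and is absent on the last page (neither is confirmed by the question). The names `get_people_links`, `find_next_url`, `collect_all_links`, `fetch`, and `FAKE_SITE` are hypothetical. `fetch` is any callable that returns the HTML for a URL; the demo uses two made-up in-memory pages so it runs without network access.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def get_people_links(soup):
    """Collect hrefs that point at profile pages (mirrors getPeopleLinks)."""
    return [a["href"] for a in soup.find_all("a", href=True)
            if "profile/" in a["href"]]


def find_next_url(soup, site_root):
    """Return the absolute URL of the 'Next' link, or None if there is none."""
    pages = soup.find("div", class_="pages")
    if pages is None:
        return None
    # Assumption: the Next link is the one marked accesskey="x".
    next_link = pages.find("a", attrs={"accesskey": "x"})
    if next_link is None or not next_link.get("href"):
        return None
    return urljoin(site_root, next_link["href"])


def collect_all_links(fetch, start_url, site_root, max_pages=100):
    """Follow 'Next' from start_url and gather profile links from each page."""
    links, seen, url = [], set(), start_url
    # `seen` and `max_pages` guard against loops in the pagination.
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        soup = BeautifulSoup(fetch(url), "html.parser")
        for link in get_people_links(soup):
            if link not in links:
                links.append(link)
        url = find_next_url(soup, site_root)
    return links


# Made-up pages for the demo only; not taken from the real site.
FAKE_SITE = {
    "https://example.invalid/online/GIRL":
        '<a href="/profile/anna">anna</a>'
        '<div class="pages"><span>1</span>'
        '<a accesskey="x" href="online/GIRL/2">Next</a></div>',
    "https://example.invalid/online/GIRL/2":
        '<a href="/profile/bea">bea</a>'
        '<div class="pages"><a href="online/GIRL">1</a><span>2</span></div>',
}

all_links = collect_all_links(
    FAKE_SITE.__getitem__,
    "https://example.invalid/online/GIRL",
    "https://example.invalid/",
)
print(all_links)
```

With the logged-in session from the code above, `fetch` could be something like `lambda u: session.get(u).content`, called while the session is still open, ideally with a `sleep` between requests as in the original loop.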

0 answers:

No answers