I have a question. In my code I have:
r = session.get("https://xxxxxxx.com/online/GIRL")
print(r.status_code)
print(r.cookies)
soups = BeautifulSoup(r.content, 'html5lib')

def getPeopleLinks(page):
    # Collect the href of every <a> on the parsed page that points to a profile.
    links = []
    for link in page.find_all('a'):
        url = link.get('href')
        if url and 'profile/' in url:
            links.append(url)
    return links
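How can I get a list of all profiles from all available pages (for example 1, 2, 3, 4, 5, 6, and so on) and put them into links[]?
The pagination HTML on the site is: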
<div class="pages"><span>1</span>
<a href="online/GIRL/2">2</a>
<a href="online/GIRL/3">3</a>
<a href="online/GIRL/4">4</a>
<a accesskey="x" href="online/GIRL/2">Next</a></div>
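A minimal sketch of how the page URLs might be pulled out of this markup. It uses only the standard library (html.parser swapped in for BeautifulSoup so it runs without extra packages), and it assumes the hrefs are relative to the site root. The names SITE_ROOT, PaginationParser and pagination_urls are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: hrefs such as "online/GIRL/2" are relative to the site root.
SITE_ROOT = "https://xxxxxxx.com/"

PAGINATION_HTML = """<div class="pages"><span>1</span>
<a href="online/GIRL/2">2</a>
<a href="online/GIRL/3">3</a>
<a href="online/GIRL/4">4</a>
<a accesskey="x" href="online/GIRL/2">Next</a></div>"""


class PaginationParser(HTMLParser):
    """Collect the href of every <a> inside <div class="pages">."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # greater than 0 while inside the pagination div
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            # Enter the pagination div, or track a div nested inside it.
            if self.depth or "pages" in (attrs.get("class") or "").split():
                self.depth += 1
        elif tag == "a" and self.depth and attrs.get("href"):
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1


def pagination_urls(html, root=SITE_ROOT):
    """Return absolute page URLs from the pagination bar, without duplicates."""
    parser = PaginationParser()
    parser.feed(html)
    urls = []
    for href in parser.hrefs:
        url = urljoin(root, href)
        if url not in urls:   # the "Next" link repeats a numbered link
            urls.append(url)
    return urls


print(pagination_urls(PAGINATION_HTML))
```

The bar on the first page may not list every page, so this only finds the pages that are linked from the markup it is given.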
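Thanks!
Added: Thanks for your answers. The other pagination pages have the same HTML as the main page, so I just need to read all users from all of them (2, 3, 4, 5, etc.). Everything works for the main page; I only need to add all users from all the pagination pages to links[].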
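import urllib.request
from time import sleep

import requests
from bs4 import BeautifulSoup

headers = {}  # placeholder: the headers dict is not shown in this post

login_data = {
    'login': 'xxxxx',
    'pass': 'xxxx',
    'back_url': ''
}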
def getPeopleLinks(page):
    # Collect the href of every <a> on the parsed page that points to a profile.
    links = []
    for link in page.find_all('a'):
        url = link.get('href')
        if url and 'profile/' in url:
            links.append(url)
    return links
with requests.Session() as session:
    url = "https://xxxxx.com/login/?form_login=1"
    post = session.post(url, data=login_data, headers=headers)
    print(post.status_code)
    print(post.cookies)
    r = session.get("https://xxxx.com/online/Girls")
    print(r.status_code)
    print(r.cookies)
    soups = BeautifulSoup(r.content, 'html5lib')
    x = getPeopleLinks(soups)
    print(x)
    for path in x:
        sleep(3)
        url = 'http://www.xxxx.com' + path
        page = urllib.request.urlopen(url)
        print(url)
The output is: http://www.xxxx.com/profile/nickname
Everything works, but only for:
https://xxxx.com/online/Girls
I need to read all users from all the pagination pages.
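One way this could be approached is to keep following the "Next" link until a page no longer has one. The sketch below is only an illustration under several assumptions: the "Next" link is the anchor with accesskey="x", the last page has no such link, and hrefs are relative to the site root. It uses the standard library parser instead of BeautifulSoup so it runs on its own (with BeautifulSoup, soup.find('a', accesskey='x') would locate the same link). AnchorParser, collect_profile_links and the FAKE_PAGES data are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorParser(HTMLParser):
    """Record the attributes of every <a> tag as a dict."""

    def __init__(self):
        super().__init__()
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchors.append(dict(attrs))


def collect_profile_links(start_url, fetch, root):
    """Follow the "Next" link from page to page, gathering profile hrefs.

    fetch(url) must return the page HTML as a string, for example
    lambda u: session.get(u).text with a logged-in requests session.
    """
    links, seen, url = [], set(), start_url
    while url and url not in seen:   # the seen set guards against loops
        seen.add(url)
        parser = AnchorParser()
        parser.feed(fetch(url))
        next_href = None
        for a in parser.anchors:
            href = a.get("href")
            if not href:
                continue
            if "profile/" in href and href not in links:
                links.append(href)
            if a.get("accesskey") == "x":   # assumed to mark the "Next" link
                next_href = href
        # Assumption: hrefs are relative to the site root; no "Next" link
        # on a page ends the loop.
        url = urljoin(root, next_href) if next_href else None
    return links


# Made-up pages standing in for the real site, so the sketch runs offline.
FAKE_PAGES = {
    "https://xxxx.com/online/Girls":
        '<a href="/profile/anna">anna</a>'
        '<div class="pages"><a accesskey="x" href="online/Girls/2">Next</a></div>',
    "https://xxxx.com/online/Girls/2":
        '<a href="/profile/bea">bea</a><a href="/profile/anna">anna</a>',
}

all_links = collect_profile_links(
    "https://xxxx.com/online/Girls", FAKE_PAGES.__getitem__, "https://xxxx.com/"
)
print(all_links)
```

Passing the fetch function in keeps the loop independent of how pages are downloaded, so the same session that performed the login can be reused for every page.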
Thanks :)