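I'm trying to iterate over a list to get the links for a website whose subcategories each have multiple pages. The first subcategory link has as many pages as the first number in the list (8), the second has 6, and so on. I want my final result to look like this: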
sublinks:
0 https://messageboards.webmd.com/family-pregnancy/f/relationships/
1 https://messageboards.webmd.com/family-pregnancy/f/parenting/
2 https://messageboards.webmd.com/family-pregnancy/f/pets/
3 https://messageboards.webmd.com/family-pregnancy/f/pregnancy/
The list I'm trying to iterate over in the for loop: [8, 6, 5, 13, 10, 16, 13, 15, 4, 4, 5, 7, 2, 6, 6, 8, 9, 8, 3, 8, 8, 1, 6, 3, 2, 15, 5, 4, 2, 12, 18, 5, 2]
import bs4 as bs
import urllib.request
import pandas as pd
import urllib.parse
import re
#source = urllib.request.urlopen('https://messageboards.webmd.com/').read()
source = urllib.request.urlopen('https://messageboards.webmd.com').read()
soup = bs.BeautifulSoup(source,'lxml')
df = pd.DataFrame(columns = ['link'],data=[url.a.get('href') for url in soup.find_all('div',class_="link")])
lists =[]
lists2=[]
lists3=[]
page_links = []
for i in range(0,33):
    link = (df.link.iloc[i])
    req = urllib.request.Request(link)
    resp = urllib.request.urlopen(req)
    respData = resp.read()
    temp1=re.findall(r'Filter by</span>(.*?)data-pagedcontenturl',str(respData))
    temp1=re.findall(r'data-totalitems=(.*?)data-pagekey',str(temp1))[0]
    pageunm=round(int(re.sub("[^0-9]","",temp1))/10)
    lists.append(pageunm)
    for j in lists:
        for x in range(1, j+1):
            url_pages = link + '#pi157388622=' + str(j)
            page_links.append(url_pages)
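The page count above comes from two regular expressions run over `str(respData)`, which is the repr of a bytes object (`b'...'` with escape sequences) rather than the decoded HTML. The count is then taken from `round(total / 10)`. Python's `round` goes to the nearest integer, so a board with 74 threads gives 7 pages even though the last 4 threads need an eighth. Below is a minimal sketch of an alternative that uses the standard-library `html.parser` and ceiling division. It assumes 10 threads per page (as the `/10` implies) and that the first `data-totalitems` attribute on the page is the right one. The question's regex first narrows the search to the text after "Filter by", which this sketch does not do. `TotalItemsParser` and `page_count` are hypothetical names, and the HTML strings are made-up samples. For a real response you would pass decoded text, e.g. `respData.decode("utf-8")`, assuming the page is UTF-8.

```python
import re
from html.parser import HTMLParser


class TotalItemsParser(HTMLParser):
    """Records the first data-totalitems attribute value found in the HTML."""

    def __init__(self):
        super().__init__()
        self.total_items = None

    def handle_starttag(self, tag, attrs):
        if self.total_items is not None:
            return  # already found one; ignore the rest of the page
        for name, value in attrs:
            if name == "data-totalitems" and value is not None:
                # Keep only the digits, like re.sub("[^0-9]", "", ...) in the question.
                digits = re.sub(r"[^0-9]", "", value)
                if digits:
                    self.total_items = int(digits)
                    return


def page_count(html, per_page=10):
    """Number of pages needed to show every item, assuming per_page items per page."""
    parser = TotalItemsParser()
    parser.feed(html)
    if parser.total_items is None:
        raise ValueError("no data-totalitems attribute found")
    # Integer ceiling division: a partly filled last page still counts as a page.
    return (parser.total_items + per_page - 1) // per_page


sample = '<div class="pager" data-totalitems="74" data-pagekey="pi157388622"></div>'
print(round(74 / 10))      # 7  (what the question's formula gives)
print(page_count(sample))  # 8
```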
For the first iteration, the end result should look like this:
https://messageboards.webmd.com/family-pregnancy/f/relationships/#pi157388622=1
https://messageboards.webmd.com/family-pregnancy/f/relationships/#pi157388622=2
https://messageboards.webmd.com/family-pregnancy/f/relationships/#pi157388622=3
https://messageboards.webmd.com/family-pregnancy/f/relationships/#pi157388622=4
https://messageboards.webmd.com/family-pregnancy/f/relationships/#pi157388622=5
https://messageboards.webmd.com/family-pregnancy/f/relationships/#pi157388622=6
https://messageboards.webmd.com/family-pregnancy/f/relationships/#pi157388622=7
https://messageboards.webmd.com/family-pregnancy/f/relationships/#pi157388622=8
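One thing worth checking about these URLs is that they differ only in the part after `#`, the fragment. HTTP clients do not send the fragment to the server, so `urlopen` would request the same path for every page number. Whether this board actually switches pages client-side (for example in JavaScript that reads the fragment) is an assumption I have not verified. If it does, the real page data probably comes from a different request. The snippet below only shows how the fragment is split off by the standard library.

```python
from urllib.parse import urldefrag
from urllib.request import Request

url = "https://messageboards.webmd.com/family-pregnancy/f/relationships/#pi157388622=2"

# urldefrag splits the URL into the part before '#' and the fragment after it.
base, fragment = urldefrag(url)
print(base)      # https://messageboards.webmd.com/family-pregnancy/f/relationships/
print(fragment)  # pi157388622=2

# urllib.request also drops the fragment: the path it would request is the
# same whatever page number follows '#'.
print(Request(url).selector)  # /family-pregnancy/f/relationships/
```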
Answer 0 (score: 0)
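If your problem is that you can't get the iteration to run from 1 up to each number in the initial list (the one given outside your code sample), you could try something like this: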
sub_links = [8, 6, 5, 13, 10, 16, 13, 15, 4, 4, 5, 7, 2, 6, 6, 8, 9, 8, 3, 8, 8, 1, 6, 3, 2, 15, 5, 4, 2, 12, 18, 5, 2]
for length in sub_links:
    for number in range(1, length + 1):
        print(number, end=' ')
    print()
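As written, this prints all the numbers you need, one line per subcategory. Change the body of the inner for loop to append each number to your link instead of printing it, and you'll have what you need.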
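A minimal sketch of that change follows. It assumes the links and the counts are in the same order, as the question describes (the first link has 8 pages, the second 6, and so on). Only the four sample links from the question and the first four counts are used here, so the data is illustrative. The `#pi157388622=` suffix is copied from the question and hard-coded, as it is there. In the question's loop, `str(j)` appends the page count rather than the loop variable `x`, so every URL for a subcategory ends with the same number. Also, `for j in lists` runs again over every count collected so far on each pass of the outer loop. Pairing each link with its own count via `zip` and using the inner loop variable avoids both problems. In the real script, `links` would presumably be `df.link.tolist()` and `sub_links` the collected page counts.

```python
# The four sample subcategory links from the question, paired with the first
# four page counts from the list.
links = [
    "https://messageboards.webmd.com/family-pregnancy/f/relationships/",
    "https://messageboards.webmd.com/family-pregnancy/f/parenting/",
    "https://messageboards.webmd.com/family-pregnancy/f/pets/",
    "https://messageboards.webmd.com/family-pregnancy/f/pregnancy/",
]
sub_links = [8, 6, 5, 13]

page_links = []
# zip pairs the i-th link with the i-th count and stops at the shorter list.
for link, length in zip(links, sub_links):
    for number in range(1, length + 1):
        # Append the inner loop variable (1..length), not the page count itself.
        page_links.append(link + "#pi157388622=" + str(number))

print(len(page_links))  # 32
print(page_links[0])    # .../relationships/#pi157388622=1
print(page_links[7])    # .../relationships/#pi157388622=8
print(page_links[8])    # .../parenting/#pi157388622=1
```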
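If your problem is about something else, you'll need to be clearer. I suggest including only the code that doesn't work and explaining what the problem is.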