Referencing a list in a for loop

Time: 2017-01-22 20:30:10

Tags: python for-loop extract

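I have a list of links, and each link spans multiple pages. I have already found the number of pages in each subcategory, but now I want to write a loop that iterates over every page of every sub-link. The first link has 8 pages, the second has 6, and so on.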

lists = [8, 6, 5, 13, 10, 16, 13, 15, 4, 4, 5, 7, 2, 6, 6, 8, 9, 8, 3, 8, 8, 1, 6, 3, 2, 15, 5, 4, 2, 12, 18, 5, 2]

import bs4 as bs
import urllib.request
import pandas as pd
import urllib.parse
import re


#source = urllib.request.urlopen('https://messageboards.webmd.com/').read()
source = urllib.request.urlopen('https://messageboards.webmd.com').read()
soup = bs.BeautifulSoup(source,'lxml')


df = pd.DataFrame(columns = ['link'],data=[url.a.get('href') for url in soup.find_all('div',class_="link")])
lists =[]
lists2=[]
lists3=[]
page_links = []


for i in range(0,33):
    link = (df.link.iloc[i])
    req = urllib.request.Request(link)
    resp = urllib.request.urlopen(req)
    respData = resp.read()
    temp1=re.findall(r'Filter by</span>(.*?)data-pagedcontenturl',str(respData))
    temp1=re.findall(r'data-totalitems=(.*?)data-pagekey',str(temp1))[0]
    pageunm=round(int(re.sub("[^0-9]","",temp1))/10)
    lists.append(pageunm)


for j in lists:
    for y in range(1, j+1):
        url_pages = link + '#pi157388622=' + str(j)
        page_links.append(url_pages)

1 answer:

Answer 0 (score: 1)

Use nested loops:

for i in lists:  # [8, 6, 5, etc]
    # now use i for the inner loop
    for j in range(1, i+1):  # [1-8], [1-6], [1-5], etc
        url_pages = link + '#pi157388622=' + str(j)
        # do something with url_pages here (e.g. append it to a list), or it will just be overwritten on each iteration
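
One thing to watch: in the loop above, link still holds whatever value it had when the first loop finished, so every generated URL would point at the last category. A minimal sketch of one way around that, assuming the links and page counts are in the same order, is to walk both sequences together with zip. The function name build_page_links and the example.com URLs are made up for illustration; the fragment key pi157388622 is taken from the question and is assumed here to be the same for every board.

```python
def build_page_links(links, page_counts, page_key="pi157388622"):
    """Return one URL per page for every link.

    links       -- base URLs, e.g. the values of df.link
    page_counts -- number of pages for each link, in the same order
    page_key    -- fragment key from the question (assumed to be shared
                   by every board)
    """
    page_links = []
    # zip pairs each link with its own page count
    for link, count in zip(links, page_counts):
        # pages are numbered 1..count inclusive
        for page in range(1, count + 1):
            page_links.append(f"{link}#{page_key}={page}")
    return page_links


# Hypothetical example data; the real values would come from df.link and lists.
example_links = ["https://example.com/a", "https://example.com/b"]
example_counts = [2, 3]
page_links = build_page_links(example_links, example_counts)
```

With the data from the question, the call would presumably be build_page_links(df.link, lists), which yields one entry per page across all 33 categories.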