Python 3.7 nested loop does not iterate correctly

Date: 2019-01-14 12:08:32

Tags: python python-3.x loops selenium download

I have a Python 3 script that uses Selenium to scrape links to old newspapers from a website (http://digesto.asamblea.gob.ni/consultas/coleccion/), but my nested loop does not work properly.

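The outer loop should go through every table row to collect the publication date, which is used to name the corresponding PDF (e.g. Gaceta_Oficial_Date.pdf), and each link in the list should be downloaded under the name built from its own row. Instead, for each row the inner loop runs through all the links and saves every one of them under the same filename, so it looks as if the first loop does not advance.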

How can I make the two loops "work together", so that each link is paired with the result of the first loop for the matching row?

This is the relevant part of the script:

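import os
import datetime

import requests
from selenium.webdriver.common.by import By

# `driver` is an already-initialized Selenium WebDriver (setup not shown here)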

# only 4 links in the list for simplification
new_links = [
    'http://digesto.asamblea.gob.ni/consultas/util/pdf.php?type=rdd&rdd=vPjrUnz0wbA%3D',
    'http://digesto.asamblea.gob.ni/consultas/util/pdf.php?type=rdd&rdd=dsyx6l1Fbig%3D',
    'http://digesto.asamblea.gob.ni/consultas/util/pdf.php?type=rdd&rdd=Cb64W7EHlD8%3D',
    'http://digesto.asamblea.gob.ni/consultas/util/pdf.php?type=rdd&rdd=A4TKEG9x4F8%3D',
]

table_id = driver.find_element(By.ID, 'tableDocCollection')
rows = table_id.find_elements(By.CSS_SELECTOR, "tbody tr") # get all table rows

title = "Gaceta_Oficial_"
extension = ".pdf"
for row in rows:
    col = row.find_elements(By.TAG_NAME, "td")[2]
    date = col.text
    print(date)
    date = datetime.datetime.strptime(date, '%d/%m/%Y').strftime('%Y%m%d')
    filename = title + str(date) + extension
    print(filename)
    for new_link in new_links:
        print("Downloading %s" % filename)
        r = requests.get(new_link)
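        # every link is written to the same filename here, which is the problem described above
        with open(os.path.expanduser("~/Downloads/" + filename), 'wb') as f:
            f.write(r.content)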

1 answer:

Answer 0: (score: 0)

The problem was solved with zip, by replacing the two nested for loops with this single line:

for row, new_link in zip(rows, new_links):
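
To make the pairing easier to see, here is a minimal sketch of the same idea without Selenium or any network access. It assumes the dates have already been read from the table as text; the helper name build_download_plan and the example.com links are hypothetical and only stand in for the real rows and new_links. Because zip silently stops at the shorter input, the sketch checks the lengths first, which is an extra precaution and not part of the original fix.

```python
import datetime


def build_download_plan(dates, links, title="Gaceta_Oficial_", extension=".pdf"):
    """Pair each publication date with the link at the same position.

    Hypothetical helper for illustration: returns a list of
    (filename, link) tuples instead of downloading anything.
    """
    # zip() stops at the shorter sequence, so a mismatch would silently
    # drop rows or links; fail loudly instead.
    if len(dates) != len(links):
        raise ValueError("dates and links must have the same length")

    plan = []
    for date_text, link in zip(dates, links):
        # Same conversion as in the question: 14/01/2019 -> 20190114
        stamp = datetime.datetime.strptime(date_text, "%d/%m/%Y").strftime("%Y%m%d")
        plan.append((title + stamp + extension, link))
    return plan


# Made-up sample data standing in for the table rows and the scraped links.
sample_dates = ["14/01/2019", "15/01/2019"]
sample_links = ["http://example.com/a.pdf", "http://example.com/b.pdf"]

for filename, link in build_download_plan(sample_dates, sample_links):
    print(filename, "<-", link)
```

With one date per link, each filename is used exactly once, whereas the nested version wrote every link to every filename in turn.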