How to automatically change the URL to "&page=1", "&page=2", "&page=3", etc.

Time: 2018-10-29 02:16:49

Tags: python-3.x

I am trying to scrape a website that uses the following URL scheme:

http://www.website.com/browse.php?cat=19&s_tag=1&page=0
http://www.website.com/browse.php?cat=19&s_tag=1&page=1
http://www.website.com/browse.php?cat=19&s_tag=1&page=2

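My intention is to scrape one page, change the URL to the next page, scrape that one, move on to the next, and so on.

My broken script is as follows: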

driver.get('http://www.website.com/browse.php?cat=19&s_tag=1&page=0')
while True:
    #code for scraping information

    #code to get to next page
    i=0
    while (f'http://www.website.com/browse.php?cat=19&s_tag=1&page={i}')
        i+=1
    driver.get(f'http://www.website.com/browse.php?cat=19&s_tag=1&page={i}')

Does anyone know what I should do?

The exception is:

  File "<input>", line 45
    while (f'http://www.website.com/browse.php?cat=19&s_tag=1&page={i}')
                                                                       ^
SyntaxError: invalid syntax

I simplified the whole script and added print statements to see where it hangs.

import time
from selenium import webdriver

driver=webdriver.Firefox()
driver.get('https://www.ozbargain.com.au/?page=0')
while True:
    print('sleeping for 5 secs')
    time.sleep(5)
    print('proceeding')

    #code to get to next page
    i=0
    print('i=0 added')
    while (f'https://www.ozbargain.com.au/?page={i}'):
        i+=1
    print('while loop finished')
    driver.get(f'https://www.ozbargain.com.au/?page={i}')
    print('end of loop')

This is the output I get:

sleeping for 5 secs
proceeding
i=0 added

So clearly the nested while loop is where it goes wrong.
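The hang after "i=0 added" is consistent with how Python evaluates a loop condition: the condition here is only an f-string, and any non-empty string counts as true, so the inner loop keeps incrementing i and never exits. A minimal check, with the variable name url chosen purely for illustration:

```python
# The inner loop's condition is just a string, not a comparison.
# Python treats every non-empty string as true, so the loop cannot end.
url = f'https://www.ozbargain.com.au/?page={0}'

print(bool(url))  # the truth value the while statement sees
print(bool(''))   # only an empty string would be false
```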

1 answer:

Answer 0 (score: 0)

Your while statement is missing a colon (:) at the end. It should be:

while (f'http://www.website.com/browse.php?cat=19&s_tag=1&page={i}'):
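Adding the colon removes the SyntaxError, but the condition is still a non-empty string, which is always true, so that inner loop would not terminate. One possible way to step through the pages is to let a counter build each URL in turn. The sketch below is only an illustration: page_urls, BASE_URL and the stop argument are hypothetical names, and the number of pages is an assumption because the question does not say where the site ends.

```python
import itertools

# URL pattern from the question; {} is filled with the page number.
BASE_URL = 'http://www.website.com/browse.php?cat=19&s_tag=1&page={}'


def page_urls(start=0, stop=None):
    """Yield page URLs for page=start, start+1, ...

    stop is exclusive; None means keep counting with no upper bound.
    """
    pages = itertools.count(start) if stop is None else range(start, stop)
    for i in pages:
        yield BASE_URL.format(i)


# Assumption: only the first three pages are wanted.
for url in page_urls(0, 3):
    print(url)
    # With Selenium, this is where driver.get(url) and the
    # scraping code for that page would go.
```

With stop left as None the generator never ends by itself, so the scraping loop would need its own exit condition, for example breaking out when a page contains no results.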