Selenium: webdriver.page_source not refreshing after click

Asked: 2017-06-28 20:48:44

Tags: python selenium web

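I am trying to copy a list of addresses for a given community service from a web page into a new document so that I can geocode all of the locations on a map. The site does not let me get a list of all the parcels at once: I can only download one parcel at a time, and each page is limited to 25 parcels. Doing this by hand would be extremely time consuming.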

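I want to write a script that reads the page source (including the 25 addresses contained in the table tags), clicks the next-page button, copies the next page, and so on until the maximum page is reached. Afterwards I can format the text so it is compatible with geocoding.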

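The code below does all of that, except that it copies the first page over and over, even though I can clearly see that the program has successfully navigated to the next page: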

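# Imports the snippet relies on (not shown in the original post; bs4 is assumed)
import time
from bs4 import BeautifulSoup
from selenium import webdriver

# Open Chrome
br = webdriver.Chrome()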

raw_input("Navigate to web page. Press enter when done: ")

pg_src = br.page_source.encode("utf") 
soup = BeautifulSoup(pg_src)

max_page = 122 #int(max_page)

#open a text doc to write the results to

f = open(r'C:\Geocoding\results.txt', 'w')

# write results page by page until max page number is reached

pg_cnt = 1 # start on 1 as we should already have the first page
while pg_cnt < max_page:
    tble_elems = soup.findAll('table')
    soup = BeautifulSoup(str(tble_elems))
    f.write(str(soup))
    time.sleep(5)
    pg_cnt +=1
    # clicks the next button
    br.find_element_by_xpath("//div[@class='next button']").click()
    # give some time for the page to load
    time.sleep(5)
    # get the new page source (THIS IS THE PART THAT DOESN'T SEEM TO BE WORKING)
    page_src = br.page_source.encode("utf")
    soup = BeautifulSoup(pg_src)

f.close()
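
One detail in the loop above is worth checking: the refreshed source is stored in `page_src`, but the next line parses `pg_src`, which still holds the first page. The loop also ends before the last page it fetched is written. The sketch below shows the intended pattern (re-read the source into the same name that is consumed on the next pass). It uses a hypothetical `FakeDriver` with a hypothetical `click_next` method in place of a real browser, so it illustrates the control flow only and is not the author's actual script.

```python
class FakeDriver:
    """Hypothetical stand-in for a Selenium WebDriver; only page_source is modelled."""

    def __init__(self, pages):
        self._pages = pages
        self._index = 0

    @property
    def page_source(self):
        return self._pages[self._index]

    def click_next(self):
        # Hypothetical helper; the original clicks an element found by XPath.
        self._index = min(self._index + 1, len(self._pages) - 1)


def collect_pages(driver, max_page):
    """Return the source of pages 1..max_page, re-reading it after every click."""
    collected = []
    pg_src = driver.page_source  # page 1 is already loaded
    for page_number in range(1, max_page + 1):
        collected.append(pg_src)
        if page_number < max_page:
            driver.click_next()
            # Assign to the SAME name that is appended on the next pass.
            pg_src = driver.page_source
    return collected


pages = collect_pages(
    FakeDriver(["<table>A</table>", "<table>B</table>", "<table>C</table>"]), 3
)
print(pages)  # ['<table>A</table>', '<table>B</table>', '<table>C</table>']
```

With a real driver, the same shape applies: whatever variable is passed to `BeautifulSoup(...)` must be the one that was just reassigned from `br.page_source`.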

1 Answer:

Answer 0 (score: 0)

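I ran into the same problem. I think it happens because some of the JavaScript has not finished loading. All you need to do is wait until the element you want has loaded. The code below worked for me: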

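from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

delay = 10  # seconds to wait before giving up

try:
    # driver is the WebDriver instance (br in the question).
    # Block until an element with this class is present in the DOM.
    myElem = WebDriverWait(driver, delay).until(
        EC.presence_of_element_located((By.CLASS_NAME, 'legal-attribute-row'))
    )
except TimeoutException:
    print("Loading took too much time!")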
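
A presence check may return immediately if an element with the same class already exists on the page being left, so it can help to wait for the content to actually change. The following is a minimal sketch of that idea in plain Python: `wait_for_change` and `SlowPage` are hypothetical names introduced here for illustration, and `SlowPage` only simulates a page whose source updates a few reads after the click.

```python
import time


def wait_for_change(read_value, old_value, timeout=10.0, poll=0.5):
    """Poll read_value() until it differs from old_value or timeout elapses.

    Returns the new value; raises TimeoutError if nothing changed in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        current = read_value()
        if current != old_value:
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError("Loading took too much time!")
        time.sleep(poll)


class SlowPage:
    """Hypothetical stand-in for a driver whose page updates on the third read."""

    def __init__(self):
        self.reads = 0

    @property
    def page_source(self):
        self.reads += 1
        if self.reads < 3:
            return "<table>page 1</table>"
        return "<table>page 2</table>"


page = SlowPage()
old_src = page.page_source  # first read: still page 1
new_src = wait_for_change(lambda: page.page_source, old_src, timeout=2.0, poll=0.01)
print(new_src)  # <table>page 2</table>
```

With Selenium, `read_value` could be something like `lambda: br.page_source`, called after the click. Selenium's own `expected_conditions.staleness_of` serves a similar purpose when a reference to an element from the old page is available.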