Question

The site is a little odd. I want to get all the issuers and the other information under "POST ISSUANCE", on every page from the first to the last. I tried:

driver.get('https://www.chinabondconnect.com/en/Primary/Primary-Information/Onshore.html')
wait = WebDriverWait(driver, 30)
driver.find_element_by_link_text('Others').click()
for i in range(1,20):
        pg = "tb2tr pg" + str(i)
        allitems = driver.find_element_by_xpath('//*[@id="td7"]/tbody/tr[@class=pg])')
        for i in range(len(allitems)):
            issuer = driver.find_element_by_xpath('(//tr[@class=pg]//td[1]//div[2]//div)').text
            print(issuer)

It says this is not a valid XPath.

Can anyone help?

Thanks!

Answer 1

"//table[@id='tb7']/tbody//tr[starts-with(@class,'{}')]".format(pg)

Try this XPath for allitems. With pg set to "tb2tr pg" + str(i), it selects all the matching tr rows in table tb7.

You can then use:

for item in allitems:
    issuer = item.find_element_by_xpath('./td[1]/div[2]/div').get_attribute('textContent')
    print(issuer)
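
One caveat with the prefix match: starts-with(@class,'tb2tr pg1') would also match rows whose class is 'tb2tr pg10' through 'tb2tr pg19'. A minimal sketch of a stricter predicate, assuming the class values look like "tb2tr pg1" (possibly followed by further space-separated classes). The helper names page_row_xpath and class_matches are hypothetical, and class_matches only mirrors the XPath logic in plain Python so it can be checked without a browser:

```python
def page_row_xpath(page):
    """Build the row XPath for one page (hypothetical helper)."""
    cls = "tb2tr pg{}".format(page)
    # Match the class exactly, or as a prefix followed by a space,
    # so that page 1 does not also pick up pg10..pg19.
    return (
        "//table[@id='tb7']/tbody//tr"
        "[@class='{0}' or starts-with(@class,'{0} ')]".format(cls)
    )


def class_matches(class_attr, page):
    """Plain-Python mirror of the predicate above, for checking the logic."""
    cls = "tb2tr pg{}".format(page)
    return class_attr == cls or class_attr.startswith(cls + " ")


# The naive prefix test collides between page 1 and page 10:
print("tb2tr pg10".startswith("tb2tr pg1"))
# The stricter predicate does not:
print(class_matches("tb2tr pg10", 1))
print(page_row_xpath(1))
```

The resulting string can be passed to find_elements in place of the XPath above.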

Answer 2

Use find_elements() to get all the records, and get_attribute("textContent") to read the values of hidden nodes.

for item in driver.find_elements_by_xpath("//table[@id='tb7']//tr[starts-with(@class,'tb2tr pg')]//td[1]/div[2]/div"):
    print(item.get_attribute("textContent"))

Output:

Central Huijin Investment Ltd.
Dongguan Rural Commercial Bank Co., Ltd.
Gemdale (Group) Co., Ltd.
Everbright Securities
China securities co ltd
Bank of China 
Jinan Rail Transit Group Co., Ltd.
Ping An Bank Co., Ltd.
Shaanxi Financial Holding Group Co., Ltd.
Bank of Suzhou Co., Ltd.
Chongqing Expressway Group Co., Ltd.
Shanghai World Expo Land Holdings Co., Ltd.
Beijing Capital Tourism Group Co., Ltd.
CMB Financial Leasing Co., Ltd.
Shaanxi Coal Industry Chemical Group Co., Ltd.
China Securities Co., Ltd.
Guangdong Electric Power Development Co., Ltd.
China Construction Bank 
Industrial and Commercial Bank of China
Industrial and Commercial Bank of China Limited
China Securities Co., Ltd.
China Securities Co., Ltd.
China Bohai Bank
Shangrao Investment Holding Group SCP
China Securities Co., Ltd
Everbright Securities
Guangzhou Kaide Renewable Publicly Issued Corporate Bond
SCP/Guangzhou Development Zone Business Development Group
Qingdao City Investment Financial Holding Group Renewable Publicly Issued Corporate Bond
China Railway Construction Investment Group MTN
Qingdao Guoxin Development (Group) Co., Ltd.
China Securities Co., Ltd.
China Orient Asset Management Co., Ltd
    Datang International Power Generation Co.,Ltd.
Bank of China
Bank of China 
Datang International Power Generation Co.,Ltd. 
Hangzhou City Construction Investment Group Limited
YIBIN STATE OWNED ASSETS MANAGEMENT CO.,LTD.
China Railway Construction Investment Corporation
ABC Financial Leasing
Guangzhou Metro
Aluminum Corporation of China Limited
Fubon Bank
China Securities Co., Ltd.
Ganzhou Development Investment Holding Group
Shanghai rural Commercial Bank
Everbright Securities
ICBC Financial Leasing Co., Ltd
Shanghai Pudong Development Bank
China State Railway Group Co., Ltd.
China State Railway Group Co., Ltd.
CMB Financial Leasing
CMB Financial Leasing Co., Ltd.
Bank of China
Bank of China 
Industrial and Commercial Bank of China
Industrial and Commercial Bank of China
Industrial and Commercial Bank of China Limited
Industrial and Commercial Bank of China Limited
Bank of Communications Co.,Ltd.
Zhejiang State-owned Capital Operation Co., Ltd.
China Merchant Bank
China Merchants Bank
Bank of Communications Financial Leasing Co., Ltd.
CCB Financial Leasing Co., Ltd
Central Huijin Investment Ltd.
Central Huijin Investment Ltd.
China Securities Co., Ltd
Everbright Securities
Beijing Infrastructure Investment Co., LTD
Huishang Bank Corporation
Bank of Communication
China Nonferrous Metal Mining (Group) Co., Ltd
Everbright Securities
Industrial and Commercial Bank of China
Industrial and Commercial Bank of China Limited
China Securities Co., Ltd
China Everbright Bank Co., Ltd
Bank of China
... and so on
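
Because textContent returns the raw node text, some names in the output carry leading or trailing whitespace. A small sketch of tidying the values before use; clean_issuers is a hypothetical helper, and the sample list just reuses a few values from the output above:

```python
def clean_issuers(raw_values):
    """Collapse runs of whitespace and drop empty entries."""
    cleaned = []
    for value in raw_values:
        # split() with no arguments also trims leading/trailing whitespace.
        name = " ".join(value.split())
        if name:
            cleaned.append(name)
    return cleaned


sample = [
    "Bank of China ",
    "    Datang International Power Generation Co.,Ltd.",
    "",
]
print(clean_issuers(sample))
```

In the scraping loop, the input would be the list of item.get_attribute("textContent") values.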

Answer 3

Try removing the parentheses from your XPath, so that the final XPath looks like this:

issuer = driver.find_element_by_xpath('//tr[@class=pg]//td[1]//div[2]//div').text
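
One thing to watch in that line: pg sits inside the string literal, so Python never substitutes its value, and XPath reads the unquoted pg as a child element name rather than a string. A minimal sketch of the interpolated, quoted form, checked against a made-up miniature of the table with the standard library's ElementTree. The sample markup and the helper issuer_path are assumptions, not the real page:

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the table; the real page's markup may differ.
SAMPLE = """
<table id="tb7"><tbody>
  <tr class="tb2tr pg1"><td><div>logo</div><div><div>Bank A</div></div></td></tr>
  <tr class="tb2tr pg2"><td><div>logo</div><div><div>Bank B</div></div></td></tr>
</tbody></table>
"""


def issuer_path(page):
    pg = "tb2tr pg" + str(page)
    # The class value is interpolated and quoted, so XPath sees a string literal.
    return ".//tr[@class='{}']/td[1]/div[2]/div".format(pg)


root = ET.fromstring(SAMPLE)
names = [div.text for div in root.findall(issuer_path(2))]
print(names)
```

With Selenium the same string (starting with // instead of .//) would go to find_elements rather than find_element, since each page has several rows.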

Answer 4

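Correct me if I'm wrong. As I understand it, you want to scrape the whole site, which means that when you click, a new page loads. Selenium WebDriver does not pick up the new page on its own; it stays focused on the first one, so you have to tell it to switch. The way to solve this is: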

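from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait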

# Start the driver
with webdriver.Firefox() as driver:
    # Open URL
    driver.get("https://seleniumhq.github.io")

    # Setup wait for later
    wait = WebDriverWait(driver, 10)

    # Store the ID of the original window
    original_window = driver.current_window_handle

    # Check we don't have other windows open already
    assert len(driver.window_handles) == 1

    # Click the link which opens in a new window
    driver.find_element(By.LINK_TEXT, "new window").click()

    # Wait for the new window or tab
    wait.until(EC.number_of_windows_to_be(2))

    # Loop through until we find a new window handle
    for window_handle in driver.window_handles:
        if window_handle != original_window:
            driver.switch_to.window(window_handle)
            break

    # Wait for the new tab to finish loading content
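
The handle loop above can be pulled into a small helper, which is easy to check without a browser. This is only a sketch: find_new_window is a hypothetical name, and in practice the handles would come from driver.window_handles.

```python
def find_new_window(handles, original):
    """Return the first handle that is not the original one, or None."""
    for handle in handles:
        if handle != original:
            return handle
    return None


# Example with made-up handle strings:
print(find_new_window(["win-1", "win-2"], "win-1"))
```

The returned handle would then be passed to driver.switch_to.window(), as in the loop above.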

Python Selenium: accessing elements with a for loop

4 answers: