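I have written some VBA code that uses Selenium to parse data from tables spread across multiple pages. When I run the script, I can see it parse the data from the first page and then keep clicking the next-page button until no more buttons are available. However, I only get the data from the first page: the browser visibly clicks through the remaining pages, but nothing is fetched from them. I can't work out what I'm doing wrong here. Perhaps the loop I created has something to do with it. Thanks for taking a look. Here is the full code: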
    Sub Table_data()
        Dim driver As New ChromeDriver
        Dim tabl As Object, rdata As Object, cdata As Object
        driver.Get "https://toolkit.financialexpress.net/santanderam"
        driver.Wait 1000
        For Each tabl In driver.FindElementsByXPath("//table[@class='fe-datatable']")
            For Each rdata In tabl.FindElementsByXPath(".//tr")
                For Each cdata In rdata.FindElementsByXPath(".//td")
                    y = y + 1
                    Cells(x + 1, y) = cdata.Text
                Next cdata
                x = x + 1
                y = 0
            Next rdata
            driver.FindElementByLinkText("Next").Click
            driver.Wait 1000
        Next tabl
    End Sub
Answer 0 (score: 1)
Personally, I would change the way you iterate over the pages. In pseudocode it would look something like this:

    function element getNextButton(){
        all_buttons = driver.findElementsByXpath("""//*[@id="Price_1_1"]/tfoot/tr/td/div/div/a""");
        next_button = all_buttons[all_buttons.Size()-1];
        return next_button;
    }
    main(){
        while true{
            do something with your current table;
            next_button = getNextButton();
            if next_button.text does not contain 'Next'{
                break; // last page reached, and its table has already been processed
            }
            next_button.click();
            wait(2); // wait some time till the page loads
        }
    }
I have just tested it in Python:
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    import time

    def get_next_button():
        # The last pagination link reads "Next" while more pages remain
        buttons = driver.find_elements(By.XPATH, """//*[@id="Price_1_1"]/tfoot/tr/td/div/div/a""")
        return buttons[-1]

    # Selenium 4 style: assumes chromedriver can be found on PATH
    driver = webdriver.Chrome()
    driver.get("https://toolkit.financialexpress.net/santanderam")
    time.sleep(5)
    while True:
        # Do something with the table
        next_button = get_next_button()
        if 'Next' not in next_button.text:
            break  # last page reached, and its table has already been processed
        next_button.click()
        time.sleep(2)
    print('End')
I'm not familiar with VBA, but if you don't know Python I can try to translate it to VBA.
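The "Do something with the table" step in the Python loop could be filled in along these lines. This is only a sketch: `table_to_rows` is a hypothetical helper name, and it assumes Selenium 4's `find_elements(by, value)` signature, passing the locator strategy as the plain string "xpath" (the value behind `By.XPATH`) so that the function needs no imports of its own.

```python
def table_to_rows(table):
    """Return the text of every <td> in `table` as a list of rows.

    `table` is assumed to be a Selenium WebElement (or anything exposing
    the same find_elements(by, value) method and a .text attribute).
    """
    rows = []
    for tr in table.find_elements("xpath", ".//tr"):
        cells = [td.text for td in tr.find_elements("xpath", ".//td")]
        if cells:  # rows without any <td> (e.g. header rows) are skipped
            rows.append(cells)
    return rows
```

Inside the loop you would locate the table element on the current page, call `table_to_rows` on it, and append the result to a list that collects the rows from all pages.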
Edit

An "approximation" of the VBA solution would be something like this (please check it for syntax errors, I have never used VBA):
    ' Returns the last pagination link, which reads "Next" while more pages remain.
    Function GetNextElement(ByVal driver As Object) As Object
        Dim all_buttons As Object
        Set all_buttons = driver.FindElementsByXPath("//*[@id='Price_1_1']/tfoot/tr/td/div/div/a")
        ' Assumes the SeleniumBasic collection is 1-based and exposes Count and Item
        Set GetNextElement = all_buttons.Item(all_buttons.Count)
    End Function

    Sub Table_data()
        Dim driver As New ChromeDriver
        Dim next_button As Object
        driver.Get "https://toolkit.financialexpress.net/santanderam"
        driver.Wait 1000
        Do
            ' Do something with the table
            Set next_button = GetNextElement(driver)
            ' InStr returns 0 when "Next" is absent, i.e. on the last page
            If InStr(next_button.Text, "Next") = 0 Then Exit Do
            next_button.Click
            driver.Wait 2000
        Loop
    End Sub
Answer 1 (score: 1)
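Consider moving the click on the Next button out of the For Each loop. It belongs in a separate outer loop, and that loop should terminate when there is no Next button left to press (Run-time error 7: NoSuchElementError).

The XPath //table[@class='fe-fund-tableBody'] also returns the page numbers. You should use the inner table instead, either by id, //*[@id='docRows'], or by class name, (//table[@class='fe-fund-tableBody'])[1]. Both point to the same element.

You may have noticed that the elements above occur 7 times, so your code also loops through empty ones on every page. You can avoid this by looping over the first occurrence only, for example (//*[@id='docRows'])[1].

I would also suggest finding an implicit/explicit wait method to use instead of wait.

If we don't improve anything else, your code should end up looking like this:

    Sub Table_data()
        Dim driver As New ChromeDriver
        Dim tabl As Object, rdata As Object, cdata As Object
        Dim x As Long, y As Long
        driver.Get "https://toolkit.financialexpress.net/santanderam"
        driver.Wait 1000
        Do
            ' Scrape only the first matching table on the current page
            For Each tabl In driver.FindElementsByXPath("(//*[@id='docRows'])[1]") 'or "(//table[@class='fe-fund-tableBody'])[1]"
                For Each rdata In tabl.FindElementsByXPath(".//tr")
                    For Each cdata In rdata.FindElementsByXPath(".//td")
                        y = y + 1
                        Cells(x + 1, y) = cdata.Text
                    Next cdata
                    x = x + 1
                    y = 0
                Next rdata
            Next tabl
            ' Let the failed lookup on the last page set Err instead of halting
            On Error Resume Next
            driver.FindElementByLinkText("Next").Click
            driver.Wait 1000
        Loop Until Err.Number = 7 ' NoSuchElementError: no Next link left
    End Sub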
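To illustrate what an explicit wait does compared with a fixed pause, here is a minimal sketch in Python (the language used for testing elsewhere in this thread). `wait_until` is a hypothetical helper written for illustration and is not part of Selenium; it simply polls a condition until it holds or a timeout expires. Selenium's Python bindings provide a similar polling mechanism as `WebDriverWait` in `selenium.webdriver.support.ui`, and the VBA wrapper may offer its own equivalent.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without the condition becoming truthy.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)  # short pause between checks instead of one long sleep
```

The condition would typically be a small function that looks up the table rows on the freshly loaded page and returns them once they are present, so the script continues as soon as the page is ready rather than always waiting a full second.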