I want to iterate over all the tables on a web page that contains an unknown number of tables. I wrote this code:
    import java.io.*;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;
    import com.gargoylesoftware.htmlunit.html.HtmlTable;
    import com.gargoylesoftware.htmlunit.html.HtmlTableRow;
    import com.gargoylesoftware.htmlunit.html.*;
    import com.gargoylesoftware.htmlunit.WebClient;

    public class test {
        public static void main(String[] args) throws Exception {
            WebClient client = new WebClient();
            HtmlPage currentPage = client.getPage("http://www.mysite.com");
            client.waitForBackgroundJavaScript(10000);
            FileWriter fstream = new FileWriter("index.txt");
            BufferedWriter out = new BufferedWriter(fstream);
            for (int i = 0; i < 2; i++) {
                final HtmlTable table = (HtmlTable) currentPage.getByXPath("//table").get(i);
                for (final HtmlTableRow row : table.getRows()) {
                    for (final HtmlTableCell cell : row.getCells()) {
                        out.write(cell.asText() + ',');
                    }
                    out.write('\n');
                }
            }
            out.close();
            client.closeAllWindows();
        }
    }
I tried this as the loop condition:
    while(currentPage.getByXPath("//table")){....}
but it is not accepted. What is the correct condition to check?
Answer 0 (score: 2)
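htmlunit.html.HtmlPage has a method getElementsByTagName(String tagName), to which you can pass "table" as the tag name. Then take the size of the list it returns and use that as the loop bound in place of the hard-coded 2:
    // DomNodeList implements java.util.List, so call size() rather than reading a length field
    int nTables = currentPage.getElementsByTagName("table").size();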
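The idea of bounding the loop by the size of the returned list can be tried without HtmlUnit on the classpath. Below is a minimal sketch that uses the JDK's built-in DOM and XPath classes as a stand-in for HtmlUnit and runs the same "//table" expression against a small well-formed XHTML string. The class name TableLoopSketch, the helper rowsAsCsv, and the sample markup are all made up for illustration. In HtmlUnit the corresponding bound would be size() on the list returned by getByXPath("//table").

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TableLoopSketch {

    // Hypothetical helper: returns one comma-terminated line per row,
    // for every table in the document, however many there are.
    static List<String> rowsAsCsv(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));

        // Evaluate the XPath once and keep the result.
        NodeList tables = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//table", doc, XPathConstants.NODESET);

        List<String> lines = new ArrayList<>();
        // The loop bound is the number of matches, not a hard-coded constant.
        for (int i = 0; i < tables.getLength(); i++) {
            Element table = (Element) tables.item(i);
            // Note: this also picks up rows of nested tables; the sample has none.
            NodeList rows = table.getElementsByTagName("tr");
            for (int r = 0; r < rows.getLength(); r++) {
                NodeList cells = ((Element) rows.item(r)).getElementsByTagName("td");
                StringBuilder line = new StringBuilder();
                for (int c = 0; c < cells.getLength(); c++) {
                    line.append(cells.item(c).getTextContent()).append(',');
                }
                lines.add(line.toString());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample page with three tables.
        String xhtml = "<html><body>"
                + "<table><tr><td>a</td><td>b</td></tr></table>"
                + "<table><tr><td>c</td></tr></table>"
                + "<table><tr><td>d</td><td>e</td></tr><tr><td>f</td></tr></table>"
                + "</body></html>";
        for (String line : rowsAsCsv(xhtml)) {
            System.out.println(line);
        }
    }
}
```

The original while(...) attempt is rejected because getByXPath returns a list, not a boolean. Fetching the list once and looping up to its size (or using a for-each loop over it) avoids both that error and the repeated XPath evaluation inside the loop.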
Answer 1 (score: -1)
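First get the list of the table's rows. Then get the list of the table's columns, iterate over the table with nested for loops, and check whatever condition you need.
    List<HtmlTableRow> tableRows = table.getRows();
I take row zero for the column list because I wanted to check the table header; change that as you like.
    List<HtmlTableCell> tableColumns = table.getRow(0).getCells();
    for (int row = 0; row < tableRows.size(); row++)
    {
        for (int column = 0; column < tableColumns.size(); column++)
        {
            // Look at the cell in the current row; this assumes every row has
            // at least as many cells as row zero.
            if (tableRows.get(row).getCell(column).asText().equalsIgnoreCase("check your condition"))
            {
                // do what you want
            }
        }
    }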