JSoup - Error extracting table data

Date: 2016-07-03 19:26:19

Tags: java html arrays parsing jsoup

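I have just started a small project in which I need to collect historical data on global currency pairs. Based on the answers to the question "Extract Data out of table with JSoup", I wrote the code pasted below.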

However, I still get an IndexOutOfBoundsException, even though the data Elements array has size 7?

I have been scratching my head over this for almost an hour. If anyone can point out where I went wrong, I would appreciate it!

Main class

import java.util.ArrayList;
import java.util.List;
import java.io.IOException;

import org.jsoup.*;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;


public class MainClass {


public static void main(String[] args) throws IOException{

    Document doc = Jsoup.connect("http://www.myfxbook.com/forex-market/currencies/GBPUSD-historical-data").get();

    Element table = doc.getElementById("symbolMarket");

    List<Entry> entries = new ArrayList<Entry>();

    for(Element row : table.select("tr")){

        int index = 0;
        Entry tableEntry = new Entry();
        Elements data = row.select("td");

        tableEntry.setDate(data.get(index++).text());
        tableEntry.setOpen(data.get(index++).text());
        tableEntry.setHigh(data.get(index++).text());
        tableEntry.setLow(data.get(index++).text());
        tableEntry.setClose(data.get(index++).text());
        tableEntry.setChangePips(data.get(index++).text());
        tableEntry.setChangePercent(data.get(index++).text());

        entries.add(tableEntry);

    }

}

}

Entry class

public class Entry {

private String date;
private String open;
private String high;
private String low;
private String close;
private String changePips;
private String changePercent;

public String getDate() {
    return date;
}
public void setDate(String date) {
    this.date = date;
}
public String getOpen() {
    return open;
}
public void setOpen(String open) {
    this.open = open;
}
public String getHigh() {
    return high;
}
public void setHigh(String high) {
    this.high = high;
}
public String getLow() {
    return low;
}
public void setLow(String low) {
    this.low = low;
}
public String getClose() {
    return close;
}
public void setClose(String close) {
    this.close = close;
}
public String getChangePips() {
    return changePips;
}
public void setChangePips(String changePips) {
    this.changePips = changePips;
}
public String getChangePercent() {
    return changePercent;
}
public void setChangePercent(String changePercent) {
    this.changePercent = changePercent;
}



}

2 answers:

Answer 0: (score: 1)

You are trying to read data cells from the table's header row. You have to skip it.

public static void main(String[] args) throws IOException {
        Document doc = Jsoup.connect("http://www.myfxbook.com/forex-market/currencies/GBPUSD-historical-data").get();

        Element table = doc.getElementById("symbolMarket");

        List<Entry> entries = new ArrayList<Entry>();

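        // Note: this snippet also needs "import java.util.Iterator;"
        Elements elements = table.select("tr");
        Iterator<Element> itr = elements.iterator();
        itr.next(); // skip the header row, which contains th cells and no td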

        while ( itr.hasNext() ) {
            int index = 0;
            Entry tableEntry = new Entry();
            Elements data = itr.next().select("td");

            tableEntry.setDate(data.get(index++).text());
            tableEntry.setOpen(data.get(index++).text());
            tableEntry.setHigh(data.get(index++).text());
            tableEntry.setLow(data.get(index++).text());
            tableEntry.setClose(data.get(index++).text());
            tableEntry.setChangePips(data.get(index++).text());
            tableEntry.setChangePercent(data.get(index++).text());
            entries.add(tableEntry);

        }
    }

Answer 1: (score: 0)

  

"However, I still get an IndexOutOfBoundsException, even though the data Elements array has size 7?"

If that were true for every row, you would not see this exception.

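The problem is that the first row does not contain any td elements, only th (table header) cells. For that row, row.select("td") therefore matches 0 elements, which is exactly what the exception message tells you:

java.lang.IndexOutOfBoundsException: Index: 0, Size: 0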
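The failure can be reproduced without jsoup, since calling get(0) on any empty list behaves the same way. The following is only a minimal sketch: the class EmptyListDemo and its helper methods are hypothetical names, and the empty ArrayList merely stands in for what row.select("td") returns on the header row. The exact wording of the message depends on the Java version (for ArrayList, Java 8 prints "Index: 0, Size: 0", while newer versions print "Index 0 out of bounds for length 0").

```java
import java.util.ArrayList;
import java.util.List;

public class EmptyListDemo {

    // Returns the first cell's text, or null when the row has no cells.
    static String firstCellOrNull(List<String> cells) {
        return cells.isEmpty() ? null : cells.get(0);
    }

    // Reports whether reading index 0 throws, as it does for the header row.
    static boolean getFirstThrows(List<String> cells) {
        try {
            cells.get(0);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Stand-in for row.select("td") on a row that only has th cells.
        List<String> headerRowCells = new ArrayList<>();

        System.out.println(getFirstThrows(headerRowCells));  // unguarded access throws
        System.out.println(firstCellOrNull(headerRowCells)); // guarded access does not
    }
}
```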

To fix this, simply skip the first row, or explicitly select only those tr elements that contain at least one td:

for(Element row : table.select("tr:has(td)")){
    //                            ^^^^^^^^
    ...
}
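To try the selector without hitting the live site, a small inline table can be parsed with Jsoup.parse. This is a sketch under assumptions: the HTML string, its values, and the class name HasTdDemo are made up for illustration and only imitate the shape of the real page (a header row of th cells followed by rows of td cells). It requires jsoup on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class HasTdDemo {

    // Hypothetical stand-in for the real page: one th-only header row, two data rows.
    static final String HTML =
          "<table id=\"symbolMarket\">"
        + "<tr><th>Date</th><th>Open</th></tr>"
        + "<tr><td>2016-07-01</td><td>1.3310</td></tr>"
        + "<tr><td>2016-06-30</td><td>1.3425</td></tr>"
        + "</table>";

    // Collects the text of the first td of every row matched by the given selector.
    static List<String> firstCells(String rowSelector) {
        Document doc = Jsoup.parse(HTML);
        Element table = doc.getElementById("symbolMarket");

        List<String> dates = new ArrayList<>();
        for (Element row : table.select(rowSelector)) {
            Elements data = row.select("td");
            dates.add(data.get(0).text()); // throws if the row has no td
        }
        return dates;
    }

    public static void main(String[] args) {
        // Only rows containing a td are visited, so the header row is never read.
        System.out.println(firstCells("tr:has(td)"));
    }
}
```

Calling firstCells("tr") on the same input fails on the header row, which is the situation described in the question.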

You can also check the size of data, which holds the td elements, yourself before doing anything with it:

for(Element row : table.select("tr")){
    Elements data = row.select("td");

    if(data.size()==7){

        int index = 0;
        Entry tableEntry = new Entry();

        tableEntry.setDate(data.get(index++).text());
        tableEntry.setOpen(data.get(index++).text());
        tableEntry.setHigh(data.get(index++).text());
        tableEntry.setLow(data.get(index++).text());
        tableEntry.setClose(data.get(index++).text());
        tableEntry.setChangePips(data.get(index++).text());
        tableEntry.setChangePercent(data.get(index++).text());

        entries.add(tableEntry);
    }
}