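I am trying to extract some information from an HTML table and put it into arraylist = new ArrayList<HashMap<String, String>>(); so that I can manage it more easily inside my app.
After a POST request, I have been able to store the correct HTML page in the document variable.
Below is the HTML that contains the data I need, but it is not the only table on the page. I don't know how to find the items inside this specific table.
What is the right way to get the data in this format: DAY - TIME - SUGGESTION?
Thanks in advance for any suggestions!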
<table><tbody>
<tr><th class="date">Wed, 14 Sep 2016</th><th></th><th></th></tr>
<tr><td> </td><td class="sub">09:00</td><td class="sugg">Depart and set your watch to the arrival city's time zone (03:00). Sleep as needed. The following times are in the arrival city's time zone.</td></tr>
<tr><td> </td><td class="sub">18:30</td><td class="sugg">Arrive</td></tr>
<tr><td> </td><td class="sub">19:00–22:00</td><td class="sugg">Seek light</td></tr>
<tr><td> </td><td class="sub">22:00–23:00</td><td class="sugg">Avoid light before bed</td></tr>
<tr><td> </td><td class="sub">23:00–07:00</td><td class="sugg">Sleep ideal</td></tr>
<tr><th class="date">Thu, 15 Sep 2016</th><th></th><th></th></tr>
<tr><td> </td><td class="sub">20:00–23:00</td><td class="sugg">Seek light before bed</td></tr>
<tr><td> </td><td class="sub">23:00–07:00</td><td class="sugg">Sleep ideal</td></tr>
<tr><th class="date">Fri, 16 Sep 2016</th><th></th><th></th></tr>
<tr><td> </td><td class="sub">20:00–23:00</td><td class="sugg">Seek light before bed</td></tr>
<tr><td> </td><td class="sub">23:00–07:00</td><td class="sugg">Sleep ideal</td></tr>
</tbody></table>
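I think a loop is the way to go, and I am getting closer to a solution. I need a way to detect whether the row I am currently checking in the loop has th or td cells: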
//find the table, it is the second table in the HTML
Element table = document.select("tbody").get(1);
//get all the rows
Elements rows = table.select("tr");
//loop the rows
for (Element row : rows) {
//if the row contains th, I get the first cell and save day in a string
//if the row contains td, I get the second (time) and third (suggestion) cells and put in my map string with day, time, suggestion
}
Answer 0 (score: 1)
You have two options: you can use CSS selectors to pull out all the elements by class, or you can iterate over the elements.
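// url is a placeholder for the page address
Document doc = Jsoup.connect(url).get();
// first() takes the first tbody on the page; use get(index) for a later table
Element tbody = doc.select("tbody").first();
for (Element row : tbody.children()) {
    //do stuff here
}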
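The first option, selecting by class, is not shown above. A minimal sketch, assuming the cells keep the date, sub and sugg classes from the question's HTML; the ScheduleByClass class and its parse method are hypothetical names, not part of jsoup:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class; the names are illustrative only.
class ScheduleByClass {

    // Returns one map per data row, keyed "day", "time" and "sugg".
    static List<Map<String, String>> parse(Document doc) {
        List<Map<String, String>> result = new ArrayList<>();
        String day = null;
        // Rows are visited in document order, across every table on the page.
        for (Element row : doc.select("tr")) {
            Element date = row.select("th.date").first();
            if (date != null) {
                // Header row: remember the day for the data rows that follow.
                day = date.text();
                continue;
            }
            Element time = row.select("td.sub").first();
            Element sugg = row.select("td.sugg").first();
            // Rows without both classes (for example rows of other tables) are skipped.
            if (time == null || sugg == null) {
                continue;
            }
            Map<String, String> map = new HashMap<>();
            map.put("day", day);
            map.put("time", time.text());
            map.put("sugg", sugg.text());
            result.add(map);
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<table><tbody>"
                + "<tr><th class=\"date\">Wed, 14 Sep 2016</th><th></th><th></th></tr>"
                + "<tr><td> </td><td class=\"sub\">18:30</td><td class=\"sugg\">Arrive</td></tr>"
                + "</tbody></table>";
        for (Map<String, String> entry : parse(Jsoup.parse(html))) {
            System.out.println(entry.get("day") + " - " + entry.get("time") + " - " + entry.get("sugg"));
        }
    }
}
```

Because the rows are matched by class rather than by table position, this does not depend on the table being the second one on the page. It does assume that no other table reuses the same class names.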
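Answer 1 (score: 0)
Well, I came up with a solution. It may not be the best coding style, but it works :) (engineer: "if it works, it's fine")
I have some coding experience in a few languages, but this is my first time dealing with parsing, and therefore with JSoup. It is not an immediately intuitive tool, but during my research I noticed that it is very powerful. I have put it on my personal to-learn list.
Note: this approach assumes that a th row always comes before the td rows.
Here is my solution: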
// Requires org.jsoup.nodes.Element, org.jsoup.select.Elements and android.util.Log
String day = null;
// crop the page down to the table I need; it has no specific id, so I select it as the second table on the page
Element table = document.select("tbody").get(1);
// all the rows in the table
Elements rows = table.select("tr");
// here I cycle through the rows
for (Element row : rows) {
    // if the row contains th elements, store the first th of the row as the day
    Elements headers = row.select("th");
    if (!headers.isEmpty()) {
        day = headers.get(0).text();
    }
    // if the row contains td elements, store the second and third td and put everything in a map
    Elements cells = row.select("td");
    if (cells.size() >= 3) {
        String time = cells.get(1).text();
        String sugg = cells.get(2).text();
        Log.d("row: ", day + " " + time + " " + sugg);
        HashMap<String, String> map = new HashMap<String, String>();
        map.put("day", day);
        map.put("time", time);
        map.put("sugg", sugg);
        // add the map only for data rows, so header rows do not leave empty maps in the list
        arraylist.add(map);
    }
}
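Once the list is filled, the DAY - TIME - SUGGESTION format from the question can be built with plain Java. A small sketch, assuming every map uses the day, time and sugg keys from the code above; ScheduleFormatter and format are hypothetical names:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

// Hypothetical helper class; the names are illustrative only.
class ScheduleFormatter {

    // Turns each map (keys "day", "time", "sugg") into one "DAY - TIME - SUGGESTION" line.
    static List<String> format(List<HashMap<String, String>> entries) {
        List<String> lines = new ArrayList<>();
        for (HashMap<String, String> entry : entries) {
            // Skip maps that carry no time entry, such as empty maps.
            if (entry.get("time") == null) {
                continue;
            }
            lines.add(entry.get("day") + " - " + entry.get("time") + " - " + entry.get("sugg"));
        }
        return lines;
    }

    public static void main(String[] args) {
        List<HashMap<String, String>> entries = new ArrayList<>();
        HashMap<String, String> map = new HashMap<>();
        map.put("day", "Wed, 14 Sep 2016");
        map.put("time", "18:30");
        map.put("sugg", "Arrive");
        entries.add(map);
        for (String line : format(entries)) {
            System.out.println(line);
        }
    }
}
```

Keeping the formatting separate from the parsing means the same list can also feed an adapter or any other view in the app.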