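Using jsoup, I'm trying to get text with a selector path, but a "table" element in the middle of the path behaves as if it had 0 elements inside

Time: 2018-05-21 19:01:42

Tags: java jsoup

This is the website I'm working with: http://rozklady.mpk.krakow.pl/?lang=PL&rozklad=20180520&linia=1

The text I want to extract: screen

This is the code I wrote: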

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class fetch_bus_stops {
    public static void main(String[] args) {
        try {
            Document doc = Jsoup.connect("http://rozklady.mpk.krakow.pl/?lang=PL&rozklad=20180520&linia=1").userAgent("Mozilla/17.0").get();
            int i = 0;
            int k = 0;
            Elements select = doc.select("body > table > tbody > tr > td > table > tbody > tr > td:nth-child(1) > table > tbody > tr:nth-child(1) > td > table > tbody > tr > td:nth-child(0) > table > tbody > tr:nth-child(1) > td > table > tbody");
            int size = select.size();
            System.out.println("Elements size: " + size);
            for (Element row : select) {
                String string = String.format("tr:nth-child(" + Integer.toString(k) + ") > td:nth-child(0) > a > span");
                i++;
                k++;
                System.out.println(i + " " + row.select(string).text());
            }
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}

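The problem is that the loop never runs, because the size of "select" is 0. I checked the size of the selection element by element, and at the fourth "table" element the size suddenly drops to 0. Why does this happen, and how can I fix it?

1 answer:

Answer 0: (score: 0)

Try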

.main > tbody:nth-child(2) > tr:nth-child(1) > td:nth-child(1) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(1) > td:nth-child(2) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(2) > td:nth-child(1) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(1) > td:nth-child(1) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(2) tr

Output:

0
<tr>
Wzgórza Krzesławickie
1
<tr>
Jarzębiny
2
<tr>
Darwina
3
<tr>
Wiadukty
4
<tr>
Wańkowicza
5
<tr>
Cienista
6
<tr>
Teatr Ludowy
7
<tr>
Rondo Kocmyrzowskie im. Ks. Gorzelanego
8
<tr>
Bieńczycka
9
<tr>
Rondo Czyżyńskie
10
<tr>
Centralna
11
<tr>
Rondo 308. Dywizjonu
12
<tr>
M1 Al. Pokoju
13
<tr>
TAURON Arena Kraków Al. Pokoju NZ
14
<tr>
Plaza
15
<tr>
Dąbie
16
<tr>
Ofiar Dąbia
17
<tr>
Fabryczna
18
<tr>
Francesco Nullo
19
<tr>
Teatr Variété
20
<tr>
Rondo Grzegórzeckie
21
<tr>
Hala Targowa
22
<tr>
Starowiślna
23
<tr>
Poczta Główna
24
<tr>
Plac Wszystkich Świętych
25
<tr>
Filharmonia
26
<tr>
Jubilat
27
<tr>
Komorowskiego
28
<tr>
Salwator
29
<tr>
Salwator
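
For reference, a selector like the one above is typically applied with `Document.select` and a loop over the matches. The following is only a minimal sketch, not the code that produced the output above: the class name `SelectRows`, the helper `rowTexts`, the shortened selector `.main table tr`, and the inline sample markup are all assumptions made for illustration, so that the example runs without fetching the page.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SelectRows {
    // Returns the text of every element matched by the selector, skipping empty rows.
    static List<String> rowTexts(Document doc, String selector) {
        List<String> texts = new ArrayList<>();
        for (Element row : doc.select(selector)) {
            String text = row.text();
            if (!text.isEmpty()) {
                texts.add(text);
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in markup; the real page nests many more tables.
        String html = "<table class=\"main\"><tr><td><table>"
                + "<tr><td>Darwina</td></tr>"
                + "<tr><td>Wiadukty</td></tr>"
                + "</table></td></tr></table>";
        Document doc = Jsoup.parse(html);

        // Print a zero-based index next to each stop name.
        List<String> names = rowTexts(doc, ".main table tr");
        for (int i = 0; i < names.size(); i++) {
            System.out.println(i + " " + names.get(i));
        }
    }
}
```

Against the live page you would pass the document returned by `Jsoup.connect(...).get()` and the full selector from the answer instead of the sample values.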

Parsing this page is going to be a challenge. Still, IMHO it is better to find some landmark and then navigate to its neighbors.

For example

td:containsOwn(przystanki)

will give you the "header" row you need.

From there you can navigate up 3 parents and then to the second row.
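
The landmark approach could look roughly like the sketch below. It rests on several assumptions: that the three parents of the header cell are its `tr`, the `tbody` that jsoup inserts, and the enclosing `table`; that the "second row" is the second `tr` of that table; and that the stop names sit in a table nested inside that row. The class name `LandmarkStops`, the helpers `ancestor` and `stopsNearLandmark`, and the sample markup are hypothetical and only stand in for the real page.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LandmarkStops {
    // Climbs the given number of parent levels; returns null if the tree ends first.
    static Element ancestor(Element start, int levels) {
        Element current = start;
        for (int i = 0; i < levels && current != null; i++) {
            current = current.parent();
        }
        return current;
    }

    static List<String> stopsNearLandmark(Document doc) {
        List<String> stops = new ArrayList<>();

        // Landmark: the cell whose own text contains "przystanki" (matched case-insensitively).
        Element header = doc.selectFirst("td:containsOwn(przystanki)");
        // Assumed path of the three parents: td -> tr -> tbody -> table.
        Element table = header == null ? null : ancestor(header, 3);
        if (table == null) {
            return stops;
        }

        // Second row of that table (assumed to hold the nested table of stops).
        Element secondRow = table.selectFirst("> tbody > tr:nth-child(2)");
        if (secondRow == null) {
            return stops;
        }

        // "table tr" matches only rows of tables nested inside secondRow, not secondRow itself.
        for (Element row : secondRow.select("table tr")) {
            String text = row.text();
            if (!text.isEmpty()) {
                stops.add(text);
            }
        }
        return stops;
    }

    public static void main(String[] args) {
        // Hypothetical markup imitating the assumed layout of the real page.
        String html = "<table>"
                + "<tr><td>Przystanki</td></tr>"
                + "<tr><td><table>"
                + "<tr><td><a><span>Darwina</span></a></td></tr>"
                + "<tr><td><a><span>Wiadukty</span></a></td></tr>"
                + "</table></td></tr>"
                + "</table>";
        for (String stop : stopsNearLandmark(Jsoup.parse(html))) {
            System.out.println(stop);
        }
    }
}
```

Anchoring on a piece of visible text tends to survive layout changes better than a long chain of `nth-child` steps, though the number of parent levels still has to be checked against the actual markup.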