This is the website I'm working with: http://rozklady.mpk.krakow.pl/?lang=PL&rozklad=20180520&linia=1
The text I want to extract (the list of stop names): see screenshot
Here is the code I wrote:
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class fetch_bus_stops {
    public static void main(String[] args) {
        try {
            Document doc = Jsoup.connect("http://rozklady.mpk.krakow.pl/?lang=PL&rozklad=20180520&linia=1").userAgent("Mozilla/17.0").get();
            int i = 0;
            int k = 0;
            Elements select = doc.select("body > table > tbody > tr > td > table > tbody > tr > td:nth-child(1) > table > tbody > tr:nth-child(1) > td > table > tbody > tr > td:nth-child(0) > table > tbody > tr:nth-child(1) > td > table > tbody");
            int size = select.size();
            System.out.println("Elements size: " + size);
            for (Element row : select) {
                String string = String.format("tr:nth-child(" + Integer.toString(k) + ") > td:nth-child(0) > a > span");
                i++;
                k++;
                System.out.println(i + " " + row.select(string).text());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
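The problem is that the loop never runs: the size of "select" is 0. I checked the selector element by element, and at the fourth "table" element the size of "select" suddenly drops to 0. Why does this happen, and how can I fix it?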
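A likely explanation for the empty result: CSS :nth-child() counts positions from 1, so the td:nth-child(0) step in the question's selector can never match, and every step after it comes back empty. The loop's "tr:nth-child(" + k + ")" also starts at k = 0, and its inner td:nth-child(0) has the same problem. The minimal check below runs on a small made-up HTML string (not the MPK page) and assumes jsoup is on the classpath; NthChildCheck and count are illustrative names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class NthChildCheck {

    // Parses an HTML string and counts the elements matched by a CSS query.
    public static int count(String html, String query) {
        Document doc = Jsoup.parse(html);
        return doc.select(query).size();
    }

    public static void main(String[] args) {
        // A made-up one-row table standing in for the timetable markup.
        String html = "<table><tr><td>first</td><td>second</td></tr></table>";

        // :nth-child() positions start at 1, so index 0 never matches.
        System.out.println(count(html, "td:nth-child(0)")); // prints 0
        System.out.println(count(html, "td:nth-child(1)")); // prints 1

        // jsoup's parser adds the implicit <tbody>, so paths that spell it out still match.
        System.out.println(count(html, "table > tbody > tr > td")); // prints 2
    }
}
```

Since jsoup inserts tbody the same way a browser does, the "tbody" steps copied from the browser's inspector are not the problem here; the zero index is.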
Answer 0 (score: 0)
Try this selector:
.main > tbody:nth-child(2) > tr:nth-child(1) > td:nth-child(1) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(1) > td:nth-child(2) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(2) > td:nth-child(1) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(1) > td:nth-child(1) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(2) tr
Output:
0 Wzgórza Krzesławickie
1 Jarzębiny
2 Darwina
3 Wiadukty
4 Wańkowicza
5 Cienista
6 Teatr Ludowy
7 Rondo Kocmyrzowskie im. Ks. Gorzelanego
8 Bieńczycka
9 Rondo Czyżyńskie
10 Centralna
11 Rondo 308. Dywizjonu
12 M1 Al. Pokoju
13 TAURON Arena Kraków Al. Pokoju NZ
14 Plaza
15 Dąbie
16 Ofiar Dąbia
17 Fabryczna
18 Francesco Nullo
19 Teatr Variété
20 Rondo Grzegórzeckie
21 Hala Targowa
22 Starowiślna
23 Poczta Główna
24 Plac Wszystkich Świętych
25 Filharmonia
26 Jubilat
27 Komorowskiego
28 Salwator
29 Salwator
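The answer does not show the Java around its selector. A minimal sketch of plugging it into the question's jsoup code follows, assuming the site still serves the 2018 markup the selector was copied from (the rozklad=20180520 timetable may no longer exist). PrintStops, STOP_ROWS and stopNames are illustrative names, and the "index name" lines only approximate the listing above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class PrintStops {

    // The selector from the answer, unchanged, split across lines for readability.
    public static final String STOP_ROWS =
            ".main > tbody:nth-child(2) > tr:nth-child(1) > td:nth-child(1)"
            + " > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(1) > td:nth-child(2)"
            + " > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(2) > td:nth-child(1)"
            + " > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(1) > td:nth-child(1)"
            + " > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(2) tr";

    // Returns the text of every element matched by the query, in document order.
    public static List<String> stopNames(Document doc, String query) {
        List<String> names = new ArrayList<>();
        for (Element row : doc.select(query)) {
            names.add(row.text());
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        Document doc = Jsoup.connect("http://rozklady.mpk.krakow.pl/?lang=PL&rozklad=20180520&linia=1")
                .userAgent("Mozilla/17.0")
                .get();
        List<String> names = stopNames(doc, STOP_ROWS);
        for (int i = 0; i < names.size(); i++) {
            System.out.println(i + " " + names.get(i));
        }
    }
}
```

Because the selector is a long chain of positional steps, any change to the page layout will make it silently return an empty list.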
Parsing this page is going to be a challenge. IMHO, though, it is better to find some landmarks and then navigate to their neighbours.
For example,
td:containsOwn(przystanki)
will give you the "header" row you need.
From there you can go up three parent levels and then take the second row.
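A rough sketch of the landmark idea in jsoup. The "three parents" and "second row" come from the answer; how they line up with the live page's markup has not been checked, so the number of levels is a parameter. LandmarkStops and stopsNearHeader are illustrative names, and the code assumes the ancestor reached is a table (or its tbody) whose second row contains the stop rows.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LandmarkStops {

    // Finds the cell whose own text contains "przystanki" (jsoup's :containsOwn is
    // case-insensitive), climbs `levels` ancestors, and returns the text of every
    // <tr> nested inside the second row of that ancestor.
    public static List<String> stopsNearHeader(Document doc, int levels) {
        List<String> stops = new ArrayList<>();
        Element node = doc.select("td:containsOwn(przystanki)").first();
        for (int i = 0; i < levels && node != null; i++) {
            node = node.parent();
        }
        if (node == null) {
            return stops; // landmark missing, or the tree is shallower than assumed
        }

        // jsoup's parser wraps bare <tr> elements in an implicit <tbody>.
        Element rowParent = node;
        if (node.tagName().equals("table")) {
            for (Element child : node.children()) {
                if (child.tagName().equals("tbody")) {
                    rowParent = child;
                    break;
                }
            }
        }

        // Collect the direct <tr> children and take the second one.
        List<Element> rows = new ArrayList<>();
        for (Element child : rowParent.children()) {
            if (child.tagName().equals("tr")) {
                rows.add(child);
            }
        }
        if (rows.size() < 2) {
            return stops;
        }
        Element secondRow = rows.get(1);

        // select() also matches the element it is called on, so skip the row itself.
        for (Element tr : secondRow.select("tr")) {
            if (tr != secondRow) {
                stops.add(tr.text());
            }
        }
        return stops;
    }

    public static void main(String[] args) throws IOException {
        Document doc = Jsoup.connect("http://rozklady.mpk.krakow.pl/?lang=PL&rozklad=20180520&linia=1")
                .userAgent("Mozilla/17.0")
                .get();
        List<String> stops = stopsNearHeader(doc, 3);
        for (int i = 0; i < stops.size(); i++) {
            System.out.println(i + " " + stops.get(i));
        }
    }
}
```

Anchoring on the header text instead of a full positional path means layout changes elsewhere on the page are less likely to break the lookup; if the real nesting differs, only the levels argument should need adjusting.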