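I've read a lot of posts about parsing and the like, and most of the replies I've seen suggest using a library of some kind. My problem now is writing an algorithm that gets exactly the information I want. My goal is to get two school-closing statuses from a weather website. I started using Jsoup, as people recommended, but I need help.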
Web page: click here
Image: click here
Sample of the page source: click here
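What I'd like to know is how to get a specific line of text from the page. I already know the name of the school I'm looking for, and two lines below it is the status, which is what I need. If each school had a distinctive status I could just search for that, but they are all either closed or on a two-hour delay, so I can't. I'd like some ideas or answers on how to go about this. I plan to do this twice, since there are two schools I want to look up. I already have the names I can use to find them; I just need the statuses.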
Here is an example of what I want to do (pseudocode):
Document doc = connect(to url);
Element schoolName1 = doc.lookForText(htmlLineHere/schoolname);
String status1 = schoolName1.getNext().text(); // suppose this gets the line right after the name, which should be my status, and strips off the HTML
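One way the pseudocode above could map onto real Jsoup calls, as a sketch: the `:containsOwn(text)` selector finds the cell whose own text contains the school name, `nextElementSibling()` steps to the following cell, and `text()` returns that cell's text with the tags stripped. It assumes the status sits in the cell immediately after the name. The class name `StatusLookup`, the method name and the sample markup (including the "Closed" status) are hypothetical, and the live page would be loaded with `Jsoup.connect(...).get()` instead of `Jsoup.parse`.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class StatusLookup {

    // Returns the text of the cell right after the cell naming the school,
    // or null if there is no such cell. Assumes name and status are adjacent <td>s.
    static String statusAfterName(Document doc, String schoolName) {
        // :containsOwn is a case-insensitive substring match on the element's own text
        Element nameCell = doc.select("td:containsOwn(" + schoolName + ")").first();
        if (nameCell == null) {
            return null;
        }
        Element statusCell = nameCell.nextElementSibling();
        // text() returns the cell's text with the HTML tags stripped
        return statusCell == null ? null : statusCell.text();
    }

    public static void main(String[] args) {
        // Hypothetical sample markup; for the real page use
        // Jsoup.connect("http://www.10tv.com/content/sections/weather/closings.html").get()
        String html = "<table id='closings'>"
                + "<tr><td>Athens</td><td>Beacon School</td><td>Two-hour Delay</td></tr>"
                + "<tr><td>Athens</td><td>Athens City Schools</td><td>Closed</td></tr>"
                + "</table>";
        Document doc = Jsoup.parse(html);

        // The same lookup, done once per school
        for (String name : new String[] {"Beacon School", "Athens City Schools"}) {
            System.out.println(name + ": " + statusAfterName(doc, name));
        }
    }
}
```

Since `:containsOwn` is a substring match and only the first match is used, a short name such as "Athens" would hit the county cell before any school whose name contains it.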
Here is what I have so far:
public static SchoolClosing lookupDebug() throws IOException {
    final ArrayList<String> Status = new ArrayList<String>();
    try {
        //connects to my wanted website
        Document doc = Jsoup.connect("http://www.10tv.com/content/sections/weather/closings.html").get();
        //selects/fetches the line of code I want
        Element schoolName = doc.html("<td valign="+"top"+">Athens City Schools</td>");
        //an array of Strings where I am going to add the text I need when I get it
        final ArrayList<String> temp = new ArrayList<String>();
        //checking if it's fetching the text
        System.out.println(schoolName.text());
        //add the text to the array
        temp.add(schoolName.text());
        for (int i = 0; i <= 1; i++) {
            final String[] tempStatus = temp.get(i).split(" ");
            Status.add(tempStatus[0]);
        }
    } catch (final IOException e) {
        throw new IOException("There was a problem loading School Closing Status");
    }
    return new SchoolClosing(Status);
}
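Two things in this attempt work against it. In Jsoup, `Element.html(String)` is a setter: it empties the element, parses the string you pass as its new content and returns the same element, so `doc.html(...)` overwrites the downloaded page instead of searching it; searching is done with `select(...)`. Separately, the loop runs for `i = 0` and `i = 1`, but `temp` only ever holds one entry, so `temp.get(1)` throws `IndexOutOfBoundsException`. The small sketch below contrasts `select` with `html(String)`; the class name `HtmlSetterDemo`, its method names and the sample markup are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class HtmlSetterDemo {

    // Hypothetical sample markup standing in for the closings page
    static final String SAMPLE =
            "<table id='closings'><tr><td>Athens</td><td>Beacon School</td>"
            + "<td>Two-hour Delay</td></tr></table>";

    // select() only reads: it returns matching elements and leaves the document unchanged
    static String findWithSelect(Document doc) {
        Element cell = doc.select("td:containsOwn(Beacon School)").first();
        return cell == null ? null : cell.text();
    }

    // html(String) writes: it replaces the document's content and returns the same object
    static Element overwriteWithHtml(Document doc) {
        return doc.html("<p>replaced</p>");
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE);
        System.out.println("select found: " + findWithSelect(doc));
        System.out.println("table present before html(String): " + !doc.select("#closings").isEmpty());

        Element returned = overwriteWithHtml(doc);
        System.out.println("same object returned: " + (returned == doc));
        System.out.println("table present after html(String): " + !doc.select("#closings").isEmpty());
    }
}
```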
Answer (score: 2)
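import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class Closings {
    public static void main(String[] args) throws java.io.IOException {
        Document doc = Jsoup.connect(
                "http://www.10tv.com/content/sections/weather/closings.html")
                .get();
        for (Element tr : doc.select("#closings tr")) {
            // skip rows that have no <td> cells (e.g. a header row made of <th>)
            Element firstTd = tr.select("td").first();
            if (firstTd != null) {
                String county = tr.select("td:eq(0)").text();
                String schoolName = tr.select("td:eq(1)").text();
                String status = tr.select("td:eq(2)").text();
                System.out.println(String.format(
                        "county: %s, schoolName: %s, status: %s", county,
                        schoolName, status));
            }
        }
    }
}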
Output:
county: Athens, schoolName: Beacon School, status: Two-hour Delay
county: Franklin, schoolName: City of Grandview Heights, status: Snow Emergency through 8pm Thursday
county: Franklin, schoolName: Electrical Trades Center, status: All Evening Activities Cancelled
county: Franklin, schoolName: Hilock Fellowship Church, status: PM Services Cancelled
county: Franklin, schoolName: International Christian Center, status: All Evening Activities Cancelled
county: Franklin, schoolName: Maranatha Baptist Church, status: PM Services Cancelled
county: Franklin, schoolName: Masters Commission New Covenant Church, status: Bible Study Cancelled
county: Franklin, schoolName: New Life Christian Fellowship, status: All Activities Cancelled
county: Franklin, schoolName: The Epilepsy Foundation of Central Ohio, status: All Evening Activities Cancelled
county: Franklin, schoolName: Washington Ave United Methodist Church, status: All Evening Activities Cancelled
Or, looping over every cell in each row:
for (Element tr : doc.select("#closings tr")) {
    System.out.println("----------------------");
    for (Element td : tr.select("td")) {
        System.out.println(td.text());
    }
}
Which prints:
----------------------
Athens
Beacon School
Two-hour Delay
----------------------
Franklin
City of Grandview Heights
Snow Emergency through 8pm Thursday
----------------------
Franklin
Electrical Trades Center
All Evening Activities Cancelled
...
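For the original two-school goal, the first loop in this answer could instead read the table once into a map from school name to status and then look each name up. This is a sketch that has not been checked against the live page: it assumes the same `#closings` layout with county, name and status in the first three cells, and the class name `ClosingsTable`, the method name, the "not listed" fallback and the sample markup are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ClosingsTable {

    // Reads every row of the #closings table once; key = school name, value = status.
    // Rows with fewer than three <td> cells (e.g. header rows made of <th>) are skipped.
    static Map<String, String> statusesByName(Document doc) {
        Map<String, String> statuses = new LinkedHashMap<>();
        for (Element tr : doc.select("#closings tr")) {
            Elements tds = tr.select("td");
            if (tds.size() >= 3) {
                statuses.put(tds.get(1).text(), tds.get(2).text());
            }
        }
        return statuses;
    }

    public static void main(String[] args) {
        // Hypothetical sample markup; for the real page use Jsoup.connect(...).get()
        String html = "<table id='closings'>"
                + "<tr><th>County</th><th>Name</th><th>Status</th></tr>"
                + "<tr><td>Athens</td><td>Beacon School</td><td>Two-hour Delay</td></tr>"
                + "<tr><td>Franklin</td><td>Electrical Trades Center</td>"
                + "<td>All Evening Activities Cancelled</td></tr>"
                + "</table>";
        Map<String, String> statuses = statusesByName(Jsoup.parse(html));

        // Exact-name lookups, one per school of interest
        for (String school : new String[] {"Beacon School", "Athens City Schools"}) {
            System.out.println(school + ": " + statuses.getOrDefault(school, "not listed"));
        }
    }
}
```

A map lookup matches names exactly rather than by substring, so the names have to be spelled as they appear on the page; a school that isn't on the page (here "Athens City Schools") falls back to "not listed".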