我使用JSoup解析来自此网站的数据:
http://www.skore.com/en/soccer/england/premier-league/results/all/
我得到球队和结果的名字,我还需要获得得分手的名字(这是结果)。
我正在尝试但是遇到麻烦,因为它不在HTML中。
有可能吗?如果是的话怎么样?
答案 0 :(得分:3)
在AJAX请求之后(当您点击分数链接时发生)获取记分员信息。您必须提出此类请求并解析结果。
对于instnace,参加第一场比赛(曼联1x2曼城),其标签是:
<a data-y="r1-1229442" data-v="england-premierleague-manchesterunited-manchestercity-13april2013" style="cursor: pointer;">1 - 2</a>
选择data-y
,删除前导r
并向以下网址发送获取请求:
http://www.skore.com/en/scores/soccer/id/<DATA-Y_HERE>?fmt=html
例如:http://www.skore.com/en/scores/soccer/id/1-1229442?fmt=html。然后解析结果。
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class ParseScore {
public static void main(String[] args) throws Exception {
Document doc = Jsoup.connect("http://www.skore.com/en/soccer/england/premier-league/results/all/").get();
System.out.println("title: " + doc.title());
Elements dls = doc.select("dl");
for (Element link : dls) {
String id = link.attr("id");
/* check if then it is a game <dl> */
if (id != null && id.length() > 3 && "rid".equals(id.substring(0, 3))) {
System.out.println("Game: " + link.text());
String idNoRID = id.replace("rid", "");
// String idNoRID = "1-1229442";
String scoreURL = "http://www.skore.com/en/scores/soccer/id/" + idNoRID + "?fmt=html";
Document docScore = Jsoup.connect(scoreURL).get();
Elements trs = docScore.select("tr");
for (Element tr : trs) {
Elements spanGoal = tr.select("span.goal");
/* only enter if there is a goal */
if (spanGoal.size() > 0) {
Elements score = tr.select("td.score");
String playerName = spanGoal.get(0).text();
String currentScore = score.get(0).text();
System.out.println("\t\tGOAL: " + currentScore + ": " + playerName);
}
Elements spanGoalPenalty = tr.select("span.goalpenalty");
/* only enter if there is a goal */
if (spanGoalPenalty.size() > 0) {
Elements score = tr.select("td.score");
String playerName = spanGoalPenalty.get(0).text();
String currentScore = score.get(0).text();
System.out.println("\t\tGOAL: " + currentScore + ": " + playerName + " (penalty)");
}
Elements spanGoalOwn = tr.select("span.goalown");
/* only enter if there is a goal */
if (spanGoalOwn.size() > 0) {
Elements score = tr.select("td.score");
String playerName = spanGoalOwn.get(0).text();
String currentScore = score.get(0).text();
System.out.println("\t\tGOAL: " + currentScore + ": " + playerName + " (own goal)");
}
}
}
}
}
}
<强>输出:强>
title: Skore : Premier League, England - Soccer Results (All)
Game: F T Arsenal 3 - 1 Norwich
GOAL: 0 - 1: Michael Turner
GOAL: 1 - 1: Mikel Arteta (penalty)
GOAL: 2 - 1: Sébastien Bassong (own goal)
GOAL: 3 - 1: Lukas Podolski
Game: F T Aston Villa 1 - 1 Fulham
GOAL: 1 - 0: Charles N´Zogbia
GOAL: 1 - 1: Fabian Delph (own goal)
Game: F T Everton 2 - 0 Queens Park Rangers
GOAL: 1 - 0: Darron Gibson
GOAL: 2 - 0: Victor Anichebe
Game: F T Reading 0 - 0 Liverpool
Game: F T Southampton 1 - 1 West Ham
GOAL: 1 - 0: Gaston Ramirez
GOAL: 1 - 1: Andrew Carroll
Game: F T Manchester United 1 - 2 Manchester City
GOAL: 0 - 1: James Milner
...
使用了JSoup 1.7.1。如果使用maven,请将其添加到pom.xml
:
<dependency>
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.7.1</version>
</dependency>