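I need to get the ID and href of every element (the ones marked by the colored boxes in the image). I don't know how to express the path to them and extract the information I need. How can I do this?
Answer 0 (score: 0)
Select by ID and by tag until you reach the relevant tags, then read the values by attribute. See the snippet below: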
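import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
// Parse the saved page from disk (throws IOException); Jsoup.parse(String) expects markup, not a file name.
Document doc = Jsoup.parse(new File("html_file"), "UTF-8");
Element container = doc.getElementById("search_result_container");
Elements divs = container.getElementsByTag("div");
// Pick the div that holds the links; index 1 is what the answer uses, adjust it to the page structure.
Element secondDiv = divs.get(1);
Elements hyperLinks = secondDiv.getElementsByTag("a");
for (Element alink : hyperLinks) {
    String href = alink.attr("href");
    String id = alink.attr("id");
    System.out.println(id + " -> " + href);
}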
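The same walk from container to links can also be written as a single CSS selector. The following is only a minimal sketch: the class name `SelectorSketch`, the helper `extractLinks`, and the inline sample HTML are made up for illustration, and the selector assumes the links sit somewhere below the element with the ID `search_result_container`.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class SelectorSketch {
    // Hypothetical helper: returns one "id -> href" string per link
    // found below the search result container.
    static List<String> extractLinks(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        // "#id a[href]" matches every <a> with an href attribute
        // that is a descendant of the element with that ID.
        for (Element link : doc.select("#search_result_container a[href]")) {
            result.add(link.id() + " -> " + link.attr("href"));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample markup, not the real page.
        String html = "<div id=\"search_result_container\">"
                + "<div><a id=\"a1\" href=\"/app/1\">One</a></div>"
                + "<div><a id=\"a2\" href=\"/app/2\">Two</a></div>"
                + "</div><a id=\"outside\" href=\"/x\">Outside</a>";
        extractLinks(html).forEach(System.out::println);
    }
}
```

A selector avoids the hard-coded `get(1)` index, so it keeps working if the nesting of the inner divs changes, as long as the container ID stays the same.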
Answer 1 (score: 0)
OK, I got it. It works! Thanks, SUNNYben, you gave me the right input!
Here is my solution code:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class Steam_GameID_Links
{
    public static void main(String[] args)
    {
        Steam_GameID_Links wc = new Steam_GameID_Links();
        try
        {
            String url = "http://store.steampowered.com/search/?sort_by=_ASC&category1=998&page=1";
            Document document = Jsoup.connect(url).get();
            // read the pagination block to find out how many result pages there are
            Elements howMuchPages = document.select(".search_pagination_right");
            String[] stuff = howMuchPages.text().split(" ");
            String tmp = stuff[4].replace(" ", "").replace(".", "");
            // keep only the digits of the last page number
            StringBuilder sb = new StringBuilder();
            for(int i = 0; i < tmp.length(); i++)
            {
                if(Character.isDigit(tmp.charAt(i)))
                {
                    sb.append(tmp.charAt(i));
                }
            }
            String last = sb.toString().trim();
            int lastPages = Integer.parseInt(last);
            int counter = 0;
            for(int i = 1; i < lastPages + 1; i++)
            {
                url = "http://store.steampowered.com/search/?sort_by=_ASC&category1=998&page=" + i;
                document = Jsoup.connect(url).get();
                // first select the parent node: <div id="search_result_container">
                Element parentNode = document.getElementById("search_result_container");
                // every result link carries a data-ds-appid attribute
                Elements childNodes = parentNode.getElementsByAttribute("data-ds-appid");
                for(Element alink : childNodes)
                {
                    String href = alink.attr("href");
                    String id = alink.attr("data-ds-appid");
                    String name = alink.getElementsByClass("title").text();
                    System.out.println("Spiel: " + name + ", ID: " + id + ", SpieleLink: " + href);
                    // wc.writeSpielNameIDLink("Spiel: " + name + ", ID: " + id + ", SpieleLink: " + href + "\n");
                }
            }
        }
        catch(IOException e)
        {
            e.printStackTrace();
        }
    }