So I have code like this to get a value from a tag on a site:
try {
URL url = new URL("google.com");
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
String inputLine;
while (in.readLine() != null) {
inputLine = in.readLine();
}
in.close();
} catch (IOException e) {
e.printStackTrace();
}
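So, say I need it to find "Pizza", but only some of the code shows up, so I can't get to that part. Is there a way to print out the WHOLE HTML (using BufferedReader and without extra imports like Jsoup) and then search through it?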
Answer 0 (score: 1)
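// Open a connection; the URL must include the protocol.
URL url = new URL("http://www.google.com");
URLConnection uc = url.openConnection();
InputStreamReader input = new InputStreamReader(uc.getInputStream());
BufferedReader in = new BufferedReader(input);
String inputLine;
// Save the page to a local file named "orhancan".
FileWriter outFile = new FileWriter("orhancan");
PrintWriter out = new PrintWriter(outFile);
// Assign and test in the loop condition so that every line is read exactly once.
while ((inputLine = in.readLine()) != null) {
out.println(inputLine);
}
in.close();
out.close();
// Parse the saved file as XML. This only works if the page is well-formed XML;
// for ordinary HTML, parse() may throw a SAXException.
File fXmlFile = new File("orhancan");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
// Print the number of <body> elements found.
NodeList prelist = doc.getElementsByTagName("body");
System.out.println(prelist.getLength());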
There is an easier way to do this. I suggest using JSoup. With JSoup you can do the following:
Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
Elements newsHeadlines = doc.select("#mp-itn b a");
Or if you want the body:
Elements body = doc.select("body");
Or if you want all the links:
Elements links = doc.select("body a");
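Coming back to the original question (print the whole HTML with BufferedReader and no extra libraries, then look for "Pizza"), here is a minimal sketch of one way it could be done. The class name PageSearch and the helper readAll are made up for illustration, and UTF-8 is assumed as the page encoding, which may not hold for every site. The loop in the question calls readLine() twice per iteration and never prints anything, which would explain why only part of the page appeared; this version reads each line exactly once.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PageSearch {

    // Hypothetical helper: reads every line from the reader into one string.
    public static String readAll(Reader source) throws IOException {
        StringBuilder page = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            // Assign and test in one step so that no line is skipped.
            while ((line = in.readLine()) != null) {
                page.append(line).append('\n');
            }
        }
        return page.toString();
    }

    public static void main(String[] args) throws IOException {
        // The URL needs a protocol; "google.com" alone throws MalformedURLException.
        URL url = new URL("http://www.google.com");
        // Assumption: the page is encoded as UTF-8.
        String html = readAll(new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));
        System.out.println(html); // the whole HTML
        System.out.println("Contains Pizza: " + html.contains("Pizza"));
    }
}
```

A plain contains() check only tells you whether the text appears somewhere in the page source. If the value is filled in by JavaScript after the page loads, it will not be in the downloaded HTML at all.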