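I wrote a script that scrapes Google Scholar and extracts the link to the downloadable file, but the problem is that it only prints the link to the console when the project runs. I want the download link stored in an object so the file can be downloaded at the end; specifically, I want it to end up in the website object. For now I am just copying the link from the console and hard-coding it here: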
URL website = new URL("http://ilpubs.stanford.edu:8090/421/1/1999-65.pdf");
ReadableByteChannel rbc = Channels.newChannel(website.openStream());
FileOutputStream fos = new FileOutputStream("D:\\paper.html");
fos.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);
Here is my full code:
public static void main(String[] args) throws IOException {
try {
Document doc = Jsoup
.connect("https://scholar.google.com.pk/scholar?q=Sergey+Brin.+Extracting+patterns+and+relations+from+the+world+wide+web.+In+WebDB+Workshop+at+EDBT+%E2%80%9998%2C+1998.+Available+online+at+%3Chttp%3A%2F%2Fwwwdb.stanford.edu%2F+sergey%2Fextract.ps%3E.&btnG=&hl=en&as_sdt=0%2C5")
.userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36")
.get();
String title = doc.title();
System.out.println("title : " + title);
Elements links = doc.select("div.gs_ggsd").select("a[href]");
//Element = doc.select("div.gs_ggs gs_fl").first();
for (Element link : links) {
System.out.println("\nlink : " + link.attr("href"));
System.out.println("text : " + link.text());
}
} catch (IOException e) {
e.printStackTrace();
}
}
This is the console output; the only part I need is the link:
title : Sergey Brin. Extracting patterns and relations from the world wide web. In WebDB Workshop at EDBT %E2%80%9998%2C 1998. Available online at %3Chttp%3A%2F%2Fwwwdb.stanford.edu%2F sergey%2Fextract.ps%3E. - Google Scholar
link : http://ilpubs.stanford.edu:8090/421/1/1999-65.pdf
text : [PDF] stanford.edu
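One possible way to keep the links instead of printing them, as a minimal sketch rather than a definitive solution: collect each href in a List<String> and download the whole list once scraping has finished. The class name LinkDownloader, the helpers fileNameFor and downloadAll, and the fallback name download.bin are hypothetical names introduced for illustration; only standard java.net and java.nio calls are used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

public class LinkDownloader {

    // Derive a local file name from the last path segment of the link.
    // Assumption: the link ends in a usable file name such as 1999-65.pdf.
    static String fileNameFor(String href) {
        String path = href.trim();
        int query = path.indexOf('?');
        if (query >= 0) {
            path = path.substring(0, query); // drop any query string
        }
        String name = path.substring(path.lastIndexOf('/') + 1);
        return name.isEmpty() ? "download.bin" : name;
    }

    // Download every stored link into targetDir and return the written paths.
    static List<Path> downloadAll(List<String> hrefs, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        List<Path> saved = new ArrayList<>();
        for (String href : hrefs) {
            // trim() guards against stray whitespace around a pasted link
            URL website = new URL(href.trim());
            Path target = targetDir.resolve(fileNameFor(href));
            try (InputStream in = website.openStream()) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
            saved.add(target);
        }
        return saved;
    }
}
```

In the scraper, the idea would be to create List<String> downloadLinks = new ArrayList<>() before the loop, call downloadLinks.add(link.attr("abs:href")) inside it in place of the println calls, and call LinkDownloader.downloadAll(downloadLinks, Paths.get("D:\\papers")) after the loop. The target folder D:\papers is only an example.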