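I finally got IntelliJ working. I'm using the code below, and it works perfectly. Now I need it to loop repeatedly, pulling links from a spreadsheet so it can look up prices for different items. I have a spreadsheet with some example URLs in column C, starting at row 2. How can I have JSOUP use the URLs in this spreadsheet and then write the output to column D?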
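import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class Scraper {
    public static void main(String[] args) throws Exception {
        // fetch the page and parse it into a Document (jsoup needs a full http/https URL)
        final Document document = Jsoup.connect("https://examplesite.com").get();
        // print the text of every element matching #price
        for (Element row : document.select("#price")) {
            final String price = row.select("#price").text();
            System.out.println(price);
        }
    }
}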
Thanks in advance for your help! Eric
Answer 0 (score: 0)
You can use the JExcel library to read and edit worksheets: https://sourceforge.net/projects/jexcelapi/.
The zip file you download with the library also contains a very useful tutorial.html.
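JExcel addresses cells with zero-based (column, row) indices, so column C is index 2, column D is index 3, and spreadsheet row 2 is index 1. As a minimal sketch of that mapping, the helper below converts column letters to an index. `ColumnIndex` and `fromLetters` are hypothetical names invented for illustration and are not part of JExcel.

```java
import java.util.Locale;

// Hypothetical helper, not part of JExcel: converts spreadsheet column letters
// to the zero-based column index used when addressing cells by (column, row).
final class ColumnIndex {
    private ColumnIndex() {
    }

    static int fromLetters(String letters) {
        if (letters == null || letters.isEmpty()) {
            throw new IllegalArgumentException("column letters must not be empty");
        }
        int index = 0;
        for (char c : letters.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c < 'A' || c > 'Z') {
                throw new IllegalArgumentException("not a column letter: " + c);
            }
            // treat the letters as a base-26 number where A = 1 ... Z = 26
            index = index * 26 + (c - 'A' + 1);
        }
        // shift from one-based to zero-based
        return index - 1;
    }

    public static void main(String[] args) {
        System.out.println("C -> " + fromLetters("C")); // URL column
        System.out.println("D -> " + fromLetters("D")); // price column
    }
}
```

Hard-coded constants are perfectly fine for a single sheet; a helper like this only pays off if the columns are likely to change.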
The explanation is in the comments:
import java.io.File;
import java.io.IOException;

import jxl.Cell;
import jxl.CellType;
import jxl.Workbook;
import jxl.write.Label;
import jxl.write.WritableSheet;
import jxl.write.WritableWorkbook;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class StackoverflowQuestion51577491 {

    private static final int URL_COLUMN = 2; // Column C
    private static final int PRICE_COLUMN = 3; // Column D

    public static void main(final String[] args) throws Exception {
        // open worksheet with URLs
        Workbook originalWorkbook = Workbook.getWorkbook(new File("O:/original.xls"));
        // create editable copy
        WritableWorkbook workbook = Workbook.createWorkbook(new File("O:/updated.xls"), originalWorkbook);
        // close read-only workbook as it's not needed anymore
        originalWorkbook.close();
        // get first available sheet
        WritableSheet sheet = workbook.getSheet(0);
        // skip title row 0
        int currentRow = 1;
        Cell cell;
        // iterate each cell from column C until we find an empty one
        while (!(cell = sheet.getCell(URL_COLUMN, currentRow)).getType().equals(CellType.EMPTY)) {
            // read cell contents
            String url = cell.getContents();
            System.out.println("parsing URL: " + url);
            // parse and get the price
            String price = parseUrlWithJsoupAndGetProductPrice(url);
            System.out.println("found price: " + price);
            // create new cell with price
            Label cellWithPrice = new Label(PRICE_COLUMN, currentRow, price);
            sheet.addCell(cellWithPrice);
            // go to next row
            currentRow++;
        }
        // save and close file
        workbook.write();
        workbook.close();
    }

    private static String parseUrlWithJsoupAndGetProductPrice(String url) throws IOException {
        // download page and parse it to Document
        Document doc = Jsoup.connect(url).get();
        // get the price from html
        return doc.select("#priceblock_ourprice").text();
    }
}
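One thing to keep in mind with the loop above: if a single URL fails to download, the IOException propagates out of main and the updated file is never written. A possible way around that, sketched here with only the standard library, is to catch the failure per URL and record a marker instead. The names `PriceBatch`, `PriceLookup` and `fetchAll`, as well as the "N/A" and "ERROR:" markers, are assumptions made for illustration; in the real program the lookup would be the jsoup method and the results would go into column D.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: look up a price for each URL without letting
// one failing URL abort the whole run.
final class PriceBatch {
    private PriceBatch() {
    }

    /** Stand-in for a method such as parseUrlWithJsoupAndGetProductPrice. */
    interface PriceLookup {
        String fetch(String url) throws IOException;
    }

    /** Returns one result per URL, in the same order as the input list. */
    static List<String> fetchAll(List<String> urls, PriceLookup lookup) {
        List<String> results = new ArrayList<>();
        for (String url : urls) {
            try {
                String price = lookup.fetch(url);
                // an empty result means the selector matched nothing on that page
                results.add(price == null || price.isEmpty() ? "N/A" : price);
            } catch (IOException e) {
                // record the failure and carry on with the next URL
                results.add("ERROR: " + e.getMessage());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // fake lookup so the sketch runs without network access
        PriceLookup fake = url -> {
            if (url.contains("broken")) {
                throw new IOException("timeout");
            }
            return "$9.99";
        };
        List<String> urls = Arrays.asList("https://example.com/a", "https://example.com/broken");
        System.out.println(fetchAll(urls, fake));
    }
}
```

This only covers IOException; a malformed URL in the sheet would still need its own handling.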