JXL: changing columns in a JSoup-based application

Time: 2019-02-21 16:23:58

Tags: java web-scraping jsoup jxl

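Currently, the program runs through a column of URLs and writes the selected data to the adjacent cell. I can set which column it starts in, but that is all I can do; right now it only works on a single column. How do I tell it to move on to, say, column 4 (column E) and work from the top down once it reaches the end of column 0 (A)? And then perhaps another one after that, say column J?

I believe my problem lies in the "while (!(cell = sheet.getCell ..." line, but I am not sure what to change there without breaking the program.

My code is as follows: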

> testif <- function(x) {
+   if (any(is.na(x)))  {
+     paste(na.locf(x), letters, sep = "")
+   }
+ }

for (x in df$b)     {
+     if (any(is.na(x)))  {
+         paste(test$b, na.locf(x), letters, sep = "")
+     }
+ }

1 answer:

Answer 0: (score: 0)

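To keep the code simple, I assume the price always goes into the next column (+1).

To process several columns, I replaced the single value int URL_COLUMN = 0 with an array of the columns to process: int[] URL_COLUMNS = { 0, 4, 9 }; // Columns A, E, J

You can then loop over each URL column {0, 4, 9} and save the data to the column next to it {1, 5, 10}.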


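    // Imports this snippet needs (they go at the top of the file;
    // the rest belongs inside the existing class):
    import java.io.File;
    import jxl.Cell;
    import jxl.CellType;
    import jxl.Workbook;
    import jxl.write.Label;
    import jxl.write.WritableSheet;
    import jxl.write.WritableWorkbook;

    private static final int[] URL_COLUMNS = { 0, 4, 9 }; // Columns A, E, J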

    public static void main(final String[] args) throws Exception {

        Workbook originalWorkbook = Workbook.getWorkbook(new File("C:/Users/Shadow/Desktop/original.xls"));
        WritableWorkbook workbook = Workbook.createWorkbook(new File("C:/Users/Shadow/Desktop/updated.xls"), originalWorkbook);
        originalWorkbook.close();
        WritableSheet sheet = workbook.getSheet(0);
        Cell cell;

        // loop over every column
        for (int i = 0; i < URL_COLUMNS.length; i++) {
            int currentRow = 1;
            while (!(cell = sheet.getCell(URL_COLUMNS[i], currentRow)).getType().equals(CellType.EMPTY)) {

                String url = cell.getContents();
                System.out.println("Checking URL: " + url);
                if (url.contains("scrapingsite1.com")) {
                    // ScrapingSite1 is assumed to be the asker's existing scraping method (not shown here)
                    String price = ScrapingSite1(url);
                    System.out.println("Scraping Site1's Price: " + price);
                    // save price into the next column
                    Label cellWithPrice = new Label(URL_COLUMNS[i] + 1, currentRow, price);
                    sheet.addCell(cellWithPrice);
                }
                currentRow++;
            }
        }

        workbook.write();
        workbook.close();
    }
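
The loop above can be tried without a spreadsheet or a network connection. The following is only a minimal sketch under stated assumptions: the sheet is simulated with a plain String[][] (indexed as grid[row][column], whereas jxl's getCell takes column first), and the scraper is passed in as a function. The names ColumnSweep, columnIndex and fillPrices are hypothetical and do not come from jxl or from the question.

```java
import java.util.function.Function;

public class ColumnSweep {

    /** Converts a column letter such as "A", "E" or "AA" to a zero-based index. */
    public static int columnIndex(String letters) {
        int index = 0;
        for (char c : letters.toUpperCase().toCharArray()) {
            index = index * 26 + (c - 'A' + 1);
        }
        return index - 1;
    }

    /**
     * For each URL column, walks down from row 1 (row 0 is treated as a header)
     * until the first empty cell, and writes the scraper's result into the
     * column to the right. Only URLs containing "scrapingsite1.com" are scraped,
     * mirroring the check in the answer's code.
     */
    public static void fillPrices(String[][] grid, int[] urlColumns,
                                  Function<String, String> scraper) {
        for (int column : urlColumns) {
            for (int row = 1; row < grid.length; row++) {
                String url = grid[row][column];
                if (url == null || url.isEmpty()) {
                    break; // first empty cell ends this column
                }
                if (url.contains("scrapingsite1.com")) {
                    grid[row][column + 1] = scraper.apply(url);
                }
            }
        }
    }

    public static void main(String[] args) {
        // Simulated sheet: 4 rows, 11 columns (A..K), all cells initially empty (null).
        String[][] grid = new String[4][11];
        grid[1][columnIndex("A")] = "http://scrapingsite1.com/item1";
        grid[2][columnIndex("A")] = "http://scrapingsite1.com/item2";
        grid[1][columnIndex("E")] = "http://scrapingsite1.com/item3";

        int[] urlColumns = { columnIndex("A"), columnIndex("E"), columnIndex("J") };

        // Stub scraper standing in for the real jsoup-based method.
        fillPrices(grid, urlColumns, url -> "price-for-" + url);

        for (String[] row : grid) {
            System.out.println(String.join(" | ",
                    java.util.Arrays.stream(row).map(c -> c == null ? "" : c).toArray(String[]::new)));
        }
    }
}
```

Column J is empty in this simulated sheet, so its inner loop stops at the first row, which is the same behaviour as the while condition on CellType.EMPTY. Writing the columns as letters via columnIndex is just a readability choice; the array { 0, 4, 9 } works equally well.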