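Office.js: Writing a large number of rows to Excel

Time: 2018-03-28 23:55:51

Tags: angular office-js excel-2016

I have rows that are about 100 columns wide, and I only want to write about 8,000 of them. When I write 3,000 of these rows (in batches of 500), it consistently takes about 2-3 seconds per 500 rows.

However, when I try to write a larger dataset of 8,000 rows of similar data (with more columns), it performs fine for roughly the first 3,000 rows (about 3-4 seconds per 500 rows), but somewhere around rows 2,500-3,000 performance starts to degrade and Excel slows to a crawl. For example: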

write rows address:  Sheet1!A3:DC502
batch write time:    3.0766400244386167  seconds
write rows address:  Sheet1!A503:DC1002
batch write time:    3.3348399202363796  seconds
write rows address:  Sheet1!A1003:DC1502
batch write time:    3.7307800745354034  seconds
write rows address:  Sheet1!A1503:DC2002
batch write time:    4.149179874582915  seconds
write rows address:  Sheet1!A2003:DC2502
batch write time:    3.8166401331085944  seconds
write rows address:  Sheet1!A2503:DC3002
batch write time:    7.215600102149649  seconds
write rows address:  Sheet1!A3003:DC3502
batch write time:    31.93173993128445  seconds
write rows address:  Sheet1!A3503:DC4002
batch write time:    95.68281983804563  seconds
write rows address:  Sheet1!A4003:DC4502
batch write time:    148.84947986377625  seconds
write rows address:  Sheet1!A4503:DC5002
batch write time:    203.41412001861877  seconds
write rows address:  Sheet1!A5003:DC5502
batch write time:    270.2974798251381  seconds
write rows address:  Sheet1!A5503:DC6002
...

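The first 30 or so columns contain formulas, and those cells have conditional formatting. The rest is just data written to plain white cells. I am simply using range.values to hand the data to Excel, and that is the step that takes so long. What can I do to get consistent performance?

Here is my code: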

async writeRows(data, formulas, sheetName, startCol, startRow) {
  return await Excel.run(async (ctx) => {
    let sheet = ctx.workbook.worksheets.getItem(sheetName);
    let endRow = startRow;
    let startRowOffset = startRow;
    let batchSize = 500;
    for (let i = 0; i < data.length; i = i + batchSize) {
      let t0 = performance.now();
      let min = Math.min(batchSize, data.length - i);
      let endCol = intToColumn(data[0].length);
      startRow = startRowOffset + i;
      endRow = startRow + min - 1;
      let address = sheetName + "!" + startCol + startRow + ":" + endCol + endRow;
      console.log("write rows address: ", address);
      let range = sheet.getRange(address);
      ctx.application.suspendApiCalculationUntilNextSync();
      range.values = data.slice(i, i + min)
      range.formulas = formulas.slice(i, i + min);
      await ctx.sync();
      let t1 = performance.now();
      console.log("batch write time: ", (t1 - t0) / 1000, ' seconds');
    }
    return endRow;
  });
}
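
The code above calls intToColumn without showing it. The following is a minimal sketch of what such a helper might look like, assuming it maps a 1-based column number to Excel column letters; it is a guess, not the asker's actual implementation. The batchAddresses function is also hypothetical and only reproduces the address arithmetic of the loop so that it can be checked without Excel.

```typescript
// Hypothetical helper (not shown in the question): convert a 1-based
// column number to Excel letters, e.g. 1 -> "A", 26 -> "Z", 27 -> "AA".
function intToColumn(n: number): string {
  let letters = "";
  while (n > 0) {
    const rem = (n - 1) % 26;
    letters = String.fromCharCode(65 + rem) + letters;
    n = Math.floor((n - 1) / 26);
  }
  return letters;
}

// Hypothetical helper: compute every batch address up front, using the
// same arithmetic as the loop in writeRows. Like the original, the end
// column is derived from the column count alone, so it assumes that
// startCol is "A".
function batchAddresses(
  sheetName: string,
  startCol: string,
  startRow: number,
  rowCount: number,
  colCount: number,
  batchSize: number
): string[] {
  const endCol = intToColumn(colCount);
  const addresses: string[] = [];
  for (let i = 0; i < rowCount; i += batchSize) {
    const rows = Math.min(batchSize, rowCount - i);
    const first = startRow + i;
    const last = first + rows - 1;
    addresses.push(`${sheetName}!${startCol}${first}:${endCol}${last}`);
  }
  return addresses;
}
```

With 107 columns, a start row of 3 and a batch size of 500, the first two addresses match the log in the question: Sheet1!A3:DC502 and Sheet1!A503:DC1002.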

In case you think it is just the heavy formulas, here are the timings for writing the same rows without assigning anything to range.formulas:

batch write time:  2.072960040280297  seconds
batch write time:  1.893160016976646  seconds
batch write time:  2.239300093637197  seconds
batch write time:  2.4051598865728154  seconds
batch write time:  2.4535400113378855  seconds
batch write time:  4.228719875053808  seconds
batch write time:  21.932359953223656  seconds
batch write time:  65.58508005044697  seconds
batch write time:  99.76420028338683  seconds
batch write time:  133.58046007197566  seconds
batch write time:  181.46535997193905  seconds
...

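Here is a screenshot of Task Manager (image not reproduced here).

Any ideas?

4 Answers:

Answer 0: (score: 1)

You have a ctx.sync inside the loop. That can be a performance killer. Try refactoring the method so that everything is written with a single sync. It may help to look at the pattern in my answer to this question: Document not in sync after replace text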
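
A minimal sketch of the single-sync idea, under stated assumptions: RangeLike and SheetLike are hypothetical stand-ins for the parts of Excel.Range and Excel.Worksheet that are used, and queueBatches is an illustrative name, not an Office.js API. In Office.js, assigning to range.values only queues a command; nothing is sent to Excel until sync is called, so the loop itself can stay synchronous. Note that another answer on this page argues that a payload of this size may be more than Excel will accept in one sync, so this is something to measure rather than a guaranteed fix.

```typescript
// Hypothetical minimal shapes standing in for Excel.Worksheet / Excel.Range.
interface RangeLike { values: unknown[][]; }
interface SheetLike { getRange(address: string): RangeLike; }

// Queue every batch without syncing; returns how many ranges were queued.
// addressFor is a caller-supplied function that maps a batch's first row
// index and row count to a range address.
function queueBatches(
  sheet: SheetLike,
  data: unknown[][],
  addressFor: (firstIndex: number, rowCount: number) => string,
  batchSize: number
): number {
  let queued = 0;
  for (let i = 0; i < data.length; i += batchSize) {
    const chunk = data.slice(i, i + batchSize);
    sheet.getRange(addressFor(i, chunk.length)).values = chunk;
    queued++;
  }
  return queued;
}

// Intended use inside an add-in (assumes Office.js is loaded):
//   await Excel.run(async (ctx) => {
//     const sheet = ctx.workbook.worksheets.getItem(sheetName);
//     queueBatches(sheet, data, addressFor, 500);
//     await ctx.sync(); // the only sync
//   });
```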

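Answer 1: (score: 1)

I use a similar batching approach to load about 35,000 rows in officejs, and I do not see much performance degradation between batches. 35 batches of 1,000 rows take 7 seconds in total (timing screenshot not reproduced here).

Here is my code: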

getWorksheetDataInChunks() {
    return ready.then(() => {
        return Excel.run(async (context) => {
            const sheet = context.workbook.worksheets.getActiveWorksheet();
            const dataRange = sheet
                .getUsedRange()
                .load('columnIndex, rowIndex, columnCount, rowCount, address');
            await context.sync();
            const rowsTotal = dataRange.rowCount;
            const batchSize = config.batchSize;
            const data = [];
            for (let i = 0; i < rowsTotal; i += batchSize) {
                const chunk = `chunk${i / batchSize}`;
                const chunkStart = `${chunk}-start`;
                const chunkEnd = `${chunk}-end`;
                performance.mark(chunkStart);

                const rowsRemaining = rowsTotal - i;
                const rowOffset = rowsRemaining >= batchSize ? batchSize : rowsRemaining;
                const currentRange = sheet
                    .getCell(dataRange.rowIndex + i, dataRange.columnIndex)
                    .getResizedRange(rowOffset - 1, dataRange.columnCount - 1)
                    .load('values, columnIndex, rowIndex, columnCount, rowCount, address');
                await context.sync();
                data.push(...currentRange.values);

                performance.mark(chunkEnd);
                performance.measure(chunk, chunkStart, chunkEnd);
            }
            return data;
        }).catch(buildErrorHandler('getWorksheetDataInChunks'));
    });
},

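Can you test whether reading the data is as slow as writing it?

You will not get by without context.sync() in the loop, because that much data is far beyond what Excel will process in a single sync. That is the only reason to batch in the first place. Try creating a new context for each loop iteration with Excel.run(); maybe that will let Excel "clean up" after the previous batch.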
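
The suggestion above could be sketched roughly as follows. Everything here is an assumption made for illustration: chunkRows, Runner, SyncContext and writeInFreshContexts are hypothetical names, and the run parameter stands for something like Excel.run that creates a fresh request context per call. Whether a new context per batch actually helps with the slowdown is untested here.

```typescript
// Split rows into fixed-size chunks (the last chunk may be shorter).
function chunkRows<T>(rows: T[], batchSize: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < rows.length; i += batchSize) {
    chunks.push(rows.slice(i, i + batchSize));
  }
  return chunks;
}

// Hypothetical minimal context shape: only sync() is needed here.
interface SyncContext { sync(): Promise<unknown>; }

// Hypothetical runner type; in an add-in this role would be played by Excel.run.
type Runner<C extends SyncContext> =
  (batch: (ctx: C) => Promise<void>) => Promise<void>;

// Write each chunk in its own context, so that no request context
// outlives the batch it was created for.
async function writeInFreshContexts<C extends SyncContext, T>(
  run: Runner<C>,
  rows: T[],
  batchSize: number,
  queueChunk: (ctx: C, chunk: T[], firstIndex: number) => void
): Promise<void> {
  let firstIndex = 0;
  for (const chunk of chunkRows(rows, batchSize)) {
    const start = firstIndex;
    await run(async (ctx) => {
      queueChunk(ctx, chunk, start); // queue the writes for this batch
      await ctx.sync();              // flush them before the context ends
    });
    firstIndex += chunk.length;
  }
}
```

The queueChunk callback is where the range address would be computed and range.values assigned, as in the question's loop.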

Answer 2: (score: 1)

The biggest issue:

The column formats (range.numberFormat) were not set. Setting them before assigning the data to range.values made the difference.

//Initial pass over every column to set types
if (type === "Date") {
    range.numberFormat = <any>'m/d/yyyy';
} else if (type === "Double") {
    range.numberFormat = <any>"#,##0.00";
} else {
    range.numberFormat = <any>"#";
}
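
A minimal sketch of how the per-column formats above might be assembled for a whole batch, assuming the column types are known as a list. The names ColumnType, numberFormatFor and buildNumberFormats are hypothetical, as is the "Other" label; only the three format strings come from the snippet above. The result is a rows-by-columns grid, which is the shape Range.numberFormat accepts.

```typescript
// Hypothetical column type labels; "Date" and "Double" follow the snippet
// above, "Other" is a placeholder for everything else.
type ColumnType = "Date" | "Double" | "Other";

// Map a column type to the format string used in the answer.
function numberFormatFor(type: ColumnType): string {
  if (type === "Date") return "m/d/yyyy";
  if (type === "Double") return "#,##0.00";
  return "#";
}

// Build a grid with one format per cell: every row repeats the
// per-column formats.
function buildNumberFormats(types: ColumnType[], rowCount: number): string[][] {
  const row = types.map(numberFormatFor);
  return Array.from({ length: rowCount }, () => [...row]);
}

// Intended use, following the order the answer describes:
//   range.numberFormat = buildNumberFormats(types, rows.length);
//   range.values = rows;
```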

Final timings:

Written in batches of 1,000 rows, the timings now stay consistent, as you would expect:

batch write time:  3.0076802402327596  seconds
batch write time:  3.0477398637461594  seconds
batch write time:  2.9507200432747487  seconds
batch write time:  3.0690198313384025  seconds
batch write time:  2.988500015243975  seconds
batch write time:  3.048739928042458  seconds
batch write time:  3.0736401102757082  seconds
batch write time:  3.097120038203488  seconds
batch write time:  2.068400111446943  seconds

Other helpful changes:

Initially, I was building my formulas cell by cell in a 2D array. For example:

//explicitly writing out each formula
range.formulas = [
                   ['=A1+B1', '=B1+C1'],
                   ['=A2+B2', '=B2+C2'],
                   ...
                   ['=A100+B100', '=B100+C100']
                 ];

The correct way is:

//writing out formula for the first cell, and let excel expand it to the range.
let range = sheet.getRange('A1:A100');
range.formulas = '=A1+B1' as any 
let range2 = sheet.getRange('B1:B100');
range2.formulas = '=B1+C1' as any 

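Source: thanks to @Slai for sharing this link: https://github.com/OfficeDev/office-js-docs-pr/blob/master/docs/excel/performance.md

Miscellaneous

Since I write formulas to the first 30 or so columns, I was initially writing out whole rows from column A to column DC, with null in the first 30 columns. I think this needlessly wrote 30 columns of null values, which may have slowed things down. In the end, I decided to omit those empty columns and write out only the columns that contain data.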
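
A minimal sketch of that change, assuming the formula columns are the first N columns of each row. All names (dropLeadingColumns, columnLetters, dataAddress) are hypothetical. It strips the leading placeholder columns and computes the address of the data-only block that begins right after them.

```typescript
// Drop the first `skip` columns from every row, keeping only real data.
function dropLeadingColumns<T>(rows: T[][], skip: number): T[][] {
  return rows.map((row) => row.slice(skip));
}

// Hypothetical helper: 1-based column number to Excel column letters.
function columnLetters(n: number): string {
  let letters = "";
  while (n > 0) {
    letters = String.fromCharCode(65 + ((n - 1) % 26)) + letters;
    n = Math.floor((n - 1) / 26);
  }
  return letters;
}

// Address of the data block that starts in the column after the
// formula columns and ends at the last column of the row.
function dataAddress(
  formulaCols: number,
  totalCols: number,
  startRow: number,
  rowCount: number
): string {
  const firstCol = columnLetters(formulaCols + 1);
  const lastCol = columnLetters(totalCols);
  return `${firstCol}${startRow}:${lastCol}${startRow + rowCount - 1}`;
}
```

For 30 formula columns in a 107-column sheet, a 500-row batch starting at row 3 would target AE3:DC502 instead of A3:DC502.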

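I also experimented with ctx.sync(), but on the desktop version of Office 2016 I personally did not notice much difference one way or the other (inside the loop, outside it, nested, and so on). However, if I were doing anything in Office Online, as @Rick Kirkham mentions in his link, I would pay more attention to it.

After reading this: https://github.com/OfficeDev/office-js/issues/12#issuecomment-374741210, I also updated my desktop version of Excel to a build later than 9021.

Answer 3: (score: 0)

Unlike the others, I will not go into your code, only Excel. As I understand it, you are trying to push data into Excel. So my suggestion is to assemble the whole dataset as a CSV file and export it to Excel, or import it in bulk through Excel.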
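
A minimal sketch of the CSV half of that suggestion, assuming the data is already a 2D array in memory. The names csvField and toCsv are hypothetical, and the quoting follows the common RFC 4180 convention. How the resulting file then reaches Excel is not specified in the answer and is outside this sketch.

```typescript
// Quote a field when it contains a comma, a double quote, or a line
// break; embedded quotes are doubled. null/undefined become empty fields.
function csvField(value: unknown): string {
  const text = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Join rows with CRLF and fields with commas.
function toCsv(rows: unknown[][]): string {
  return rows.map((row) => row.map(csvField).join(",")).join("\r\n");
}
```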