IMPORTDATA does not fetch live data from XML

Time: 2019-12-06 01:55:15

Tags: web-scraping google-sheets google-sheets-formula array-formulas google-sheets-importxml

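I am using the Google Sheets IMPORTDATA function to pull information from an XML file served by an API, but the information that ends up in my sheet is not up to date.

How can I modify the sheet so that it fetches the latest data?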

Sheet for comparison: https://docs.google.com/spreadsheets/d/1W0Bt5z-Tky-tNhG_JtfE4FfjTRgQNRu_eQu2qVhQ-_E/edit?usp=sharing (the LiveScores sheet)

The XML: https://www67.myfantasyleague.com/2019/export?TYPE=liveScoring&L=64741&APIKEY=&W=14&DETAILS=1&JSON=0

Compare franchise id="0015" in the two sets of data.

The sheet shows <franchise id="0005" score="0.00" gameSecondsRemaining="21600" playersYetToPlay="6" playersCurrentlyPlaying="0" isHome="0">

The XML shows <franchise id="0015" score="11.14" gameSecondsRemaining="20004" playersYetToPlay="4" playersCurrentlyPlaying="2"> (this data is from a football game that is being played as I write this, so the values above may no longer be exact, but the score should not be 0.00 as it is in the sheet).

Any help would be great, thanks!

2 Answers:

Answer 0: (score: 1)

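Have you tried using IMPORTXML? See the Google Sheets IMPORTXML page.

With IMPORTXML, you can get the XPath you need simply by using the browser's "Inspect Element" feature.

Hope this helps. Let me know if I can be of further help.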

Edit: instructions for changing how the data is imported

  1. In the toolbar, go to the Script editor.
  2. In the script, paste the code listed below.

/**
 * Go through all sheets in a spreadsheet, identify and remove all spreadsheet
 * import functions, then replace them a while later. This causes a "refresh"
 * of the "import" functions. For periodic refresh of these formulas, set this
 * function up as a time-based trigger.
 *
 * Caution: Formula changes made to the spreadsheet by other scripts or users
 * during the refresh period COULD BE OVERWRITTEN.
 *
 * From: https://stackoverflow.com/a/33875957/1677912
 */
function RefreshImports() {
  var lock = LockService.getScriptLock();
  if (!lock.tryLock(5000)) return;             // Wait up to 5s for previous refresh to end.
  // At this point, we are holding the lock.

  var id = "YOUR-SHEET-ID";
  var ss = SpreadsheetApp.openById(id);
  var sheets = ss.getSheets();

  for (var sheetNum=0; sheetNum<sheets.length; sheetNum++) {
    var sheet = sheets[sheetNum];
    var dataRange = sheet.getDataRange();
    var formulas = dataRange.getFormulas();
    var tempFormulas = [];
    for (var row=0; row<formulas.length; row++) {
      for (var col=0; col<formulas[0].length; col++) {
        // Blank all formulas containing any "import" function
        // See https://regex101.com/r/bE7fJ6/2
        var re = /.*[^a-z0-9]import(?:xml|data|feed|html|range)\(.*/gi;
        if (formulas[row][col].search(re) !== -1 ) {
          tempFormulas.push({row:row+1,
                             col:col+1,
                             formula:formulas[row][col]});
          sheet.getRange(row+1, col+1).setFormula("");
        }
      }
    }

    // After a pause, replace the import functions
    Utilities.sleep(5000);
    for (var i=0; i<tempFormulas.length; i++) {
      var cell = tempFormulas[i];
      sheet.getRange( cell.row, cell.col ).setFormula(cell.formula);
    }

  }

  // Refresh done for every sheet; release the lock.
  lock.releaseLock();
}

This snippet is from "Periodically refresh IMPORTXML() spreadsheet function".

  3. Last, and definitely not least, replace "YOUR-SHEET-ID" with the ID of your own spreadsheet.

Note: I have not tested this code myself, so I cannot vouch for it. I recommend making a copy of the spreadsheet first and testing there.
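Because the script is untested, its matching step can be checked in isolation before it is run against a real sheet. The sketch below is plain JavaScript with no Apps Script services. `findImportFormulas` is a hypothetical helper name, and the sample grid is made up: it stands in for what `getDataRange().getFormulas()` returns, a 2D array of strings with "" for cells that hold no formula.

```javascript
// Same pattern as in RefreshImports, minus the "g" flag, which test() does not need.
var IMPORT_RE = /.*[^a-z0-9]import(?:xml|data|feed|html|range)\(.*/i;

// Hypothetical helper: returns 1-based {row, col, formula} records for
// every cell whose formula calls one of the import functions.
function findImportFormulas(formulas) {
  var found = [];
  for (var row = 0; row < formulas.length; row++) {
    for (var col = 0; col < formulas[row].length; col++) {
      var formula = formulas[row][col];
      if (IMPORT_RE.test(formula)) {
        found.push({ row: row + 1, col: col + 1, formula: formula });
      }
    }
  }
  return found;
}

// Made-up sample grid (assumption, not taken from the shared sheet).
var sampleFormulas = [
  ['=IMPORTDATA("https://example.com/export")', ''],
  ['=SUM(A1:A5)', '=IFERROR(IMPORTXML(A1, "//franchise"))']
];

var matches = findImportFormulas(sampleFormulas);
console.log(matches);
```

If the cells listed are the ones you expect to be blanked and restored, the pattern is probably doing what the script's comments describe.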

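Hopefully this solves the problem of your data not being imported as often as you need. If you want to fetch "fresh" data manually, just delete or cut the import function and paste it back.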

Answer 1: (score: 0)

Try this in A2:

=ARRAYFORMULA(IFNA(VLOOKUP(C2:C, PlayerList!A:F, {2, 6}, 0)))

and this in C2:

=ARRAYFORMULA(QUERY(REGEXEXTRACT(QUERY(IMPORTDATA(
 "https://www67.myfantasyleague.com/2019/export?TYPE=liveScoring&L=64741&APIKEY=&W=14&DETAILS=1&JSON=0?273"), 
 "where Col1 contains 'player id'", 0), 
 "(player id=""(\d+)).+?(score=""(\d+\.\d+))"), 
 "select Col2,Col4"))
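To make the C2 formula easier to follow, here is a rough JavaScript sketch of what its inner QUERY, REGEXEXTRACT, and outer QUERY do: keep only the lines that contain `player id`, then pull the id and the score out of each one. The function name `extractPlayerScores` and the sample XML are assumptions made for illustration; the attributes in the real feed may differ.

```javascript
// Hypothetical helper mirroring the formula: filter lines, then capture id and score.
function extractPlayerScores(xmlText) {
  // The decimal point is escaped here so that it only matches a literal dot.
  var pattern = /player id="(\d+)".+?score="(\d+\.\d+)"/;
  var rows = [];
  xmlText.split("\n").forEach(function (line) {
    if (line.indexOf("player id") === -1) return; // like: where Col1 contains 'player id'
    var m = pattern.exec(line);
    if (m) rows.push([m[1], m[2]]);                // like: select Col2,Col4
  });
  return rows;
}

// Made-up sample, loosely shaped like the franchise line quoted in the question.
var sampleXml = [
  '<franchise id="0015" score="11.14" gameSecondsRemaining="20004">',
  '<player id="13130" score="7.50" gameSecondsRemaining="0"/>',
  '<player id="12626" score="3.64" gameSecondsRemaining="1800"/>',
  '</franchise>'
].join("\n");

var scores = extractPlayerScores(sampleXml);
console.log(scores);
```

Unlike REGEXEXTRACT, which returns an error for a line that does not match, this sketch silently skips such lines.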


spreadsheet demo