Extracting a value from a table in a web page with Google Apps Script

Time: 2017-02-01 15:35:53

Tags: google-apps-script

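I'm trying to extract a specific value from a web page so that I can pull it into a Google Sheets spreadsheet. The problem is that the page's structure doesn't make the value easy to pull out.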

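Given the HTML below, can anyone suggest a way to pull "$4,586" from the TD element that follows the one containing "Prop Taxes"? There are many TDs on the page with the class "d97m50", and many tables with the class "d97m2".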

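I tried two approaches but couldn't get either to work. For the first, I couldn't figure out how to iterate over the TDs on the page, find the TD after the one containing "Prop Taxes", and extract its text. The second failed because I couldn't come up with a regular expression that would do the same thing.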



<TABLE class="d97m2" cellSpacing=0 cellPadding=0 sizset="false" sizcache06358115873960983="276 82 150">
<!-- A bunch of other rows -->
<TR>
<TD class="d97m40"><span class="label">Prop Taxes:</SPAN></TD>
<TD class="d97m50" colSpan=2><SPAN class="wrapped-field">$4,586</span></TD>
<TD class="d97m43"><span class="label d97m29">Garbage:</SPAN></TD>
<TD class="d97m26"><SPAN class="wrapped-field">$0</span></TD>
<TD class="d97m44"><span class="label">Parking Inc:</SPAN></TD>
<TD class="d97m45"><SPAN class="wrapped-field">$0</span></TD>
<TD class="d97m46"><span class="label">TOE:</SPAN></TD>
<TD class="d97m47"><SPAN class="wrapped-field">$10,248</span></TD></TR>
<TR>
<!-- a bunch more rows -->
</TABLE>

2 Answers:

Answer 0: (score: 0)

A fairly simple way to pull in a table is to use the IMPORTHTML function in Sheets, for example:

=importhtml("http://www.tradingeconomics.com/zambia/rating","table",1)

Answer 1: (score: 0)

If you can get the HTML you want to process as a JavaScript String object, you can use a regular expression to pick out the specific string you're after.

For example, given the test text:

<TABLE class="d97m2" cellSpacing=0 cellPadding=0 sizset="false"      sizcache06358115873960983="276 82 150">
<!-- A bunch of other rows -->
<TR>
<TD class="d97m40"><span class="label">Prop Taxes:</SPAN></TD>
<TD class="d97m50" colSpan=2><SPAN class="wrapped-field">$4,586</span></TD>
<TD class="d97m43"><span class="label d97m29">Garbage:</SPAN></TD>
<TD class="d97m26"><SPAN class="wrapped-field">$0</span></TD>
<TD class="d97m44"><span class="label">Parking Inc:</SPAN></TD>
<TD class="d97m45"><SPAN class="wrapped-field">$0</span></TD>
<TD class="d97m46"><span class="label">TOE:</SPAN></TD>
<TD class="d97m47"><SPAN class="wrapped-field">$10,248</span></TD></TR>
<TR>
<!-- a bunch more rows -->
</TABLE>

The following regular expression:

/.*?Prop\sTaxes(.|\s)*?d97m50.*?\$(.*?)<\/span/mg

will yield the value "4,586" in its second capture group, which you can then process however you like.
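To make the group numbering concrete, here is a minimal sketch that runs the expression once against a shortened, one-row version of the test text. The variable names (`sampleHtml`, `propTaxRegex`, `propTaxes`) are only illustrative, and the `g` flag is dropped because a single match is enough here.

```javascript
// Shortened, hypothetical sample: one row cut down from the test text above.
var sampleHtml =
  '<TR>\n' +
  '<TD class="d97m40"><span class="label">Prop Taxes:</SPAN></TD>\n' +
  '<TD class="d97m50" colSpan=2><SPAN class="wrapped-field">$4,586</span></TD>\n' +
  '</TR>';

// Same pattern as above, without the g flag.
// Group 1 is the repeated (.|\s) that lets the match cross line breaks.
// Group 2 is the text between the "$" and the closing </span.
var propTaxRegex = /.*?Prop\sTaxes(.|\s)*?d97m50.*?\$(.*?)<\/span/m;

var result = propTaxRegex.exec(sampleHtml);

// exec() returns null when nothing matches, so guard before indexing.
var propTaxes = result ? result[2] : null;
```

Index 0 of the result holds the whole matched text, so the amount sits at index 2, not index 1.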

Here is a sample answer that shows how to get multiple matches and process them:

Javascript Regular Expression multiple match

This code works for me:

function regExTest() {
  var s = '<TABLE class="d97m2" cellSpacing=0 cellPadding=0 sizset="false"      sizcache06358115873960983="276 82 150">' +
    '<!-- A bunch of other rows -->' +
    '<TR>' +
    '<TD class="d97m40"><span class="label">Prop Taxes:</SPAN></TD>' +
    '<TD class="d97m50" colSpan=2><SPAN class="wrapped-field">$1,986</span></TD>' +
    '<TD class="d97m43"><span class="label d97m29">Garbage:</SPAN></TD>' +
    '<TD class="d97m26"><SPAN class="wrapped-field">$0</span></TD>' +
    '<TD class="d97m44"><span class="label">Parking Inc:</SPAN></TD>' +
    '<TD class="d97m45"><SPAN class="wrapped-field">$0</span></TD>' +
    '<TD class="d97m46"><span class="label">TOE:</SPAN></TD>' +
    '<TD class="d97m47"><SPAN class="wrapped-field">$10,248</span></TD></TR>' +
    '<TR>' +
    '<TR>' +
    '<TD class="d97m40"><span class="label">Prop Taxes:</SPAN></TD>' +
    '<TD class="d97m50" colSpan=2><SPAN class="wrapped-field">$4,586</span></TD>' +
    '<TD class="d97m43"><span class="label d97m29">Garbage:</SPAN></TD>' +
    '<TD class="d97m26"><SPAN class="wrapped-field">$0</span></TD>' +
    '<TD class="d97m44"><span class="label">Parking Inc:</SPAN></TD>' +
    '<TD class="d97m45"><SPAN class="wrapped-field">$0</span></TD>' +
    '<TD class="d97m46"><span class="label">TOE:</SPAN></TD>' +
    '<TD class="d97m47"><SPAN class="wrapped-field">$10,248</span></TD></TR>' +
    '<TR>' +
    '<TR>' +
    '<TD class="d97m40"><span class="label">Prop Taxes:</SPAN></TD>' +
    '<TD class="d97m50" colSpan=2><SPAN class="wrapped-field">$2,514</span></TD>' +
    '<TD class="d97m43"><span class="label d97m29">Garbage:</SPAN></TD>' +
    '<TD class="d97m26"><SPAN class="wrapped-field">$0</span></TD>' +
    '<TD class="d97m44"><span class="label">Parking Inc:</SPAN></TD>' +
    '<TD class="d97m45"><SPAN class="wrapped-field">$0</span></TD>' +
    '<TD class="d97m46"><span class="label">TOE:</SPAN></TD>' +
    '<TD class="d97m47"><SPAN class="wrapped-field">$10,248</span></TD></TR>' +
    '<TR>' +
    '<TR>' +
    '<TD class="d97m40"><span class="label">Prop Taxes:</SPAN></TD>' +
    '<TD class="d97m50" colSpan=2><SPAN class="wrapped-field">$3,312</span></TD>' +
    '<TD class="d97m43"><span class="label d97m29">Garbage:</SPAN></TD>' +
    '<TD class="d97m26"><SPAN class="wrapped-field">$0</span></TD>' +
    '<TD class="d97m44"><span class="label">Parking Inc:</SPAN></TD>' +
    '<TD class="d97m45"><SPAN class="wrapped-field">$0</span></TD>' +
    '<TD class="d97m46"><span class="label">TOE:</SPAN></TD>' +
    '<TD class="d97m47"><SPAN class="wrapped-field">$10,248</span></TD></TR>' +
    '<TR>' +
    '<!-- a bunch more rows -->' +
    '</TABLE>';

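  // The g flag makes each exec() call resume where the previous match ended.
  var qualityRegex = /.*?Prop\sTaxes(.|\s)*?d97m50.*?\$(.*?)<\/span/mg,
      matches = [];

  var match = qualityRegex.exec(s);
  while (match != null) {
      // Group 2 holds the amount without the leading "$".
      matches.push(match[2]);
      match = qualityRegex.exec(s);
  }

  /* matches now contains the numbers you require */
  return matches;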
}
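
If other fields such as "TOE" are needed as well, the same idea can be wrapped in a helper that takes the label as a parameter. This is only a sketch under the assumption that every label cell is immediately followed by a value cell whose text sits in a single span, as in the test text. The names `escapeRegExp` and `extractFieldValues` are made up for this example, and the pattern is a stricter variant that does not rely on the "d97m50" class.

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: for every cell whose label is `label`, return the
// text of the span inside the cell that immediately follows it.
function extractFieldValues(html, label) {
  var pattern = new RegExp(
    escapeRegExp(label) +
      ':?\\s*</span>\\s*</td>' +        // end of the label cell
      '\\s*<td[^>]*>\\s*<span[^>]*>' +  // start of the value cell
      '([^<]*)</span>',                 // group 1: the value text
    'gi'                                // all matches, tags in any letter case
  );
  var values = [];
  var match;
  while ((match = pattern.exec(html)) !== null) {
    values.push(match[1].trim());
  }
  return values;
}

// Usage with a shortened sample row.
var sampleRow =
  '<TR>' +
  '<TD class="d97m40"><span class="label">Prop Taxes:</SPAN></TD>' +
  '<TD class="d97m50" colSpan=2><SPAN class="wrapped-field">$4,586</span></TD>' +
  '<TD class="d97m46"><span class="label">TOE:</SPAN></TD>' +
  '<TD class="d97m47"><SPAN class="wrapped-field">$10,248</span></TD></TR>';

var taxes = extractFieldValues(sampleRow, 'Prop Taxes'); // ["$4,586"]
var toe = extractFieldValues(sampleRow, 'TOE');          // ["$10,248"]
```

In Apps Script the `html` argument would typically come from `UrlFetchApp.fetch(url).getContentText()`. Stripping the "$" and "," before writing the value to a sheet keeps it numeric.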