Getting a book cover link from an online bookstore with IMPORTXML

Posted: 2019-10-29 03:26:39

Tags: google-sheets google-sheets-importxml

I'm trying to get the image link of the book cover from this page: https://www.wook.pt/livro/diario-1927-1941-virginia-woolf/21571877

I inspected the cover element in the page source:

<div class="cover" id="productPageLeftSectionTop-image" data-prodid="21571877"> 

    <img sizes="(max-width: 688px) 75vw, 25vw" srcset="https://img.wook.pt/images/diario-1927-1941-virginia-woolf/MXwyMTU3MTg3N3wxNzQyMTI1NnwxNTQ0NzQ1NjAwMDAw/260x 265w,https://img.wook.pt/images/diario-1927-1941-virginia-woolf/MXwyMTU3MTg3N3wxNzQyMTI1NnwxNTQ0NzQ1NjAwMDAw/320x 325w,https://img.wook.pt/images/diario-1927-1941-virginia-woolf/MXwyMTU3MTg3N3wxNzQyMTI1NnwxNTQ0NzQ1NjAwMDAw/350x 355w,https://img.wook.pt/images/diario-1927-1941-virginia-woolf/MXwyMTU3MTg3N3wxNzQyMTI1NnwxNTQ0NzQ1NjAwMDAw/502x 500w"src="https://img.wook.pt/images/diario-1927-1941-virginia-woolf/MXwyMTU3MTg3N3wxNzQyMTI1NnwxNTQ0NzQ1NjAwMDAw/250x" alt="Wook.pt - Diário 1927-1941" title="Wook.pt - Diário 1927-1941" onclick="" class="img-responsive ">
    </div>

I tried:

=IMPORTXML("https://www.wook.pt/livro/diario-1927-1941-virginia-woolf/21571877", "//div[@class='cover']/@src")

But the result is empty. I don't know how to extract the image link: https://img.wook.pt/images/diario-1927-1941-virginia-woolf/MXwyMTU3MTg3N3wxNzQyMTI1NnwxNTQ0NzQ1NjAwMDAw/250x
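Why the formula above returns nothing can be checked outside Sheets. The minimal Python sketch below uses only the standard-library html.parser on a trimmed copy of the cover markup (srcset, title and onclick omitted) and lists the attributes of the <div class="cover"> element itself. There is no src among them, so //div[@class='cover']/@src has nothing to select. This is an illustration under that assumption, not a reproduction of how IMPORTXML parses pages; the class name CoverDivAttrs is hypothetical.

```python
from html.parser import HTMLParser

# Trimmed copy of the cover markup from the question (srcset, title, onclick omitted).
SNIPPET = """
<div class="cover" id="productPageLeftSectionTop-image" data-prodid="21571877">
    <img src="https://img.wook.pt/images/diario-1927-1941-virginia-woolf/MXwyMTU3MTg3N3wxNzQyMTI1NnwxNTQ0NzQ1NjAwMDAw/250x" alt="Wook.pt - Diário 1927-1941" class="img-responsive ">
</div>
"""


class CoverDivAttrs(HTMLParser):
    """Hypothetical helper: records the attributes of <div class="cover"> itself."""

    def __init__(self):
        super().__init__()
        self.div_attrs = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # list of (name, value) pairs -> dict
        if tag == "div" and attrs.get("class") == "cover":
            self.div_attrs = attrs


parser = CoverDivAttrs()
parser.feed(SNIPPET)
print(sorted(parser.div_attrs))   # ['class', 'data-prodid', 'id']
print("src" in parser.div_attrs)  # False: the src attribute lives on the <img>
```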

1 answer:

Answer 0 (score: 0)

  • You want to retrieve the URL https://img.wook.pt/images/diario-1927-1941-virginia-woolf/MXwyMTU3MTg3N3wxNzQyMTI1NnwxNTQ0NzQ1NjAwMDAw/250x from https://www.wook.pt/livro/diario-1927-1941-virginia-woolf/21571877.
  • You want to achieve this with Google Sheets' built-in functions.

If I understand correctly, how about this modified XPath? Please treat it as just one of several possible answers.

Formula:

=IMPORTXML(A1,"//div[@class='cover']/img/@src")
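  • The XPath was changed from //div[@class='cover']/@src to //div[@class='cover']/img/@src, because the src attribute belongs to the <img> element inside the div, not to the div itself.
  • In this case, the URL https://www.wook.pt/livro/diario-1927-1941-virginia-woolf/21571877 is put in cell "A1".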
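As a rough offline analogue of the corrected XPath //div[@class='cover']/img/@src, the Python sketch below (standard library only; the class name CoverImgSrc and the trimmed markup are assumptions for illustration) walks the cover markup and collects the src of any <img> that is a direct child of a <div> whose class is exactly "cover". It assumes the page still serves this markup; IMPORTXML's own parser may handle malformed HTML differently.

```python
from html.parser import HTMLParser

# Trimmed copy of the cover markup from the question (srcset, title, onclick omitted).
SNIPPET = """
<div class="cover" id="productPageLeftSectionTop-image" data-prodid="21571877">
    <img src="https://img.wook.pt/images/diario-1927-1941-virginia-woolf/MXwyMTU3MTg3N3wxNzQyMTI1NnwxNTQ0NzQ1NjAwMDAw/250x" alt="Wook.pt - Diário 1927-1941" class="img-responsive ">
</div>
"""

# HTML void elements never get an end tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class CoverImgSrc(HTMLParser):
    """Hypothetical stand-in for //div[@class='cover']/img/@src."""

    def __init__(self):
        super().__init__()
        self.stack = []  # (tag, attrs) of currently open non-void elements
        self.srcs = []   # collected src values

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        parent = self.stack[-1] if self.stack else None
        # Match only an <img> whose direct parent is <div class="cover">.
        if (tag == "img" and parent is not None and parent[0] == "div"
                and parent[1].get("class") == "cover" and "src" in attrs):
            self.srcs.append(attrs["src"])
        if tag not in VOID_TAGS:
            self.stack.append((tag, attrs))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag (tolerates sloppy HTML).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


parser = CoverImgSrc()
parser.feed(SNIPPET)
print(parser.srcs[0])
```

The explicit parent check is what mirrors the "/" step between div and img in the XPath; a plain search for every <img> on the page would also pick up unrelated images.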


If I misunderstood your question and this is not the result you want, I apologize.