Question

I want to scrape this page:

<body class="body_class" style="background:#444;">
  <div class="data" id="id">
    <div id="images" style="cursor: auto;">
      <img id="page-1" src="image1.jpg" data-index="1" style="" data-bd-imgshare binded="1">
      <p class="img_info">(1/14)</p>
    </div>
  </div>
</body>

The data I want to get is:

image1.jpg

The code I tried failed. How can I get this data?

Thanks.

Answer 1

Are you looking for the text "image1.jpg" as the data? If so, just use this XPath: //div[@id="images"]//@src
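To show how that XPath could be applied, here is a minimal sketch that assumes the page is parsed with lxml (a third-party package, installed separately). The markup is the snippet from the question, inlined as a string; the names `PAGE` and `extract_image_sources` are made up for illustration.

```python
from lxml import html  # third-party package (assumption: lxml is installed)

# Markup from the question, inlined so the sketch runs without a network call.
PAGE = """
<body class="body_class" style="background:#444;">
  <div class="data" id="id">
    <div id="images" style="cursor: auto;">
      <img id="page-1" src="image1.jpg" data-index="1" style="" data-bd-imgshare binded="1">
      <p class="img_info">(1/14)</p>
    </div>
  </div>
</body>
"""


def extract_image_sources(markup):
    """Return every src attribute found under <div id="images">."""
    # document_fromstring always returns the <html> root element,
    # so the absolute XPath below searches the whole document.
    tree = html.document_fromstring(markup)
    # xpath() on an attribute step returns a list of string-like values.
    return [str(src) for src in tree.xpath('//div[@id="images"]//@src')]


sources = extract_image_sources(PAGE)
print(sources)  # ['image1.jpg']
```

The result is a list because the expression can match more than one image; take `sources[0]` if only the first is needed.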

If you want to download the image using the address in src, you can use:

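# Python 3: urlretrieve lives in urllib.request (Python 2 used urllib.urlretrieve)
from urllib.request import urlretrieve

# Download the image at the URL and save it under a local filename
urlretrieve("http://www.gunnerkrigg.com//comics/00000001.jpg", "00000001.jpg")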
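Note that the src in the question ("image1.jpg") is relative, so it has to be resolved against the page's own address before it can be downloaded. The sketch below shows one way to do that with the standard library. `PAGE_URL` is a hypothetical address, and the helper names are invented for illustration; `download_image` is defined but not called, because it performs a network request.

```python
import posixpath
from urllib.parse import urljoin, urlsplit
from urllib.request import urlretrieve


def absolute_image_url(page_url, src):
    """Resolve a possibly relative src against the URL of the page it came from."""
    return urljoin(page_url, src)


def local_filename(image_url):
    """Use the last path segment of the URL as the local filename."""
    return posixpath.basename(urlsplit(image_url).path)


def download_image(page_url, src):
    """Download the image referenced by src and return the local filename."""
    image_url = absolute_image_url(page_url, src)
    filename = local_filename(image_url)
    urlretrieve(image_url, filename)  # performs a network request
    return filename


# Hypothetical page address, used only to show how the relative src resolves.
PAGE_URL = "http://example.com/comics/page1.html"
print(absolute_image_url(PAGE_URL, "image1.jpg"))
print(local_filename("http://example.com/comics/image1.jpg"))
```

With a real page address, `download_image(page_url, "image1.jpg")` would save the file next to the script under the name taken from the URL.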
