Question

I'm new to Scrapy, and I'm trying to get the text value of the title attribute of an image inside a nested table. Here is an example of the table:

<html>
     <body>
      <div id=yw1>
      <table id="x">
        <thead></thead>
         <tbody>
          <tr>
           <td>
             <table id="y">
               <thead></thead>
               <tbody>
                <tr>
                 <td><img src=".." title="Sample"></td>
                 <td></td>
                </tr>
               </tbody>
             </table>
           </td>
           <td></td>
          </tr>
         </tbody>
      </table>
      </div>
     </body>
</html>

I use the following Scrapy code to get the text from the title attribute:

def parse(self, response):
    transfers = Selector(response).xpath('//*[@id="yw1"]/table/tbody/tr')

    for transfer in transfers:
        item = TransfermarktItem()
        item['naam'] = transfer.xpath('td[1]/table/tbody/tr[1]/td[1]/img/@title/text()').extract()
        item['positie'] = transfer.xpath('td[1]/table/tbody/tr[1]/td[2]/a/text()').extract()
        item['leeftijd'] = transfer.xpath('td[2]/text()').extract()
        yield item

For some reason, the text value of the title attribute is not extracted. What am I doing wrong?

Cheers!

Answer 1

It looks like you can use:

        item['naam'] = transfer.xpath(
            'td[1]/table/tbody/tr[1]/td[1]/img/@title'
        ).extract()

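This returns a list of strings (here ['Sample']).

text() is not useful for getting attribute values: @title already selects the attribute's value, and an attribute has no text node beneath it, so @title/text() matches nothing. Keep extract(), though: without it, xpath() returns a SelectorList of selector objects rather than strings.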
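To see why text() finds nothing, the sketch below uses Python's standard-library xml.etree.ElementTree instead of Scrapy (a swapped-in tool, chosen only because it needs no install). It parses a trimmed, well-formed XML copy of the question's markup (assumptions: id quoted, img self-closed, empty thead dropped) and shows that "Sample" is stored as an attribute of img, not as a text node. The helper image_title_and_text is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed XML copy of the question's markup (assumption:
# <img> is self-closed and the empty <thead> elements are dropped).
SAMPLE = """
<div id="yw1">
  <table id="x">
    <tbody>
      <tr>
        <td>
          <table id="y">
            <tbody>
              <tr>
                <td><img src=".." title="Sample"/></td>
                <td></td>
              </tr>
            </tbody>
          </table>
        </td>
        <td></td>
      </tr>
    </tbody>
  </table>
</div>
"""

def image_title_and_text(markup):
    """Return (text content, title attribute) of the image in the nested table."""
    root = ET.fromstring(markup.strip())  # root is the <div id="yw1"> element
    img = root.find("table/tbody/tr/td[1]/table/tbody/tr[1]/td[1]/img")
    # The title value is not a text node: it is read from the element's
    # attributes, which is what the XPath step '@title' selects in Scrapy.
    return img.text, img.get("title")

text, title = image_title_and_text(SAMPLE)
print(repr(text))   # None: <img> has no text content
print(repr(title))  # 'Sample': the value lives in the title attribute
```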

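Edit: If the above still doesn't work, the more likely cause is the tbody problem; see http://doc.scrapy.org/en/latest/topics/firefox.html. Browsers such as Firefox add tbody elements to tables when they build the DOM, but Scrapy works on the original HTML, which may not contain them, so an XPath copied from the browser with tbody steps can match nothing. You could try this instead: td[1]/table//tr[1]/td[1]/img/@title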
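A minimal illustration of the tbody problem, again using Python's standard-library ElementTree rather than Scrapy. RAW is a hypothetical version of the question's markup without any tbody elements, standing in for the HTML a server might actually send; image_titles is a hypothetical helper. The path with tbody steps matches nothing, while the // version matches.

```python
import xml.etree.ElementTree as ET

# Hypothetical raw markup as a server might send it: no <tbody> elements.
# A browser's inspector would still show <tbody>, because the browser adds it.
RAW = """
<div id="yw1">
  <table id="x">
    <tr>
      <td>
        <table id="y">
          <tr>
            <td><img src=".." title="Sample"/></td>
            <td></td>
          </tr>
        </table>
      </td>
      <td></td>
    </tr>
  </table>
</div>
"""

def image_titles(row, path):
    """Return the title attribute of every element under `row` matched by `path`."""
    return [el.get("title") for el in row.findall(path)]

root = ET.fromstring(RAW.strip())  # the <div id="yw1"> element
row = root.find("table/tr")        # outer row; no tbody in the raw markup

# The question's relative path, tbody steps included: matches nothing here.
print(image_titles(row, "td[1]/table/tbody/tr[1]/td[1]/img"))  # []
# The suggested path: // matches tr at any depth, with or without a tbody.
print(image_titles(row, "td[1]/table//tr[1]/td[1]/img"))  # ['Sample']
```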

If that doesn't help, then based on the data we have here, I'm out of ideas :)

Scrapy: get text from the title attribute of an image in a nested table