How to get the value corresponding to each id

Time: 2017-09-19 09:03:12

Tags: python html css scrapy

I want to get each "id" together with the corresponding text inside its anchor tag.

<li id="1" class="list">
    <a class="tim">This is Link1</a>
<li id="2" class="list">
    <a class="tim">This is Link2</a>
<li id="3" class="list">
    <a class="tim">This is Link3</a>

I tried the following code:

from scrapy.http import HtmlResponse
response = HtmlResponse(url="some url", body=htmltext, encoding='utf8')

for x in response.css('li::attr(id)').extract():
    item = {}
    item['id'] = x
    item['value'] = x.css('a.tim::text').extract()

But the last line raises AttributeError: 'unicode' object has no attribute 'css'

1 answer:

Answer 0: (score: 1)

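extract() returns the attribute values as plain strings, and a string has no css() method. What you are looping over is a list of attribute values: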

>>> response.css('li::attr(id)').extract()
['1', '2', '3']

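Instead of extracting and then looping, select the li elements themselves (not the attribute), and then loop over the resulting Selector instances: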

for x in response.css('li[id]'):  # li elements that have an id attribute
    item = {
        'id': x.css('::attr(id)').extract_first(),
        'value': x.css('a.tim::text').extract_first(),
    }

This produces a dictionary with the desired id and value keys:

>>> for x in response.css('li[id]'):  # li elements that have an id attribute
...     item = {
...         'id': x.css('::attr(id)').extract_first(),
...         'value': x.css('a.tim::text').extract_first(),
...     }
...     print(item)
...
{'id': '1', 'value': 'This is Link1'}
{'id': '2', 'value': 'This is Link2'}
{'id': '3', 'value': 'This is Link3'}