I want to extract the link text and the link address of every link on a website. I'd like to extract both in one pass and store them in an object.
Here is what I tried in Python:
urls = response.xpath('//a[@class="link-on-click grayhover"]/@href|/span/text()').extract()
Here is what the HTML looks like:
<div>
<a class="link-on-click grayhover"
href="/brows/cars">
<span>cars list</span>
</a>
</div>
I would like the result to look like this:
{url : "/brows/cars", text:'cars list'}
Answer 0 (score: 2)
Try building a list of dictionaries, one per link, like this:
my_list = []
links = response.xpath('//a[@class="link-on-click grayhover"]')
for link in links:
    # Query relative to the current <a> so each url/text pair stays together
    my_list.append({
        'url': link.xpath('./@href').extract_first(),
        'text': link.xpath('./span/text()').extract_first(),
    })
Answer 1 (score: 1)
Let's see if this gets you there:
source = """
<div>
<a class="link-on-click grayhover"
href="/brows/cars">
<span>cars list</span>
</a>
</div>
"""
from lxml import etree
doc = etree.fromstring(source)
car_dict = {}
for ref in doc.xpath('//a'):
    url = ref.get("href")
    # './/span' limits the search to spans inside the current <a>
    for car in ref.xpath('.//span'):
        car_text = car.text
        car_dict.update({'url': url, 'text': car_text})
print(car_dict)
Output:
{'url': '/brows/cars', 'text': 'cars list'}
There are probably several ways to simplify this (comprehensions, etc.), but it should work for now.
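One possible simplification along those lines, as a sketch rather than the answerer's own code: a list comprehension that builds one dictionary per `<a>` element, so several links do not overwrite each other. It assumes the markup is well-formed XML (with the closing `</div>` present) and that each link has a direct `<span>` child. The name `car_list` is introduced here for illustration.

```python
from lxml import etree

source = """
<div>
<a class="link-on-click grayhover"
href="/brows/cars">
<span>cars list</span>
</a>
</div>
"""

doc = etree.fromstring(source)

# One dict per link; findtext('span') returns the text of the first
# direct <span> child, or None if the link has no such child.
car_list = [
    {'url': ref.get('href'), 'text': ref.findtext('span')}
    for ref in doc.xpath('//a')
]
print(car_list)
```

For real-world pages that are not valid XML, `lxml.html.fromstring` is the more forgiving parser.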