I am using XPath to scrape data from a website:
response.xpath('//*[@class="attr category hide--mobile"]/span/a/text()').extract()
It returns the following result, which is a flat list of the tags of the posts on the site.
['Furniture',
'Wardrobes',
'Furniture',
'Wardrobes',
'Furniture',
'Wardrobes',
'Furniture',
'Wardrobes',
'Furniture',
'Wardrobes',
'Furniture',]
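The problem is that the first two values are tags that belong to the same container on the site, but the flat list does not show which tags go together.
On the site, a single container holds the following data: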
White and Grey Galley Walk-in Closet PLYJ17017-057
White and Grey Galley Walk-in Closet PLYJ17017-057 Specification: Code: PLYJ17017-057 Style: Modern Door: Lacquer Carcase: Melamine White and Grey Galley Walk-in Wardrobe Closet, an important part of the design for the bedroom is the addition of a...
Category: Furniture | Wardrobes
How can I capture both tags of a single container together, i.e. Furniture and Wardrobes?
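One possible approach, sketched here under the assumption that the element with class "attr category hide--mobile" occurs once per post container: select those elements first, then run a relative XPath (starting with `./`) inside each one, so the tags come back grouped per container instead of as one flat list. The function name `tags_per_container` is hypothetical; `response` is assumed to be the Scrapy response object from the question.

```python
def tags_per_container(response):
    """Return one list of tag strings per category element on the page.

    Assumes `response` is a Scrapy response (or Selector) and that each
    post container holds exactly one element with this class attribute.
    """
    # First select the category elements themselves, not their text nodes.
    containers = response.xpath('//*[@class="attr category hide--mobile"]')

    grouped = []
    for container in containers:
        # The leading "./" keeps the query relative to this container,
        # so only the tags inside it are returned.
        tags = container.xpath('./span/a/text()').extract()
        grouped.append(tags)
    return grouped
```

For the container shown in the question, the corresponding entry would be expected to hold both 'Furniture' and 'Wardrobes'. If the site structures some posts differently, the outer XPath may need to target the post container itself rather than the category element.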