Let's say I have the following script:
# -*- coding: utf-8 -*-
import scrapy
class StrongSpider(scrapy.Spider):
    name = 'Strong'
    allowed_domains = ['https://www.strongflex.de/en/4-acura-integra-93-01/']
    start_urls = ['https://www.strongflex.de/en/4-acura-integra-93-01/']
    def parse(self, response):
        product_container = response.css("div.product-container")
        prodname = product_container.css("a.product-name::text").extract_first().strip()
        price = product_container.css("span.price::text").extract_first().strip()
        description = product_container.css("p.product-desc::text").extract_first().strip()
        img = product_container.css("img.replace-2x.img-responsive::attr(src)").extract_first()
        for item in zip(prodname, price, description, img):
            scraped_info = {
                'prodname': prodname[0],
                'price': price[1],
                'description': description[2],
                'img': img[3],
            }
            yield scraped_info
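Inside the loop, I want to check whether item[1] exists and output a blank value if it doesn't, but I don't know how to do that. Right now, if not every product has a price, my script skips it.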
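The behaviour can be reproduced without Scrapy. `extract_first()` returns one plain string per field, so in this sketch the four fields are hard-coded stand-ins (taken from the sample item further down this page, with the description shortened); the product names and prices in the last part are made up:

```python
# Stand-ins for what extract_first() returns: one plain string per field.
prodname = "081097B: Front anti roll bar bush"
price = "10,01 €"
description = "ref: 081097B"
img = "acura-integra-93-01_files/front-anti-roll-bar-bush.jpg"

# Indexing a string picks out a single character, which is why every
# scraped value ends up one character long.
scraped_info = {
    "prodname": prodname[0],
    "price": price[1],
    "description": description[2],
    "img": img[3],
}
print(scraped_info)

# zip() over strings pairs them character by character and stops at the
# shortest string, so the loop body runs len(price) times.
loop_count = len(list(zip(prodname, price, description, img)))
print(loop_count)

# zip() over lists also stops at the shortest list: a product without a
# price is silently dropped, and the remaining prices can end up paired
# with the wrong product names.
names = ["product A", "product B", "product C"]
prices = ["10,01 €", "12,50 €"]
pairs = list(zip(names, prices))
print(pairs)
```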
Answer 0 (score: 0)
If you want to emit an empty value instead of skipping the item, you can do this:
for item in zip(prodname, price, description, img):
    scraped_info = {
        'prodname': prodname[0],
        # fall back to an empty string when there is no second element
        'price': price[1] if len(price) >= 2 else '',
        'description': description[2],
        'img': img[3],
    }
    yield scraped_info
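The guard in the answer above can be generalized to any field and index. A minimal sketch, using a hypothetical helper named `value_at` (it is not part of Scrapy), that returns a blank string whenever the sequence is too short for the requested position:

```python
def value_at(seq, index, default=""):
    """Return seq[index] if that position exists, otherwise `default`.

    Works for lists and strings alike; only non-negative indexes are handled.
    """
    return seq[index] if 0 <= index < len(seq) else default


prices = ["10,01 €"]  # hypothetical: only one price was scraped
print(value_at(prices, 0))        # the first price
print(repr(value_at(prices, 1)))  # an empty string instead of an IndexError
```

Written inline for index 1, this is the same check as `price[1] if len(price) >= 2 else ''`.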
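Answer 1 (score: 0)
It's unusual to use extract_first() and make only a single page request. Maybe you should post more of the spider code? More importantly, can you show a debug print of scraped_info? The output makes little sense.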
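Instead, stop truncating the fields with [index]. With the indexes, every value is cut down to a single character:
{'description': 'f', 'img': 'r', 'price': '0', 'prodname': '0'}
Without them, you get the full values, like this: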
{'description': 'ref: 081097B\r\nMaterial: POLYURETHANE (PUR/PU)\r\nHardness 80ShA\r\nPcs/prod: 1\r\nRequired/car: 2\r\nTo every product we add grease!',
'img': 'acura-integra-93-01_files/front-anti-roll-bar-bush.jpg',
'price': '10,01 €',
'prodname': '081097B: Front anti roll bar bush'}
Nota bene: this question looks like a test task from an employer, doesn't it? If that's the case, please reply later. Good luck!