<ul class="products-grid">
<li class="item">
<div class="product-block">
<div class="product-block-inner">
<a href="#" title="Product A" class="product-image"><img src="#/producta.jpg"></a>
<h2 class="product-name"><a href="#">Product A</a></h2>
<div class="price-box">
<span class="regular-price" id="#">
<span class="price">Rs 1,849</span>
</span>
</div>
</div>
</div>
</li>
<li class="item">
<div class="product-block">
<div class="product-block-inner">
<a href="#" title="Product B" class="product-image"><img src="#/productb.jpg"></a>
<h2 class="product-name"><a href="#">Product B</a></h2>
<div class="price-box">
<span class="regular-price" id="#">
<span class="price">Rs 1,849</span>
</span>
</div>
</div>
</div>
</li>
</ul>
At the moment I am scraping each item in a loop.
products = response.xpath('//ul[@class="products-grid"]//li//div[@class="product-block"]//div[@class="product-block-inner"]').extract()
After getting the product-block-inner nodes, I save them to products, and then I have to loop over them like this:
for product in products:
    # parse div.product-block-inner further down
    # to get the name, price, image, etc.
    # then save them to a dict and yield it
    pass
Is it possible to get the text and href of every div.product-block-inner in the final list without looping?
Answer 0 (score: 1)
Yes, but the result is confusing. For example, you could try this:
products = response.xpath(
'//ul[@class="products-grid"]//li//div[@class="product-block"]//div[@class="product-block-inner"]'
).css(
'.product-name a::attr(href), .product-name a::text, .price::text'
).extract()
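To illustrate why the flat result is confusing, here is a minimal sketch that regroups such a list into one dict per product. It assumes the values come back in document order (href, name, price for each product) and that every product has all three fields. `group_fields` is a hypothetical helper, not part of Scrapy, and the sample list is hand-written from the HTML in the question rather than captured from a real run.

```python
from typing import Dict, List, Sequence


def group_fields(flat: Sequence[str], keys: Sequence[str]) -> List[Dict[str, str]]:
    """Split a flat list of extracted values into one dict per product.

    Assumes each product contributed exactly len(keys) values, in the
    same order as `keys`.
    """
    size = len(keys)
    if len(flat) % size != 0:
        # A leftover value means at least one product is missing a field,
        # so the positions can no longer be trusted.
        raise ValueError("flat list length is not a multiple of the key count")
    return [dict(zip(keys, flat[i:i + size])) for i in range(0, len(flat), size)]


# Hand-written sample mirroring the two products in the question (assumed order).
flat = ['#', 'Product A', 'Rs 1,849', '#', 'Product B', 'Rs 1,849']
items = group_fields(flat, ('url', 'name', 'price'))
print(items)
```

The length check only catches some cases: if missing fields happen to leave a multiple of three values, they shift every later value into the wrong key without raising. Looping over each product node avoids that problem.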
But I would recommend always looping (by the way, why are you calling extract() when assigning to products?):
products = response.xpath(
    '//ul[@class="products-grid"]//li//div[@class="product-block"]//div[@class="product-block-inner"]'
)
for product in products:
    yield {
        'name': product.css('.product-name a::text').extract_first(),
        'url': product.css('.product-name a::attr(href)').extract_first(),
        'price': product.css('.price::text').extract_first(),
    }
(I used CSS selectors here because the equivalent XPath is longer, but the same result can be achieved with XPath.)