I can't figure out why Scrapy doesn't see some //div/text() nodes on the page http://www.alize.gen.tr/index.php?is=urun_detay&id=37. For example:
scrapy view http://www.alize.gen.tr/index.php?is=urun_detay&id=37
and
scrapy shell http://www.alize.gen.tr/index.php?is=urun_detay&id=37
>>> hxs.select("//td[@class='urun_adi']/div/text()").extract()
returns [u'\r\n'] but it should be [u'\r\nANGORA GOLD']
Am I doing something wrong?
Answer 0 (score: 1)
Works for me:
stav@maia:~$ scrapy shell "http://www.alize.gen.tr/index.php?is=urun_detay&id=37"
2013-03-28 20:36:39-0600 [scrapy] INFO: Scrapy 0.17.0 started (bot: scrapybot)
...
>>> hxs.select("//td[@class='urun_adi']/div/text()").extract()
[u'\r\nANGORA GOLD']
What version of Scrapy are you using?
stav@maia:~$ scrapy version -v
Scrapy : 0.17.0
lxml : 2.3.2.0
libxml2 : 2.7.8
Twisted : 11.1.0
Python : 2.7.3 (default, Aug 1 2012, 05:14:39) - [GCC 4.6.3]
Platform: Linux-3.2.0-39-generic-x86_64-with-Ubuntu-12.04-precise
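One difference between the two transcripts may also be worth checking: the URL is quoted in the command above but not in the question. In a POSIX shell, an unquoted & ends the command and runs it in the background, so Scrapy would only receive the part of the URL before &id=37. Whether that explains the missing text here is an assumption and has not been verified against the site. A minimal Python sketch of what the truncated URL would lose (the variable names are illustrative only):

```python
from urllib.parse import urlsplit, parse_qs

full_url = "http://www.alize.gen.tr/index.php?is=urun_detay&id=37"

# Assumption: the shell cuts an unquoted command line at the first '&',
# so the program only sees the text before it.
truncated_url = full_url.split("&", 1)[0]

# Compare the query parameters carried by each form of the URL.
full_params = parse_qs(urlsplit(full_url).query)
truncated_params = parse_qs(urlsplit(truncated_url).query)

print(full_params)       # includes both 'is' and 'id'
print(truncated_params)  # 'id' is missing
```

If this turns out to be the cause, wrapping the URL in double quotes for both scrapy view and scrapy shell should be enough.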