2016-07-05 21:06:01 [scrapy] INFO: Scrapy 1.1.0 started (bot: tutorial)
2016-07-05 21:06:01 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'tutorial.spiders', 'SPIDER_MODULES': ['tutorial.spiders'], 'ROBOTSTXT_OBEY': True, 'BOT_NAME': 'tutorial'}
2016-07-05 21:06:01 [scrapy] INFO: Enabled extensions:
2016-07-05 21:06:02 [scrapy] INFO: Enabled downloader middlewares:
2016-07-05 21:06:02 [scrapy] INFO: Enabled spider middlewares:
2016-07-05 21:06:02 [scrapy] INFO: Enabled item pipelines:
2016-07-05 21:06:02 [scrapy] INFO: Spider opened
2016-07-05 21:06:02 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-07-05 21:06:02 [scrapy] DEBUG: Telnet console listening on
2016-07-05 21:06:02 [scrapy] INFO: Closing spider (finished)
2016-07-05 21:06:02 [scrapy] INFO: Dumping Scrapy stats:
{'finish_reason': 'finished',
'finish_time': datetime.datetime(2016, 7, 5, 13, 6, 2, 381000),
'log_count/DEBUG': 1,
'log_count/INFO': 7,
'start_time': datetime.datetime(2016, 7, 5, 13, 6, 2, 381000)}
2016-07-05 21:06:02 [scrapy] INFO: Spider closed (finished)

# -*- coding: utf-8 -*-
import scrapy
from tutorial.items import TutorialItem


class DmozSpider(scrapy.Spider):
    name = 'dmoz'
    allowed_domains = ['dmoz.org']
    strat_urls = ('http://www.dmoz.org/Computers/Programming/Languages/Python/Books/')

    def parse(self, response):
        lislink = response.xpath('/html/body/div[5]/div/section[3]/div/div/div[*]/div[3]/a')
        for li in lislink:
            item = TutorialItem()
            item['link'] = li.xpath('@href').extract()
            yield item
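
One likely reason the crawl log reports 0 pages while the same XPath works in the shell is the attribute name in the spider: it is spelled `strat_urls`, and its value is a parenthesised string rather than a list or tuple. The sketch below uses a simplified stand-in class, not Scrapy itself, to show the effect. `SpiderStandIn`, `TypoSpider` and `FixedSpider` are hypothetical names, and the fallback to an empty `start_urls` is modelled on my understanding of how `scrapy.Spider` behaves, so treat it as an assumption.

```python
# Simplified stand-in for scrapy.Spider (an assumption for illustration,
# not Scrapy's real implementation): it only models the fallback to an
# empty start_urls and the loop that turns each entry into a request.
class SpiderStandIn(object):
    def __init__(self):
        if not hasattr(self, 'start_urls'):
            self.start_urls = []

    def start_requests(self):
        # One "request" (here just the URL string) per entry in start_urls.
        return [url for url in self.start_urls]


URL = 'http://www.dmoz.org/Computers/Programming/Languages/Python/Books/'


class TypoSpider(SpiderStandIn):
    # Misspelled attribute: the base class never looks at it.
    strat_urls = (URL)


class FixedSpider(SpiderStandIn):
    # Correct name, and a real list rather than a parenthesised string.
    start_urls = [URL]


typo_requests = TypoSpider().start_requests()
fixed_requests = FixedSpider().start_requests()

print(typo_requests)         # empty: nothing would be crawled
print(fixed_requests)        # one entry: the Books page
print(type((URL)).__name__)  # parentheses alone do not make a tuple
```

If this is indeed the cause, renaming the attribute to `start_urls` and giving it a list, as in `FixedSpider`, should be enough for the real spider to issue its first request.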

# -*- coding: utf-8 -*-
# Define here the models for your scraped items
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/items.html
import scrapy


class TutorialItem(scrapy.Item):
    # define the fields for your item here like:
    link = scrapy.Field()

D:\pythonweb\scrapy\test2>scrapy shell http://www.dmoz.org/Computers/Programming/Languages/Python/Books/
2016-07-05 21:06:40 [scrapy] INFO: Scrapy 1.1.0 started (bot: scrapybot)
2016-07-05 21:06:40 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0, 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter'}
2016-07-05 21:06:40 [scrapy] INFO: Enabled extensions:
2016-07-05 21:06:40 [scrapy] INFO: Enabled downloader middlewares:
2016-07-05 21:06:40 [scrapy] INFO: Enabled spider middlewares:
2016-07-05 21:06:40 [scrapy] INFO: Enabled item pipelines:
2016-07-05 21:06:40 [scrapy] DEBUG: Telnet console listening on
2016-07-05 21:06:40 [scrapy] INFO: Spider opened
2016-07-05 21:06:42 [scrapy] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> (referer: None)
[s] Available Scrapy objects:
[s] crawler <scrapy.crawler.Crawler object at 0x03BF0E30>
[s] item {}
[s] request <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
[s] response <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
[s] settings <scrapy.settings.Settings object at 0x03BF05F0>
[s] spider <DefaultSpider 'default' at 0x432b1d0>
[s] Useful shortcuts:
[s] shelp() Shell help (print this help)
[s] fetch(req_or_url) Fetch request (or URL) and update local objects
[s] view(response) View response in a browser
>>> lislink = response.xpath('/html/body/div[5]/div/section[3]/div/div/div[*]/div[3]/a')
>>> lislink.xpath('@href').extract()
[u'http://www.pearsonhighered.com/educator/academic/product/0,,0130260363,00%2Ben-USS_01DBC.html', u'http://www.brpreiss.com/books/opus7/html/book.html', u'http://www.diveintopython.net/', u'http://rhodesmill.org/brandon/2011/foundations-of-python-network-programming/', u'http://www.techbooksforfree.com/perlpython.shtml', u'http://www.freetechbooks.com/python-f6.html', u'http://greenteapress.com/thinkpython/', u'http://www.network-theory.co.uk/python/intro/', u'http://www.freenetpages.co.uk/hp/alan.gauld/', u'http://www.wiley.com/WileyCDA/WileyTitle/productCd-0471219754.html', u'http://hetland.org/writing/practical-python/', u'http://sysadminpy.com/', u'http://www.qtrac.eu/py3book.html', u'http://www.wiley.com/WileyCDA/WileyTitle/productCd-0764548077.html', u'https://www.packtpub.com/python-3-object-oriented-programming/book', u'http://www.network-theory.co.uk/python/language/', u'http://www.pearsonhighered.com/educator/academic/product/0,,0130409561,00%2Ben-USS_01DBC.html', u'http://www.informit.com/store/product.aspx?isbn=0201616165&redir=1', u'http://www.pearsonhighered.com/educator/academic/product/0,,0201748843,00%2Ben-USS_01DBC.html', u'http://www.informit.com/store/product.aspx?isbn=0672317354', u'http://gnosis.cx/TPiP/', u'http://www.informit.com/store/product.aspx?isbn=0130211192']
Scrapy : 1.1.0
lxml :
libxml2 : 2.9.0
Twisted : 16.2.0
Python : 2.7.12 (v2.7.12:d33e0cf91556, Jun 27 2016, 15:19:22) [MSC v.1500 32 bit (Intel)]
pyOpenSSL : 16.0.0 (OpenSSL 1.0.2h 3 May 2016)
Platform : Windows-10-10.0.10586
Answer 0 (score: 1)
for x in dic:
    print ("sum of key " + str(x) + " " + str(sum(dic[x])))
    print ("length = " + str(len(dic[x])))
but instead:
sum of key a 6
length = 3
sum of key c 23
length = 3
sum of key b 13
length = 3
sum of key e 6
length = 2
sum of key d 10
length = 4
sum of key f 100
length = 3
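
The loop in this answer depends on a `dic` that is never shown. A minimal runnable sketch follows, assuming `dic` maps each key to a list of numbers. The values are hypothetical, invented only so that the sums and lengths agree with the output above, and `summarize` is a helper name introduced here for illustration.

```python
# Hypothetical data: these lists are invented so that their sums and
# lengths match the printed output; the original dic is not given.
dic = {
    'a': [1, 2, 3],
    'b': [4, 4, 5],
    'c': [7, 8, 8],
    'd': [1, 2, 3, 4],
    'e': [2, 4],
    'f': [30, 30, 40],
}


def summarize(mapping):
    """Return (key, sum, length) for every key, in sorted key order."""
    return [(key, sum(values), len(values))
            for key, values in sorted(mapping.items())]


summary = summarize(dic)
for key, total, length in summary:
    print("sum of key " + str(key) + " " + str(total))
    print("length = " + str(length))
```

Keys are sorted here so the order is predictable. The output above lists them as a, c, b, e, d, f, which is consistent with the arbitrary dictionary ordering of older Python versions.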