我有以下脚本:
#!/usr/bin/python
from twisted.internet import reactor
import scrapy
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.selector import Selector
from lxml import html
class GivenSpider(scrapy.Spider):
name = "dmoz"
allowed_domains = ["dmoz.org"]
start_urls = [
"http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
"http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
]
def parse(self, response):
# filename = response.url.split("/")[-2] + '.html'
select = Selector(response)
urls=GivenSpider()
u=urls.start_urls
titles = select.xpath('//a[@class="listinglink"]/text()').extract()
print ' [*] Start crawling at %s ' % u
for title in titles:
print '\t %s' % title
#configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'})
runner = CrawlerRunner()
d = runner.crawl(GivenSpider)
d.addBoth(lambda _: reactor.stop())
reactor.run()
输出是:
[*] Start crawling at ['http://www.dmoz.org/Computers/Programming/Languages/Python/Books/', 'http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/']
eff-bot's Daily Python URL
O'Reilly Python Center
Python Developer's Guide
Social Bug
[*] Start crawling at ['http://www.dmoz.org/Computers/Programming/Languages/Python/Books/', 'http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/']
Core Python Programming
Data Structures and Algorithms with Object-Oriented Design Patterns in Python
Dive Into Python 3
Foundations of Python Network Programming
Free Python books
FreeTechBooks: Python Scripting Language
How to Think Like a Computer Scientist: Learning with Python
An Introduction to Python
Learn to Program Using Python
Making Use of Python
Practical Python
Pro Python System Administration
Programming in Python 3 (Second Edition)
Python 2.1 Bible
Python 3 Object Oriented Programming
Python Language Reference Manual
Python Programming Patterns
Python Programming with the Java Class Libraries: A Tutorial for Building Web and Enterprise Applications with Jython
Python: Visual QuickStart Guide
Sams Teach Yourself Python in 24 Hours
Text Processing in Python
XML Processing with Python
但我想打印在正确位置爬行的每个网址。像这样:
[*] Start crawling at http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/
eff-bot's Daily Python URL
O'Reilly Python Center
Python Developer's Guide
Social Bug
[*] Start crawling at http://www.dmoz.org/Computers/Programming/Languages/Python/Books/
Core Python Programming
Data Structures and Algorithms with Object-Oriented Design Patterns in Python
Dive Into Python 3
Foundations of Python Network Programming
Free Python books
FreeTechBooks: Python Scripting Language
How to Think Like a Computer Scientist: Learning with Python
An Introduction to Python
Learn to Program Using Python
Making Use of Python
Practical Python
Pro Python System Administration
Programming in Python 3 (Second Edition)
Python 2.1 Bible
Python 3 Object Oriented Programming
Python Language Reference Manual
Python Programming Patterns
Python Programming with the Java Class Libraries: A Tutorial for Building Web and Enterprise Applications with Jython
Python: Visual QuickStart Guide
Sams Teach Yourself Python in 24 Hours
Text Processing in Python
XML Processing with Python
我如何在scrapy中处理它?</ p>
答案 0 :(得分:2)