My file list:
.
|-- lf
|   |-- __init__.py
|   |-- __init__.pyc
|   |-- items.py
|   |-- items.pyc
|   |-- pipelines.py
|   |-- settings.py
|   |-- settings.pyc
|   `-- spiders
|       |-- bbc.py
|       |-- bbc.pyc
|       |-- __init__.py
|       |-- __init__.pyc
|       |-- lwifi.py
|       `-- lwifi.pyc
|-- scrapy.cfg
`-- script.py
items.py:
from scrapy.item import Item, Field

class LfItem(Item):
    topic = Field()
script.py:
from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy import log, signals
from lf.spiders.lwifi import LwifiSpider
from scrapy.utils.project import get_project_settings
spider = LwifiSpider(domain='Lifehacker.co.in')
settings = get_project_settings()
crawler = Crawler(settings)
crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
crawler.configure()
crawler.crawl(spider)
crawler.start()
log.start()
reactor.run()
lwifi.py:
from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.http import Request  # needed by start_requests below

class LwifiSpider(Spider):
    name = "lwifi"

    def __init__(self, **kw):
        super(LwifiSpider, self).__init__(**kw)
        url = kw.get('url') or kw.get('domain') or 'lifehacker.co.in/others/Dont-Use-Personal-Information-in-Your-Wi-Fi-Network-Name/articleshow/45407704.cms'
        if not url.startswith('http://') and not url.startswith('https://'):
            url = 'http://%s/' % url
        self.url = url
        self.allowed_domains = ["lifehacker.co.in/others/Dont-Use-Personal-Information-in-Your-Wi-Fi-Network-Name/articleshow/45407704.cms"]

    def start_requests(self):
        return [Request(self.url, callback=self.parse)]

    def parse(self, response):
        # Extract the text of every <h1> on the page
        topic = response.xpath("//h1/text()").extract()
        print topic
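I'm new to Python and Scrapy. As a start, I wrote a simple Scrapy spider that runs from a Python script (without using Scrapinghub). My goal is to scrape the h1 from the page http://lifehacker.co.in/others/Dont-Use-Personal-Information-in-Your-Wi-Fi-Network-Name/articleshow/45407704.cms. The error is: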
Traceback (most recent call last):
  File "script.py", line 4, in <module>
    from lf.spiders.lwifi import LwifiSpider
  File "/home/ajay/pythonpr/error/lf/lf/spiders/lwifi.py", line 7, in <module>
    class LwifiSpider(Spider):
  File "/home/ajay/pythonpr/error/lf/lf/spiders/lwifi.py", line 11, in LwifiSpider
    url = kw.get('url') or kw.get('domain') or 'lifehacker.co.in/others/Dont-Use-Personal-Information-in-Your-Wi-Fi-Network-Name/articleshow/45407704.cms'
NameError: name 'kw' is not defined
Please help.
Answer 0 (score: 0)
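If you look closely at the traceback, you'll see that the error occurs in the body of the LwifiSpider class:
  File "/home/.../lwifi.py", line 11, in LwifiSpider
If the error had occurred inside the class's __init__, you would see a line like this instead:
  File "/home/.../lwifi.py", line 11, in __init__
So there appears to be some kind of indentation error that leaves the offending line outside the __init__ method, where the kw parameter is not visible.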
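
To make the difference concrete, here is a minimal sketch that reproduces the same NameError without Scrapy. The names BrokenSpider, FixedSpider and misplaced_line_error are hypothetical stand-ins, not part of the original project, and the snippet is written to run on Python 3 (the question's code is Python 2).

```python
def misplaced_line_error():
    """Define a class whose `url = ...` line has slipped out of __init__."""
    try:
        class BrokenSpider(object):  # hypothetical stand-in for LwifiSpider
            def __init__(self, **kw):
                self.kw = kw

            # Indented at class level, so this runs while the class body is
            # being executed, where no name `kw` exists.
            url = kw.get('url') or 'lifehacker.co.in'
    except NameError as exc:
        return str(exc)
    return None


class FixedSpider(object):  # hypothetical stand-in, no Scrapy required
    def __init__(self, **kw):
        # Inside __init__, `kw` is a local name, so the lookup succeeds.
        url = kw.get('url') or kw.get('domain') or 'lifehacker.co.in'
        if not url.startswith('http://') and not url.startswith('https://'):
            url = 'http://%s/' % url
        self.url = url


print(misplaced_line_error())
print(FixedSpider(domain='Lifehacker.co.in').url)
```

The first call reports the same message as the traceback in the question, because the assignment is evaluated when the class statement runs rather than when an instance is created.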
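Try re-indenting the whole __init__ function, and make sure you aren't mixing tabs and spaces anywhere (any decent text editor should let you make all whitespace visible).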
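
If your editor cannot show whitespace, a small script can point at the suspicious lines. This is only a sketch: the helper name indentation_kinds and the sample source are made up for illustration, and it simply groups line numbers by the kind of leading whitespace they use.

```python
def indentation_kinds(source):
    """Group 1-based line numbers by the kind of indentation they use."""
    kinds = {}
    for lineno, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        indent = line[:len(line) - len(line.lstrip(' \t'))]
        if not indent:
            continue  # top-level line, nothing to classify
        if ' ' in indent and '\t' in indent:
            kind = 'mixed'
        elif '\t' in indent:
            kind = 'tabs'
        else:
            kind = 'spaces'
        kinds.setdefault(kind, []).append(lineno)
    return kinds


# Hypothetical sample: the last line is indented with a tab, the rest with spaces.
sample = (
    "class A(object):\n"
    "    def __init__(self, **kw):\n"
    "        pass\n"
    "\turl = kw.get('url')\n"
)
print(indentation_kinds(sample))
```

If the result contains more than one key, the file mixes indentation styles and the minority lines are the ones to re-indent. To check a real file, pass it the text from open('lf/spiders/lwifi.py').read(). The standard library's tabnanny module (python -m tabnanny lwifi.py) and Python 2's -tt flag perform related checks.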
Answer 1 (score: 0)