I'm new to Python and I'm getting this error:
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 4, in <module>
    execute()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scrapy/cmdline.py", line 130, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scrapy/cmdline.py", line 96, in _run_print_help
    func(*a, **kw)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scrapy/cmdline.py", line 136, in _run_command
    cmd.run(args, opts)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scrapy/commands/crawl.py", line 42, in run
    q = self.crawler.queue
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scrapy/command.py", line 31, in crawler
    self._crawler.configure()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scrapy/crawler.py", line 36, in configure
    self.spiders = spman_cls.from_settings(self.settings)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scrapy/spidermanager.py", line 33, in from_settings
    return cls(settings.getlist('SPIDER_MODULES'))
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scrapy/spidermanager.py", line 23, in __init__
    for module in walk_modules(name):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scrapy/utils/misc.py", line 65, in walk_modules
    submod = __import__(fullpath, {}, {}, [''])
  File "/my_crawler/empt/empt/spiders/empt_spider.py", line 59
    check_exists_sql = "SELECT * FROM LINKS WHERE link = '%s' LIMIT 1" % item['link']
    ^
IndentationError: unexpected indent
for this code:
def parse_item(self, response):
    hxs = HtmlXPathSelector(response)
    sites = hxs.select('//a[contains(@href, ".mp3")]/@href').extract()
    items = [ ]
    #for site in sites:
    #    link = site.select('a/@href').extract()
    #    print site
    for site in sites:
        item = EmptItem()
        item['link'] = site #site.select('a/@href').extract()
        #### DB INSERT ATTEMPT ###
        #MySQL Test
        #open db connection
        db = MySQLdb.connect("localhost","root","str0ng","TESTDB")
        #prepare a cursor object using cursor() method
        cursor = db.cursor()
        #see if any links in the DB match the crawled link
        check_exists_sql = "SELECT * FROM LINKS WHERE link = '%s' LIMIT 1" % item['link']
        cursor.execute(check_exists_sql)
        if cursor.rowcount == 0:
            #prepare SQL query to insert a record into the db.
            sql = "INSERT INTO LINKS ( link ) VALUES ( '%s')" % item['link']
            try:
                #execute the sql command
                cursor.execute(sql)
                #commit your changes to the db
                db.commit()
            except MySQLdb.Error:
                #rollback on error
                db.rollback()
        #fetch a single row using fetchone() method.
        #data = cursor.fetchone()
        #print "Database version: %s " % data
        #disconnect from server
        db.close()
        ### end mysql
        items.append(item)
    return items
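A side note on the database part of this code, separate from the IndentationError: formatting the link into the SQL string with % breaks as soon as a link contains a quote character. Below is a minimal sketch of the same check-then-insert step using DB-API parameter binding. It assumes the standard-library sqlite3 module as a stand-in for MySQLdb so it runs without a server (with MySQLdb the placeholder is %s instead of ?), and the helper name insert_if_missing is hypothetical. It tests for an existing row with fetchone() rather than rowcount, because sqlite3 reports -1 as the rowcount of a SELECT.

```python
import sqlite3


def insert_if_missing(conn, link):
    """Insert link into LINKS unless it is already stored; return True if inserted.

    Hypothetical helper; sqlite3 stands in for MySQLdb here.
    """
    cursor = conn.cursor()
    # The driver quotes the value, so links containing ' do not break the query.
    cursor.execute("SELECT 1 FROM LINKS WHERE link = ? LIMIT 1", (link,))
    if cursor.fetchone() is not None:
        return False  # already in the table
    try:
        cursor.execute("INSERT INTO LINKS (link) VALUES (?)", (link,))
        conn.commit()
    except sqlite3.Error:
        conn.rollback()  # undo the failed insert, then let the caller see the error
        raise
    return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE LINKS (link TEXT)")
    for link in ["a.mp3", "it's.mp3", "a.mp3"]:
        print(link, insert_if_missing(conn, link))
    conn.close()
```

Opening one connection for the whole crawl, instead of one per link as in the question, would also avoid reconnecting inside the loop.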
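Answer 0 (score: 40)
While the indentation errors are obvious on the Stack Overflow page, they may not be in your editor. You have a mix of different indentation widths here: 1, 4 and 8 spaces. You should always use four spaces for indentation, as PEP 8 recommends. You should also avoid mixing tabs and spaces.
I also suggest running your script with the '-tt' command-line option to find out where you have accidentally mixed tabs and spaces. And any decent editor can highlight tabs versus spaces.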
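To illustrate what "finding where tabs and spaces are mixed" means, here is a rough sketch of a hypothetical helper, indentation_report, that groups line numbers by what their leading whitespace is made of. It is not how the interpreter's '-tt' check works internally (that compares column positions under different tab widths); the standard library also ships a real checker, runnable as python -m tabnanny yourfile.py.

```python
def indentation_report(source):
    """Group line numbers by what their leading whitespace is made of.

    Hypothetical helper: it only inspects indentation characters and knows
    nothing about where Python blocks begin or end.
    """
    report = {"spaces": [], "tabs": [], "mixed": []}
    for lineno, line in enumerate(source.splitlines(), start=1):
        body = line.lstrip(" \t")
        indent = line[: len(line) - len(body)]
        if not body or not indent:
            continue  # skip blank lines and lines with no indentation
        has_tab = "\t" in indent
        has_space = " " in indent
        if has_tab and has_space:
            report["mixed"].append(lineno)
        elif has_tab:
            report["tabs"].append(lineno)
        else:
            report["spaces"].append(lineno)
    return report


sample = "for site in sites:\n    item = 1\n\tcursor = 2\n \tdb = 3\n"
print(indentation_report(sample))
# {'spaces': [2], 'tabs': [3], 'mixed': [4]}
```

If more than one of the lists is non-empty, the file uses more than one indentation style, which is exactly the situation where lines can look aligned in an editor but not to Python.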
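Answer 1 (score: 3)
Your indentation is wrong, as the error tells you. As you can see, you have indented the code starting at the indicated line less than the code inside the for loop, but too much for it to be at the same level as the for loop. Python takes the missing indentation as the end of the for loop, and then complains that you have indented the rest of the code too much. (I bet the def line is just an artifact of how Stack Overflow wants you to format code.)
Edit: Given your correction, I bet you are mixing tabs and spaces in your source file, so that the code lines up to the human eye but not to Python. As others have suggested, the recommended practice is to use spaces only (see PEP 8). If you start Python with python -t, you will get warnings when your code mixes tabs and spaces, which can help you pinpoint the problem.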
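In Python 3, indentation that mixes tabs and spaces inconsistently is always rejected with TabError (a subclass of IndentationError), so no flag is needed to surface it. A minimal sketch, with a hypothetical helper name, that compiles a snippet without running it and reports which kind of indentation error occurred:

```python
def describe_indentation_problem(source):
    """Compile source (without running it) and describe any indentation error.

    Hypothetical helper; other SyntaxErrors propagate unchanged.
    """
    try:
        compile(source, "<snippet>", "exec")
    except TabError as err:  # checked first: TabError subclasses IndentationError
        return f"TabError on line {err.lineno}: {err.msg}"
    except IndentationError as err:
        return f"IndentationError on line {err.lineno}: {err.msg}"
    return "ok"


# Line 2 is indented with four spaces, line 3 with a single tab.
mixed = "def f():\n    x = 1\n\ty = 2\n"
consistent = "def f():\n    x = 1\n    y = 2\n"
print(describe_indentation_problem(mixed))
print(describe_indentation_problem(consistent))  # ok
```

With a tab counted as 8 columns, line 3 looks more indented than line 2; counted as 1 column it looks less. Because the meaning depends on the tab width, Python 3 refuses it.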
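Answer 2 (score: 1)
The error is pretty simple: the line starting with check_exists_sql isn't indented correctly. From the context of the code, I would indent it and the following lines to match the lines before it: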
#open db connection
db = MySQLdb.connect("localhost","root","str0ng","TESTDB")
#prepare a cursor object using cursor() method
cursor = db.cursor()
#see if any links in the DB match the crawled link
check_exists_sql = "SELECT * FROM LINKS WHERE link = '%s' LIMIT 1" % item['link']
cursor.execute(check_exists_sql)
Keep indenting this way until the end of the for loop (all the way down to items.append(item)).
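The effect of this fix can be checked without running the spider. A minimal sketch, assuming a hypothetical three-line fragment modeled on the question in which the check_exists_sql line has one space more than the line above it; ast.parse only parses, so names like db and sites need not exist:

```python
import ast


def first_indentation_error(source):
    """Return (line number, message) of the first IndentationError, or None."""
    try:
        ast.parse(source)
    except IndentationError as err:
        return err.lineno, err.msg
    return None


# Hypothetical fragment modeled on the question: line 3 has one space too many.
broken = (
    "for site in sites:\n"
    "    cursor = db.cursor()\n"
    "     check_exists_sql = 'SELECT 1'\n"
)
# The fix from this answer: give it the same indentation as the line above.
fixed = broken.replace("     check_exists_sql", "    check_exists_sql")

print(first_indentation_error(broken))  # (3, 'unexpected indent')
print(first_indentation_error(fixed))   # None
```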
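Answer 3 (score: 0)
As the error says, you haven't indented the code correctly: check_exists_sql is not aligned with the line above it, cursor = db.cursor().
Also, use 4 spaces for indentation.
Read this: http://diveintopython.net/getting_to_know_python/indenting_code.html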
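Switching an existing tab-indented file to four spaces can be done mechanically. A minimal sketch, using a hypothetical helper retab built on str.expandtabs, that rewrites only each line's leading whitespace and leaves tabs inside strings or comments alone. One caution: if the file already mixed tabs and spaces, Python 2 read each tab as advancing to the next multiple of 8 columns, so converting tabs to 4 spaces can move a line into a different block; compile and review the result.

```python
def retab(source, tabsize=4):
    """Replace tabs in each line's leading whitespace with spaces.

    Hypothetical helper: only the indentation is touched; tabs later in the
    line (for example inside string literals) are left as they are.
    """
    out = []
    for line in source.splitlines(keepends=True):
        body = line.lstrip(" \t")
        indent = line[: len(line) - len(body)]
        out.append(indent.expandtabs(tabsize) + body)
    return "".join(out)


converted = retab("for site in sites:\n\titem = site\n\tif item:\n\t\tprint(item)\n")
print(converted)
compile(converted, "<retabbed>", "exec")  # raises if the new indentation is invalid
```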
Answer 4 (score: 0)
import urllib.request
import requests
from bs4 import BeautifulSoup
    r = requests.get('https://icons8.com/icons/set/favicon')
If you try to connect to a site like this, you get an indentation error.
import urllib.request
import requests
from bs4 import BeautifulSoup
r = requests.get('https://icons8.com/icons/set/favicon')
Python cares about indentation.
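A related case is code pasted from elsewhere with the same extra indentation on every line, which also fails with "unexpected indent" at the top level. A minimal sketch, assuming a snippet built from this answer's lines with four leading spaces each: textwrap.dedent strips whitespace common to all lines, which fixes that case, but it cannot fix lines indented by different amounts. compile only parses, so requests and bs4 need not be installed, and the helper name compiles is hypothetical.

```python
import textwrap


def compiles(source):
    """True if source parses as a module; the code is never executed."""
    try:
        compile(source, "<pasted>", "exec")
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False
    return True


# Code pasted from somewhere else with four leading spaces on every line.
pasted = (
    "    import urllib.request\n"
    "    import requests\n"
    "    from bs4 import BeautifulSoup\n"
    "    r = requests.get('https://icons8.com/icons/set/favicon')\n"
)

print(compiles(pasted))                   # False: unexpected indent on line 1
print(compiles(textwrap.dedent(pasted)))  # True
```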
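Answer 5 (score: -1)
This error happens when a block is not written correctly: for example, forgetting the ":", or indenting the block with the Tab key in some places and with spaces in others. It can happen when you move code from one editor to another. Never forget this: the error is not always on the line it reports. I ended up here because I had forgotten the except after a try. In my case it happened because my editor was non-standard, but it can happen in ordinary editors too.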
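The point that the reported line can differ from the mistake is easy to reproduce. A minimal sketch, using a hypothetical helper and snippet: the actual mistake is the try: on line 1 having no except or finally, but Python can only notice that when it reaches the next statement, so the error is reported on line 3. The exact message differs between Python versions, so only the line number is shown.

```python
def reported_error_line(source):
    """Return the line number Python reports for a SyntaxError, or None."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError as err:  # also catches IndentationError and TabError
        return err.lineno
    return None


# The mistake is on line 1 (a try with no except/finally),
# but it is only detected when the parser reaches line 3.
missing_except = (
    "try:\n"
    "    value = int('42')\n"
    "print(value)\n"
)
print(reported_error_line(missing_except))  # 3
```

When the reported line looks fine, the lines just above it (an unclosed try, a missing colon, a stray tab) are the usual place to look.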