Question

I get an indentation error when I try to run the code below. I'm trying to recursively print out the URLs of a set of HTML pages.

import urllib2
from BeautifulSoup import *
from urlparse import urljoin
# Create a list of words to ignore
ignorewords=set(['the','of','to','and','a','in','is','it'])

def crawl(self,pages,depth=2):
for i in range(depth):
newpages=set()
for page in pages:
try:
c=urllib2.urlopen(page)
except:
print "Could not open %s" % page
continue
soup=BeautifulSoup(c.read())
self.addtoindex(page,soup)
links=soup('a')
for link in links:
if ('href' in dict(link.attrs)):
url=urljoin(page,link['href'])
if url.find("'")!=-1: continue
url=url.split('#')[0] # remove location portion
if url[0:4]=='http' and not self.isindexed(url):
newpages.add(url)
linkText=self.gettextonly(link)
self.addlinkref(page,url,linkText)
self.dbcommit()
pages=newpages

Answer 1

Your code has no indentation at all, so Python chokes on it when you try to run it.
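To see the failure in isolation, the small sketch below (Python 3; the helper name `has_indentation_error` is hypothetical) compiles two versions of the same loop with the built-in `compile()`. In one, the body is flush left, as in the question. In the other, each block is indented by four spaces. Only the unindented version raises `IndentationError`.

```python
def has_indentation_error(source: str) -> bool:
    """Return True if compiling `source` raises IndentationError (hypothetical helper)."""
    try:
        compile(source, "<snippet>", "exec")
    except IndentationError:
        return True
    return False

# Body flush with the `def` line, as in the question's code.
unindented = "def crawl(pages):\nfor page in pages:\nprint(page)\n"
# Same code, each nested block indented by four more spaces.
indented = "def crawl(pages):\n    for page in pages:\n        print(page)\n"

print(has_indentation_error(unindented))  # True
print(has_indentation_error(indented))    # False
```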

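Remember that whitespace is significant in Python. Indenting with 4 spaces rather than tabs will save you a lot of "invisible" indentation errors.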
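The "invisible" tab problem can be sketched as follows (Python 3 behavior; the snippet strings are made up for illustration). In an editor that renders a tab as eight columns, the two body lines below look perfectly aligned. However, one is indented with a tab and the other with eight spaces, so Python 3 rejects the block with `TabError`, which is a subclass of `IndentationError`. Indenting with spaces only avoids the problem.

```python
def indentation_problem(source: str):
    """Return the name of the indentation error raised by compiling `source`, or None."""
    try:
        compile(source, "<snippet>", "exec")
    except IndentationError as exc:  # TabError is a subclass of IndentationError
        return type(exc).__name__
    return None

# First body line indented with a tab, second with eight spaces.
mixed = "if True:\n\tx = 1\n        y = 2\n"
# Both body lines indented with four spaces.
spaces = "if True:\n    x = 1\n    y = 2\n"

print(indentation_problem(mixed))   # TabError
print(indentation_problem(spaces))  # None
```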

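I've downvoted this because the code was pasted unformatted and unindented. That means either the poster doesn't understand Python (and hasn't read a basic tutorial) or pasted the code without re-indenting it. Either way, nobody can answer it as it stands.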
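For reference, the sketch below shows the question's crawl loop re-indented with four spaces per level. It is a hedged, self-contained Python 3 adaptation, not the poster's actual code. The adaptation makes several substitutions:

- urllib2 and BeautifulSoup are swapped for the standard library's `urllib.request` and `html.parser`.
- The original methods (`addtoindex`, `isindexed`, `gettextonly`, `addlinkref`, `dbcommit`) belong to a class that isn't shown. The `Crawler` and `LinkParser` classes and their in-memory index are therefore hypothetical stand-ins, and `gettextonly` and `dbcommit` are dropped.
- The injectable `fetch` parameter is also an assumption, added so the loop can run without network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect (href, text) pairs for <a> tags that have an href (stand-in for soup('a'))."""

    def __init__(self):
        super().__init__()
        self.links = []       # list of (href, link text) tuples
        self._current = None  # [href, text] of the <a> tag being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._current = [href, ""]

    def handle_data(self, data):
        if self._current is not None:
            self._current[1] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.links.append(tuple(self._current))
            self._current = None


class Crawler:
    """Hypothetical crawler; the index methods below are in-memory stand-ins."""

    def __init__(self, fetch=None):
        # fetch(url) -> HTML string; defaults to a real HTTP request.
        self.fetch = fetch or (lambda url: urlopen(url).read().decode("utf-8", "replace"))
        self.indexed = set()
        self.linkrefs = []

    def addtoindex(self, page):
        self.indexed.add(page)

    def isindexed(self, url):
        return url in self.indexed

    def addlinkref(self, page, url, link_text):
        self.linkrefs.append((page, url, link_text.strip()))

    def crawl(self, pages, depth=2):
        for _ in range(depth):
            newpages = set()
            for page in pages:
                try:
                    html = self.fetch(page)
                except Exception:
                    print("Could not open %s" % page)
                    continue
                self.addtoindex(page)
                parser = LinkParser()
                parser.feed(html)
                parser.close()
                for href, link_text in parser.links:
                    url = urljoin(page, href)
                    if "'" in url:
                        continue
                    url = url.split("#")[0]  # remove location portion
                    if url.startswith("http") and not self.isindexed(url):
                        newpages.add(url)
                    self.addlinkref(page, url, link_text)
            pages = newpages
```

For example, `Crawler(fetch=lambda url: site[url]).crawl(["http://example.com/"])`, where `site` is a hypothetical dict mapping URLs to HTML strings, follows links two levels deep without touching the network. Each nested block (`for`, `try`/`except`, `if`) sits exactly four spaces deeper than the line that opens it.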

Need help finding an indentation error in Python code
