I am using the following code from this tutorial (http://jeriwieringa.com/blog/2012/11/04/beautiful-soup-tutorial-part-1/).
from bs4 import BeautifulSoup
soup = BeautifulSoup (open("43rd-congress.html"))
final_link = soup.p.a
final_link.decompose()
trs = soup.find_all('tr')
for tr in trs:
    for link in tr.find_all('a'):
        fulllink = link.get ('href')
print fulllink #print in terminal to verify results
tds = tr.find_all("td")
try: #we are using "try" because the table is not well formatted. This allows the program to continue after encountering an error.
    names = str(tds[0].get_text()) # This structure isolates the item by its column in the table and converts it into a string.
    years = str(tds[1].get_text())
    positions = str(tds[2].get_text())
    parties = str(tds[3].get_text())
    states = str(tds[4].get_text())
    congress = tds[5].get_text()
except:
    print "bad tr string"
    continue #This tells the computer to move on to the next item after it encounters an error
print names, years, positions, parties, states, congress
However, I get an error saying that 'continue' is not properly in loop, at line 27. I am using Notepad++ and Windows PowerShell. How do I make this code work?
Answer 0: (score: 2)
Everything from print fulllink onward is outside the for loop, so it needs to be indented:
for tr in trs:
    for link in tr.find_all('a'):
        fulllink = link.get ('href')
        ## indented here!!!!!
        print fulllink #print in terminal to verify results
    tds = tr.find_all("td")
    try: #we are using "try" because the table is not well formatted. This allows the program to continue after encountering an error.
        names = str(tds[0].get_text()) # This structure isolates the item by its column in the table and converts it into a string.
        years = str(tds[1].get_text())
        positions = str(tds[2].get_text())
        parties = str(tds[3].get_text())
        states = str(tds[4].get_text())
        congress = tds[5].get_text()
    except:
        print "bad tr string"
        continue #This tells the computer to move on to the next item after it encounters an error
    print names, years, positions, parties, states, congress
Answer 1: (score: 1)
It looks like your indentation is off. Try this:
from bs4 import BeautifulSoup
soup = BeautifulSoup (open("43rd-congress.html"))
final_link = soup.p.a
final_link.decompose()
trs = soup.find_all('tr')
for tr in trs:
    for link in tr.find_all('a'):
        fulllink = link.get ('href')
        print fulllink #print in terminal to verify results
    tds = tr.find_all("td")
    try: #we are using "try" because the table is not well formatted. This allows the program to continue after encountering an error.
        names = str(tds[0].get_text()) # This structure isolates the item by its column in the table and converts it into a string.
        years = str(tds[1].get_text())
        positions = str(tds[2].get_text())
        parties = str(tds[3].get_text())
        states = str(tds[4].get_text())
        congress = tds[5].get_text()
    except:
        print "bad tr string"
        continue #This tells the computer to move on to the next item after it encounters an error
    print names, years, positions, parties, states, congress
Answer 2: (score: 1)
Whitespace is significant in Python.
This is where things go wrong:
for tr in trs:
    for link in tr.find_all('a'):
        fulllink = link.get ('href')
print fulllink #print in terminal to verify results
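For as long as you intend the code to be part of the loop, you should start and keep indenting it with the appropriate number of tabs.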
for tr in trs:
    for link in tr.find_all('a'):
        fulllink = link.get ('href')
        print fulllink #print in terminal to verify results
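To make the difference concrete, here is a minimal sketch that needs no HTML file. It is written for Python 3, and the nested list rows is made-up sample data standing in for the table rows and links, not anything from the tutorial. It shows how one level of indentation decides whether a statement runs on every pass of the inner loop or only once after both loops have finished.

```python
# Made-up sample data standing in for table rows and their links.
rows = [["a1", "a2"], ["b1"], ["c1", "c2", "c3"]]

# Indented under the inner loop: runs once per item.
inside = []
for row in rows:
    for cell in row:
        last = cell
        inside.append(last)

# Back at column 0: runs only once, after both loops have ended.
outside = []
for row in rows:
    for cell in row:
        last = cell
outside.append(last)

print(inside)
print(outside)
```

The second version only ever sees the last value of the loop variable, which is why an unindented print statement shows a single link instead of all of them.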
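Answer 3: (score: 0)
Besides the indentation of the for loop itself, you have to indent the rest of the code by one more level (i.e. 4 spaces / 1 tab). The try/except is not inside the for loop, which is why you get the continue error.
Indentation shows which lines belong together in a block (a for loop starts a new block, and the code that belongs to it must be indented underneath it).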
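One way to see this without running the whole scraper is to ask Python to compile two small snippets, as in the sketch below. It assumes Python 3; the names broken, fixed, and compiles are hypothetical and chosen for illustration only. The first snippet has the try/except at the same level as the for statement, the second has it indented inside the loop.

```python
import textwrap

# try/except sits at the same level as the for statement, so the
# continue is not inside any loop.
broken = textwrap.dedent("""
    for row in rows:
        pass
    try:
        value = row[5]
    except IndexError:
        continue
""")

# try/except is indented one level, so it belongs to the for loop.
fixed = textwrap.dedent("""
    for row in rows:
        try:
            value = row[5]
        except IndexError:
            continue
""")

def compiles(source):
    """Return True if the source compiles, False on a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError as err:
        print("SyntaxError:", err.msg)
        return False

print(compiles(broken))
print(compiles(fixed))
```

The snippets are only compiled, never executed, so rows does not need to exist. The error is reported before any line of the script runs, which is why the original script fails immediately rather than partway through the table.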
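Answer 4: (score: 0)
My answer may be simple, but the continue is not actually inside a loop. It has to be inside a loop, in the same way that break only works inside a loop (including inside a conditional within that loop). Your indentation is probably off; correct indentation is a must and is very important in Python.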
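As a rough illustration of continue and break both living inside the loop body, here is a sketch in Python 3. The function parse_rows and the sample rows are hypothetical, with plain lists standing in for the td cells of each table row, and the empty list used as an end marker is an assumption made only to show break.

```python
def parse_rows(rows):
    """Collect six-column records, skipping malformed rows."""
    parsed = []
    skipped = 0
    for cells in rows:
        if not cells:
            # Hypothetical end marker: stop processing entirely.
            break
        try:
            record = (cells[0], cells[1], cells[2],
                      cells[3], cells[4], cells[5])
        except IndexError:
            # Too few columns: count it and move on to the next row.
            skipped += 1
            continue
        parsed.append(record)
    return parsed, skipped

# Made-up sample data, not taken from the tutorial's HTML file.
sample = [
    ["Name A", "1873-1875", "Representative", "Party X", "ST", "43"],
    ["short", "row"],
    ["Name B", "1873-1875", "Senator", "Party Y", "ST", "43"],
    [],
    ["Name C", "1873-1875", "Senator", "Party Y", "ST", "43"],
]

parsed, skipped = parse_rows(sample)
print(len(parsed), skipped)
```

Catching IndexError specifically, rather than using a bare except as in the question, keeps unrelated mistakes such as a misspelled variable name from being silently reported as a bad row.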