When I use the following code:
from bs4 import BeautifulSoup
import csv
soup = BeautifulSoup (open("43rd-congress.htm"))
final_link = soup.p.a
final_link.decompose()
f = csv.writer(open("43rd_congress_all.csv", "w"))
f.writerow(["Name","Years","Position","Party", "State", "Congress", "Link"])
trs = soup.find_all('tr')
for tr in trs:
    for link in tr.find_all('a'):
        fulllink = link.get('href')
        print fulllink # print in terminal to verify results

    tds = tr.find_all("td")
    try: # we are using "try" because the table is not well formatted. This allows the program to continue after encountering an error.
        names = str(tds[0].get_text()) # This structure isolates the item by its column in the table and converts it into a string.
        years = str(tds[1].get_text())
        positions = str(tds[2].get_text())
        parties = str(tds[3].get_text())
        states = str(tds[4].get_text())
        congress = tds[5].get_text()
    except:
        print "bad tr string"
        continue # This tells the computer to move on to the next item after it encounters an error
    print names, years, positions, parties, states, congress
    f.writerow([names, years, posiitons, parties, states, congress, fullLink])
I get a NameError. But when I try to correct it, I get an error on the last line of code saying the variable is undefined. I have made corrections to bring the code to where it stands now. How do I fix this?

Thanks for your help.

I am running it in Notepad++ and PowerShell. I am on the last part of this tutorial... http://jeriwieringa.com/blog/2012/11/04/beautiful-soup-tutorial-part-1/
Answer 0 (score: 2)
If the first line in the try raises an error, then names, years, posiitons, parties, states, congress are never created.

Suppose an error occurs in the try block: say names = str(tds[0].get_text()) raises. You catch the exception, but the later variables are never created. You may want to set default values before the try/except, for example names = ''.
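The suggestion above can be sketched as follows. This is a minimal, hypothetical example (the two-item tds list stands in for a short table row, not your real data): the defaults guarantee every variable exists even when the try block fails partway through.

```python
# Pre-set defaults so all six names exist no matter where the try fails.
names = years = positions = parties = states = congress = ''

tds = ['Smith', '1873-1875']  # hypothetical row with only two columns

try:
    names = str(tds[0])
    years = str(tds[1])
    positions = str(tds[2])  # raises IndexError: only two items
    parties = str(tds[3])
except IndexError:
    pass  # names and years were set above; positions keeps its default

print(names, years, positions)
```

With the defaults in place, the later print and writerow lines can no longer raise a NameError; missing columns simply come through as empty strings.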
Your indentation errors are probably just due to a mix of tabs and spaces, since your code otherwise looks fine.
Answer 1 (score: 0)
print names, years, positions, parties, states, congress
#                    ^-- spelled "positions" when printed ...
f.writerow([names, years, posiitons, parties, states, congress, fullLink])
#                         ^-- ... but "posiitons" when passed here.
# The same goes for fullLink: it was created as "fulllink".
In the example above, positions and posiitons are not the same. It is a simple typing error.
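That also explains the NameError: Python identifiers are case-sensitive, so fulllink and fullLink are two entirely different names. A small hypothetical demo (the URL is made up):

```python
fulllink = "http://example.com/member"  # hypothetical value, lowercase name

try:
    print(fullLink)  # capital L: this name was never assigned
except NameError as err:
    message = str(err)
    print("caught:", message)
```

Referencing fullLink raises NameError even though fulllink exists, which is exactly what happens in the writerow line of the question.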
Have a look at the code below and see whether it runs, since I don't have your file.
from bs4 import BeautifulSoup
import csv

soup = BeautifulSoup(open("43rd-congress.htm"))

final_link = soup.p.a
final_link.decompose()

f = csv.writer(open("43rd_congress_all.csv", "w"))
f.writerow(["Name", "Years", "Position", "Party", "State", "Congress", "Link"])

trs = soup.find_all('tr')

for tr in trs:
    for link in tr.find_all('a'):
        fullLink = link.get('href')
        print fullLink # print in terminal to verify results

    tds = tr.find_all("td")
    try: # we are using "try" because the table is not well formatted. This allows the program to continue after
         # encountering an error.
        # This structure isolates the item by its column in the table and converts it into a string
        names = str(tds[0].get_text())
        years = str(tds[1].get_text())
        positions = str(tds[2].get_text())
        parties = str(tds[3].get_text())
        states = str(tds[4].get_text())
        congress = tds[5].get_text()
        print names, years, positions, parties, states, congress
        f.writerow([names, years, positions, parties, states, congress, fullLink])
    except IndexError:
        print "bad tr string"
        continue # This tells the computer to move on to the next item after it encounters an error