I am new to Python and am trying to scrape data from the web to (eventually) feed a small database.
My code raises a NoneType error. Could you help?
import urllib2
from bs4 import BeautifulSoup
#1- create files to Leagues, stock data and error
FLeague= open("C:\Python\+exercice\SoccerLeague.txt","w")
FData=open("C:\Python\+exercice\FootballDump.txt","w")
ErrorFile=open("C:\Python\+exercice\ErrorFootballScrap.txt","w")
#Open the website
# 1- grab the data and get the error too
soup = BeautifulSoup(urllib2.urlopen("http://www.soccerstats.com/leagues.asp").read(),"html")
TableLeague = soup.find("table", {"class" : "trow8"})
print TableLeague
# here I just want to grab the country name
for row in TableLeague("tr")[2:]:
    col = row.findAll("td")
    # I try to identify errors
    try:
        country = col[1].a.string.stip()
        FLeague.write(country+"\n")
    except Exception as e:
        ErrorFile.write (country + ";" + str(e)+";"+str(col)+"\n")
        pass
#close the files
FLeague.close
FData.close
ErrorFile.close
Answer 0 (score: 0)
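The first problem comes from:
TableLeague("tr")[2:]
TableLeague is None here because no table element has the class trow8, so calling it raises the NoneType error. Instead, use the id attribute to find the table you want: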
TableLeague = soup.find("table", {"id": "btable"})
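To make this kind of failure easier to spot, the lookup can be checked before the table is used. The following is only a minimal sketch: the inline HTML is a made-up stand-in for the real page (so it runs offline), `find_table` is a hypothetical helper name, and the built-in "html.parser" is assumed as the parser.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page, so the sketch runs offline.
html = """
<table id="btable">
  <tr><th>#</th><th>League</th></tr>
  <tr><td>1</td><td><a href="#"> England </a></td></tr>
  <tr><td>2</td><td><a href="#">Spain</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")


def find_table(soup, attrs):
    """Return the matching table, or raise a clear error instead of failing later on None."""
    table = soup.find("table", attrs)
    if table is None:
        raise ValueError("no table matching %r" % (attrs,))
    return table


# find() returns None when nothing matches, which is what happened with class "trow8".
missing = soup.find("table", {"class": "trow8"})

table = find_table(soup, {"id": "btable"})
# Skip the header row, then take the link text from the second cell of each row.
countries = [row.find_all("td")[1].a.string.strip() for row in table("tr")[1:]]
```

With the sample HTML above, `missing` is None and `countries` holds the two stripped names. How many header rows to skip on the real page is something you would need to check against its actual markup.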
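Also, you probably meant strip() rather than stip() in col[1].a.string.stip().
And, to close the files, you need to call the close() method. Without the parentheses, the method is only referenced and never executed. Replace: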
FLeague.close
FData.close
ErrorFile.close
with:
FLeague.close()
FData.close()
ErrorFile.close()
Or, better still, use the with context manager to handle the files; you then don't need to close them explicitly.
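As a rough illustration of the with approach, here is a small sketch. It writes to a temporary directory rather than the paths used in the question, and the `countries` list is placeholder data standing in for the scraped names; both are assumptions made only so the example is self-contained.

```python
import os
import tempfile

# Hypothetical output location; it stands in for the question's SoccerLeague.txt path.
league_path = os.path.join(tempfile.mkdtemp(), "SoccerLeague.txt")
countries = ["England", "Spain"]  # placeholder data standing in for the scraped names

with open(league_path, "w") as league_file:
    for country in countries:
        league_file.write(country + "\n")

# The file is closed as soon as the with block ends, even if an exception was raised inside it.
file_closed = league_file.closed
```

Several files can be opened in a single with statement by separating the open() calls with commas, which would cover the league, data, and error files together.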