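I am trying to parse the table from http://www.livescore.co.uk/worldcup/tables/. I am having trouble managing the output. I want only the text in the output, and I want a line break after each tr once all of its td cells have been printed. I am a beginner and still learning, so can anyone suggest what I am doing wrong? Any suggestions?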
from BeautifulSoup import BeautifulSoup
import urllib2
pageSource=urllib2.urlopen('http://www.livescore.com/worldcup/tables/').read()
soup = BeautifulSoup(pageSource)
alltables = soup.findAll( "table", {"class":"league-wc table bh"} )
results=[]
for table in alltables:
    rows = table.findAll('tr')
    lines=[]
    for tr in rows[1:]:
        cols = tr.findAll('td')
        for td in cols:
            text=td.renderContents().strip('\n')
            lines.append(text)
    text_table='\n'.join(lines)
    print text_table
Output:
<a href="/worldcup/team-brazil/">Brazil</a>
0
0
0
0
0
0
0
0
1
<a href="/worldcup/team-cameroon/">Cameroon</a>
0
0
0
0
0
0
0
0
1
<a href="/worldcup/team-croatia/">Croatia</a>
0
0
0
0
0
0
0
0
1
<a href="/worldcup/team-mexico/">Mexico</a>
0
0
0
0
0
0
0
0
....similar
My desired output:
1,brazil,0,0,0,0,0,0,0,0,0,0
2,cameroon,0,0,0,0,0,0,0,0,0,0
3,croatia,0,0,0,0,0,0,0,0,0,0
4,Mexico,0,0,0,0,0,0,0,0,0,0
Answer 0 (score: 0)
Here you go:
from BeautifulSoup import BeautifulSoup
import urllib2
pageSource=urllib2.urlopen('http://www.livescore.com/worldcup/tables/').read()
soup = BeautifulSoup(pageSource)
alltables = soup.findAll( "table", {"class":"league-wc table bh"} )
# Step 1: collect the raw data as a list of tables, each a list of rows
results=[]
for table in alltables:
    rows = table.findAll('tr')
    _table = []
    for tr in rows[1:]:  # skip the header row
        _row = []
        cols = tr.findAll('td')
        for td in cols:
            # Team cells wrap the name in a link, so take the link's contents
            if td.findAll('a'):
                text=td.a.renderContents().strip()
            else:
                text=td.renderContents().strip()
            _row.append(text)
        _table.append(_row)
    results.append(_table)
# print results
# Step 2: display one comma-separated line per row, replacing the first
# cell with a running index that continues across tables
index = 1
for table in results:
    for row in table:
        print ','.join([str(index)] + row[1:])
        index += 1
Output:
1,Brazil,0,0,0,0,0,0,0,0
2,Cameroon,0,0,0,0,0,0,0,0
3,Croatia,0,0,0,0,0,0,0,0
4,Mexico,0,0,0,0,0,0,0,0
5,Australia,0,0,0,0,0,0,0,0
6,Chile,0,0,0,0,0,0,0,0
...
The idea is to collect the raw data first, and then write the logic to display that data (in whatever way you like).
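To illustrate that split between collecting and displaying, here is a minimal sketch. It is not the code from the answer: it assumes Python 3 and swaps BeautifulSoup for the standard library's `html.parser`, so that it runs without a network connection or extra packages. `SAMPLE_HTML` is made-up markup standing in for the real page, and `RowCollector`, `collect_rows`, and `format_rows` are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser

# Hypothetical sample markup; the real page's structure may differ.
SAMPLE_HTML = """
<table class="league-wc table bh">
  <tr><th>#</th><th>Team</th><th>P</th><th>Pts</th></tr>
  <tr><td>1</td><td><a href="/worldcup/team-brazil/">Brazil</a></td><td>0</td><td>0</td></tr>
  <tr><td>1</td><td><a href="/worldcup/team-cameroon/">Cameroon</a></td><td>0</td><td>0</td></tr>
</table>
"""


class RowCollector(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a <td>; nested tags such as <a>
        # are ignored, so only their text content is kept.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows contain no <td>, so skip them
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def collect_rows(html):
    """Step 1: gather the raw cell text, with no formatting decisions."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


def format_rows(rows):
    """Step 2: one comma-separated line per row, first cell replaced by a running index."""
    return [",".join([str(i)] + row[1:]) for i, row in enumerate(rows, start=1)]


rows = collect_rows(SAMPLE_HTML)
for line in format_rows(rows):
    print(line)
```

Because `collect_rows` returns plain lists, the display step can be changed (tabs instead of commas, a different numbering scheme) without touching the parsing code.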