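I am trying to scrape the BBC football results website to get the teams, shots, goals, cards and incidents. I currently have 3 match IDs being passed into the URL.
I am writing the script in Python and using the Beautiful Soup (bs4) package. When the results are output to the screen, the first team is printed, then the first and second teams, then the first, second and third teams. So the first team is effectively printed 3 times, when I am trying to get each of the three teams just once.
Once I have this problem sorted I will write the results to a file. I am adding the team data to dataframes and then to a list (I am not sure this is the best way to do it).
I am sure it has something to do with the for loops, but I am unsure how to resolve the problem.
Code: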
from bs4 import BeautifulSoup
import urllib2
import pandas as pd

out_list = []
for numb in ('EFBO839787', 'EFBO839786', 'EFBO815155):
    url = 'http://www.bbc.co.uk/sport/football/result/partial/' + numb + '?teamview=false'
    teams_list = []
    inner_page = urllib2.urlopen(url).read()
    soupb = BeautifulSoup(inner_page, 'lxml')

    for report in soupb.find_all('td', 'match-details'):
        home_tag = report.find('span', class_='team-home')
        home_team = home_tag and ''.join(home_tag.stripped_strings)
        score_tag = report.find('span', class_='score')
        score = score_tag and ''.join(score_tag.stripped_strings)
        shots_tag = report.find('span', class_='shots-on-target')
        shots = shots_tag and ''.join(shots_tag.stripped_strings)
        away_tag = report.find('span', class_='team-away')
        away_team = away_tag and ''.join(away_tag.stripped_strings)
        df = pd.DataFrame({'away_team' : [away_team], 'home_team' : [home_team], 'score' : [score], })
        out_list.append(df)

    for shots in soupb.find_all('td', class_='shots'):
        home_shots_tag = shots.find('span',class_='goal-count-home')
        home_shots = home_shots_tag and ''.join(home_shots_tag.stripped_strings)
        away_shots_tag = shots.find('span',class_='goal-count-away')
        away_shots = away_shots_tag and ''.join(away_shots_tag.stripped_strings)
        dfb = pd.DataFrame({'home_shots': [home_shots], 'away_shots' : [away_shots] })
        out_list.append(dfb)

    for incidents in soupb.find("table", class_="incidents-table").find("tbody").find_all("tr"):
        home_inc_tag = incidents.find("td", class_="incident-player-home")
        home_inc = home_inc_tag and ''.join(home_inc_tag.stripped_strings)
        type_inc_goal_tag = incidents.find("td", "span", class_="incident-type goal")
        type_inc_goal = type_inc_goal_tag and ''.join(type_inc_goal_tag.stripped_strings)
        type_inc_tag = incidents.find("td", class_="incident-type")
        type_inc = type_inc_tag and ''.join(type_inc_tag.stripped_strings)
        time_inc_tag = incidents.find('td', class_='incident-time')
        time_inc = time_inc_tag and ''.join(time_inc_tag.stripped_strings)
        away_inc_tag = incidents.find('td', class_='incident-player-away')
        away_inc = away_inc_tag and ''.join(away_inc_tag.stripped_strings)
        df_incidents = pd.DataFrame({'home_player' : [home_inc],'event_type' : [type_inc_goal],'event_time': [time_inc],'away_player' : [away_inc]})
        out_list.append(df_incidents)

    print "end"
    print out_list
I am new to Python and Stack Overflow, so any suggestions on formatting my question would also be useful.
Thanks in advance!
Answer 0 (score: 1)
These 3 for loops should be inside your main for loop.
With that structure it works fine: each team appears only once.
This is the output of the first for loop:
[{'score': ['1-3'], 'away_team': ['Man City'], 'home_team': ['Dynamo Kiev']},
{'score': ['1-0'], 'away_team': ['Zenit St P'], 'home_team': ['Benfica']},
{'score': ['1-2'], 'away_team': ['Boston United'], 'home_team': ['Bradford Park Avenue']}]
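To illustrate why the nesting matters, here is a minimal sketch that avoids the network entirely. `fetch_page` is a hypothetical stand-in for the `urlopen` plus `BeautifulSoup` step and returns made-up row labels, not real BBC data. It is written in Python 3 syntax, whereas the question uses Python 2.

```python
MATCH_IDS = ('EFBO839787', 'EFBO839786', 'EFBO815155')


def fetch_page(match_id):
    """Hypothetical stand-in for downloading and parsing one page."""
    return ['details ' + match_id, 'shots ' + match_id, 'incidents ' + match_id]


def inner_loop_nested(match_ids):
    """Inner loop indented under the main loop: every page is processed."""
    out_list = []
    for match_id in match_ids:
        page = fetch_page(match_id)
        for row in page:
            out_list.append(row)
    return out_list


def inner_loop_outside(match_ids):
    """Inner loop dedented: it runs once, on the last page only."""
    out_list = []
    for match_id in match_ids:
        page = fetch_page(match_id)
    for row in page:
        out_list.append(row)
    return out_list


print(len(inner_loop_nested(MATCH_IDS)))   # rows from all three pages
print(inner_loop_outside(MATCH_IDS))       # rows from the last page only
```

If the extraction loops sit outside the main loop, `page` (or `soupb` in the question) only ever refers to the last download by the time they run, so the earlier matches are silently dropped.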
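Answer 1 (score: 0)
This looks like a printing problem. What indentation level is print out_list at?
It should be at zero indentation, all the way to the left in your code, so that it runs once after the main loop has finished.
Alternatively, you may want to move out_list into the outermost loop so that it is reassigned on each iteration.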
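The two options above can be sketched as follows. This is only an illustration under stated assumptions: `parse_match` is a hypothetical placeholder for the scraping step, and the code uses Python 3 syntax rather than the Python 2 of the question.

```python
MATCH_IDS = ('EFBO839787', 'EFBO839786', 'EFBO815155')


def parse_match(match_id):
    """Hypothetical placeholder for scraping one match page."""
    return {'match_id': match_id}


def snapshots_inside_loop(match_ids):
    """Mimics printing out_list inside the loop: a growing list each pass."""
    out_list = []
    snapshots = []
    for match_id in match_ids:
        out_list.append(parse_match(match_id))
        snapshots.append(list(out_list))  # what a print here would show
    return snapshots


def collect_then_print_once(match_ids):
    """Option 1: accumulate in the loop, print once after it ends."""
    out_list = []
    for match_id in match_ids:
        out_list.append(parse_match(match_id))
    return out_list


def reset_each_iteration(match_ids):
    """Option 2: create out_list inside the loop so it starts empty each pass."""
    per_match = []
    for match_id in match_ids:
        out_list = []
        out_list.append(parse_match(match_id))
        per_match.append(out_list)
    return per_match


print([len(s) for s in snapshots_inside_loop(MATCH_IDS)])  # the symptom
print(collect_then_print_once(MATCH_IDS))                  # zero-indent print
```

Option 1 suits the stated goal of writing everything to one file at the end. Option 2 fits better if each match is written out separately as soon as it has been parsed.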