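I'm working with a dataset that contains information on every March Madness game since 1985. I want to find out which teams have won the championship and how many times each team has won it.
I filtered the main dataset with a boolean mask and created a new dataset containing only the championship games. Now I'm trying to write a loop that compares the scores of the two teams in each championship game, detects the winner, and adds that team to a list. The dataset looks like this: https://imgur.com/tXhPYSm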
import pandas as pd

tourney = pd.read_csv('ncaa.csv')
champions = tourney.loc[tourney['Region Name'] == "Championship", ['Year','Seed','Score','Team','Team.1','Score.1','Seed.1']]
list_champs = []
for i in champions:
    if champions['Score'] > champions['Score.1']:
        list_champs.append(i['Team'])
    else:
        list_champs.append(i['Team.1'])
Answer 0 (score: 0)
Here are the minimal changes (not the most efficient approach) needed to get your code working:
import pandas as pd

tourney = pd.read_csv('ncaa.csv')
champions = tourney.loc[tourney['Region Name'] == "Championship", ['Year','Seed','Score','Team','Team.1','Score.1','Seed.1']]
list_champs = []
# iterrows() yields (index, row) pairs, so unpack the pair and ignore the index
for _, row in champions.iterrows():
    if row['Score'] > row['Score.1']:
        list_champs.append(row['Team'])
    else:
        list_champs.append(row['Team.1'])
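For context, the original loop fails because iterating over a DataFrame directly yields its column labels (so `i` is a string such as `'Year'`), and `champions['Score'] > champions['Score.1']` is a whole boolean Series, which cannot be used in an `if` (pandas raises a "truth value of a Series is ambiguous" `ValueError`). The sketch below demonstrates both points on a small hand-made DataFrame; the team names and scores are hypothetical, and only the column names come from the question.

```python
import pandas as pd

# Hypothetical championship rows; only the column names come from the question.
champions = pd.DataFrame({
    "Year": [2001, 2002, 2003],
    "Team": ["Alpha", "Beta", "Gamma"],
    "Score": [70, 60, 58],
    "Team.1": ["Beta", "Gamma", "Alpha"],
    "Score.1": [65, 72, 61],
})

# Iterating a DataFrame directly yields its column labels, not its rows.
print(list(champions))  # ['Year', 'Team', 'Score', 'Team.1', 'Score.1']

# iterrows() yields (index, row) pairs, so the pair must be unpacked.
list_champs = []
for _, row in champions.iterrows():
    if row["Score"] > row["Score.1"]:
        list_champs.append(row["Team"])
    else:
        list_champs.append(row["Team.1"])

print(list_champs)  # ['Alpha', 'Gamma', 'Alpha']
```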
Alternatively, you can simply do this:
list_champs = champions.apply(lambda row: row['Team'] if row['Score'] > row['Score.1'] else row['Team.1'], axis=1).tolist()
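The question also asks how many times each team won. A minimal sketch, assuming the same hypothetical toy data as a stand-in for the real `champions` frame: keep the `apply` result as a Series and call `value_counts()` on it.

```python
import pandas as pd

# Hypothetical championship rows; only the column names come from the question.
champions = pd.DataFrame({
    "Year": [2001, 2002, 2003],
    "Team": ["Alpha", "Beta", "Gamma"],
    "Score": [70, 60, 58],
    "Team.1": ["Beta", "Gamma", "Alpha"],
    "Score.1": [65, 72, 61],
})

# axis=1 passes one row at a time to the lambda, which returns that row's winner.
winners = champions.apply(
    lambda row: row["Team"] if row["Score"] > row["Score.1"] else row["Team.1"],
    axis=1,
)
print(winners.tolist())  # ['Alpha', 'Gamma', 'Alpha']

# Number of titles per team, most titles first.
title_counts = winners.value_counts()
print(title_counts)
```

`apply` with `axis=1` still calls a Python function once per row, so it is convenient rather than fast; for a few dozen championship games that does not matter.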
Answer 1 (score: 0)
Why do you need to iterate over the DataFrame at all?
Basic filtering should work fine. Something like this:
champs1 = champions.loc[champions['Score'] > champions['Score.1'], 'Team']
champs2 = champions.loc[champions['Score'] < champions['Score.1'], 'Team.1']
list_champs = list(champs1) + list(champs2)
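Note that concatenating the two filtered results groups winners by which column they came from, so the list is no longer in `Year` order (which is fine for counting, less so for reading). A fully vectorized variant that keeps the row order uses `numpy.where`; this is a swapped-in technique, sketched on the same hypothetical toy data. A tied score (which should not occur in a final) would be left out by the filtering version but assigned to `Team.1` here.

```python
import numpy as np
import pandas as pd

# Hypothetical championship rows; only the column names come from the question.
champions = pd.DataFrame({
    "Year": [2001, 2002, 2003],
    "Team": ["Alpha", "Beta", "Gamma"],
    "Score": [70, 60, 58],
    "Team.1": ["Beta", "Gamma", "Alpha"],
    "Score.1": [65, 72, 61],
})

# Pick Team where it outscored Team.1, otherwise Team.1; row (Year) order is kept.
# assign() returns a new DataFrame, so the filtered frame is not modified in place.
result = champions.assign(
    Winner=np.where(
        champions["Score"] > champions["Score.1"],
        champions["Team"],
        champions["Team.1"],
    )
)

print(result[["Year", "Winner"]])
print(result["Winner"].value_counts())  # titles per team
```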