I want to create a list from the values in a dataset based on a specific condition

Time: 2019-04-18 19:39:06

Tags: python pandas loops dataframe if-statement

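I'm working with a dataset that contains information about every March Madness game since 1985. I want to find out which teams have won the championship and how many times each team has won it.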

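I masked the main dataset and created a new one that contains only the championship games. Now I'm trying to write a loop that compares the scores of the two teams in each championship game, detects the winner, and adds that team to a list. The dataset looks like this: https://imgur.com/tXhPYSm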

tourney = pd.read_csv('ncaa.csv')

champions = tourney.loc[tourney['Region Name'] == "Championship", ['Year','Seed','Score','Team','Team.1','Score.1','Seed.1']]

list_champs = []

for i in champions:
    if champions['Score'] > champions['Score.1']:
        list_champs.append(i['Team'])
    else:
        list_champs.append(i['Team.1'])

2 Answers:

Answer 0 (score: 0)

Making the minimal changes needed to get your code working (not the most efficient approach):

import pandas as pd

tourney = pd.read_csv('ncaa.csv')

champions = tourney.loc[tourney['Region Name'] == "Championship", ['Year','Seed','Score','Team','Team.1','Score.1','Seed.1']]

list_champs = []

for _, row in champions.iterrows():  # iterrows() yields (index, row) pairs
    if row['Score'] > row['Score.1']:
        list_champs.append(row['Team'])
    else:
        list_champs.append(row['Team.1'])
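If you do want an explicit loop, zipping the needed columns avoids building a Series for every row the way iterrows() does. A minimal sketch, assuming the column names from the question; the sample rows and team names below are made up for illustration only:

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
champions = pd.DataFrame({
    'Year': [2001, 2002, 2003],
    'Team': ['Team A', 'Team C', 'Team E'],
    'Score': [70, 60, 81],
    'Team.1': ['Team B', 'Team D', 'Team F'],
    'Score.1': [65, 72, 78],
})

list_champs = []
# zip walks the four columns in lockstep, one game per iteration.
for team, score, team1, score1 in zip(champions['Team'], champions['Score'],
                                      champions['Team.1'], champions['Score.1']):
    # The higher score wins; ties fall through to 'Team.1', as in the loop above.
    list_champs.append(team if score > score1 else team1)

print(list_champs)  # ['Team A', 'Team D', 'Team E']
```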

Alternatively, you can simply do:

list_champs = champions.apply(lambda row: row['Team'] if row['Score'] > row['Score.1'] else row['Team.1'], axis=1).tolist()
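apply with axis=1 still calls the lambda once per row. A fully vectorized variant (a swapped-in technique, not part of the answer above) uses numpy.where to choose between the two team columns for all rows at once. A sketch, again with made-up sample data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
champions = pd.DataFrame({
    'Year': [2001, 2002, 2003],
    'Team': ['Team A', 'Team C', 'Team E'],
    'Score': [70, 60, 81],
    'Team.1': ['Team B', 'Team D', 'Team F'],
    'Score.1': [65, 72, 78],
})

# Take 'Team' where the first score is higher, otherwise 'Team.1'.
list_champs = np.where(champions['Score'] > champions['Score.1'],
                       champions['Team'], champions['Team.1']).tolist()

print(list_champs)  # ['Team A', 'Team D', 'Team E']
```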

Answer 1 (score: 0)

Why do you need to iterate over the DataFrame at all?

Basic boolean filtering should work fine, for example:

champs1 = champions.loc[champions['Score'] > champions['Score.1'], 'Team']
champs2 = champions.loc[champions['Score'] < champions['Score.1'], 'Team.1']

list_champs = list(champs1) + list(champs2)
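The question also asks how many times each team won. One way to get that (an extension not taken from either answer) is to build the winners as a single Series with Series.where and then call value_counts. A sketch on hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
champions = pd.DataFrame({
    'Year': [2001, 2002, 2003, 2004],
    'Team': ['Team A', 'Team C', 'Team A', 'Team G'],
    'Score': [70, 60, 81, 55],
    'Team.1': ['Team B', 'Team D', 'Team F', 'Team A'],
    'Score.1': [65, 72, 78, 66],
})

# Keep 'Team' where it scored more, otherwise substitute 'Team.1' (index-aligned).
winner_is_first = champions['Score'] > champions['Score.1']
winners = champions['Team'].where(winner_is_first, champions['Team.1'])

# Number of championships per team, most titles first.
title_counts = winners.value_counts()
print(title_counts)
```

Unlike concatenating the two filtered lists, this keeps the winners in the original row (year) order.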