How to compare integer values in a pandas DataFrame

Time: 2018-09-02 20:49:15

Tags: python python-3.x pandas

I'm writing a program that needs to read a CSV/text file containing soccer scores like the following:

Lions 3, Snakes 3 
Tarantulas 1, FC Awesome 0 
Lions 1, FC Awesome 1 
Tarantulas 3, Snakes 1 
Lions 4, Grouches 0

If the teams draw, each team gets 1 point; if a team wins, it gets 3 points.

Ideally, the output should look like this:

1. Tarantulas, 6 pts 
2. Lions, 5 pts 
3. FC Awesome, 1 pt 
3. Snakes, 1 pt 
4. Grouches, 0 pts 

Here is my code so far:

import pandas as pd


data = pd.read_csv("sample_input.csv", header=None, names=['left_team', 'right_team'])
data_dict = data.to_dict(orient='list')


def splitter(row):
    left_team, right_team = row.split(',')
    return {
       'left_team': left_team[:-2].strip(),
       'left_score': int(left_team[-2:].strip()),
       'right_team': right_team[:-2].strip(),
       'right_score': int(right_team[-2:].strip())
}

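My question is how to get the data into a DataFrame so that I can compare the values. I also tried coding a solution without pandas, but I'm struggling with that as well. Any help would be greatly appreciated! Thanks!

Here is another solution I tried: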

from collections import defaultdict
import csv


reader = csv.DictReader(open('sample_input.csv', 'r'))

dict_list = []

for line in reader:
    dict_list.append(line)


data_list = [splitter(row) for row in reader]


def splitter(row):
    left_team, right_team = row.split(',')
    return {
       'left_team': left_team[:-2].strip(),
       'left_score': int(left_team[-2:].strip()),
       'right_team': right_team[:-2].strip(),
       'right_score': int(right_team[-2:].strip())
}


data_dicts = [splitter(row) for row in reader]


team_scores = defaultdict(int)

for game in data_dicts:
    if game['left_score'] == game['right_score']:
        team_scores[game['left']] += 1
        team_scores[game['right']] += 1
    elif game ['left_score'] > game['right_score']:
        team_scores[game['left']] += 3
    else:
        team_scores[game['right']] += 3


teams_sorted = sorted(team_scores.items(), key=lambda team: team[1], reverse=True)

for line in teams_sorted:
    print(line)
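
For reference, the attempt above appears to fail for a few reasons: `splitter` is called before it is defined, the `reader` is already exhausted by the first loop, `csv.DictReader` treats the first match as a header row, and the lookup keys `'left'` and `'right'` do not match the `'left_team'` and `'right_team'` keys that `splitter` returns. The following is only a minimal sketch of one way to repair it without pandas. The sample data is inlined so it runs without `sample_input.csv`, and the helper names `parse_side` and `tally` are hypothetical, not part of the original code.

```python
from collections import defaultdict

# Sample data inlined (assumption) so the sketch runs without sample_input.csv.
SAMPLE = """Lions 3, Snakes 3
Tarantulas 1, FC Awesome 0
Lions 1, FC Awesome 1
Tarantulas 3, Snakes 1
Lions 4, Grouches 0"""


def parse_side(text):
    # "FC Awesome 0" -> ("FC Awesome", 0); rsplit also copes with multi-digit scores.
    name, score = text.strip().rsplit(' ', 1)
    return name, int(score)


def tally(lines):
    team_scores = defaultdict(int)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        left, right = line.split(',')
        left_team, left_score = parse_side(left)
        right_team, right_score = parse_side(right)
        # Touch both keys so a team with no points still appears with 0.
        team_scores[left_team] += 0
        team_scores[right_team] += 0
        if left_score == right_score:
            team_scores[left_team] += 1
            team_scores[right_team] += 1
        elif left_score > right_score:
            team_scores[left_team] += 3
        else:
            team_scores[right_team] += 3
    return dict(team_scores)


team_scores = tally(SAMPLE.splitlines())
# Sort by points descending, then by name so ties come out in a stable order.
teams_sorted = sorted(team_scores.items(), key=lambda item: (-item[1], item[0]))
for name, pts in teams_sorted:
    print(name, pts)
```

To read from the real file instead, `tally(open('sample_input.csv'))` should work the same way, assuming the file has one match per line in the format shown above.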

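2 answers:

Answer 0: (score: 0)

Here is a simple solution. The first step is to clean the data, then assign points to each team. Finally, you add up all the points for each team, regardless of whether it appears on the left or the right.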

import pandas as pd
import numpy as np

# Create DataFrame from your input
df = pd.read_clipboard(sep=', ', header=None)
df.columns=['l_team', 'r_team']

# Clean the data, separating teams and their scores.
# Raw strings keep \s and \d from being treated as string escapes.
df[['l_team', 'l_score']] = df.l_team.str.extract(r'(.*)\s(\d+)')
df[['r_team', 'r_score']] = df.r_team.str.extract(r'(.*)\s(\d+)')
df[['l_score', 'r_score']] = df[['l_score', 'r_score']].astype('int')

Now df looks like this:

       l_team      r_team  l_score  r_score
0       Lions      Snakes        3        3
1  Tarantulas  FC Awesome        1        0
2       Lions  FC Awesome        1        1
3  Tarantulas      Snakes        3        1
4       Lions    Grouches        4        0

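Determine how many points the left and right teams earned in each match, then sum them by team. We use Series.add so that the two sums are aligned on their index, which after the groupby is just the team name.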

df['l_pts'] = np.select([df.l_score > df.r_score, df.l_score == df.r_score], [3, 1], 0)
df['r_pts'] = np.select([df.r_score > df.l_score, df.r_score == df.l_score], [3, 1], 0)

scores = df.groupby('l_team').l_pts.sum().add(df.groupby('r_team').r_pts.sum(), fill_value=0).astype('int').sort_values(ascending=False)

Output of scores:

Tarantulas    6
Lions         5
Snakes        1
FC Awesome    1
Grouches      0
dtype: int32
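
To illustrate why `Series.add` with `fill_value=0` is used rather than a plain `+`, here is a small sketch. The two series are typed in by hand to mirror the per-side sums for this sample data (an assumption for illustration; in the answer they come from the two groupby calls).

```python
import pandas as pd

# Hand-written stand-ins for the two groupby sums (illustrative only).
left = pd.Series({'Lions': 5, 'Tarantulas': 6})
right = pd.Series({'FC Awesome': 1, 'Grouches': 0, 'Snakes': 1})

# Plain addition aligns on the index; a label missing from either side gives NaN.
plain = left + right

# fill_value=0 treats a missing label as 0, so every team keeps its total.
filled = left.add(right, fill_value=0).astype(int)
print(filled)
```

In this sample no team appears on both sides, so the plain sum would be NaN for every team, which is why the `fill_value` argument matters here.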

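To get close to your output format, you can do the following (note that it numbers the rows 1 to 5 and always prints "pts", so tied ranks and the singular "pt" are not handled):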

pd.Series(scores.index+', '+scores.values.astype('str')+' pts', index=np.arange(1,len(scores)+1,1))
#1    Tarantulas, 6 pts
#2         Lions, 5 pts
#3        Snakes, 1 pts
#4    FC Awesome, 1 pts
#5      Grouches, 0 pts
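
If tied teams should share a rank and "1 pt" should be singular, as in the desired output from the question, one possible approach is a dense rank. This is only a sketch: the `scores` series is rebuilt by hand so the block is self-contained, and the column name `place` is hypothetical.

```python
import pandas as pd

# Rebuilt by hand to mirror the scores series above (illustrative only).
scores = pd.Series({'Tarantulas': 6, 'Lions': 5, 'Snakes': 1, 'FC Awesome': 1, 'Grouches': 0})

table = scores.rename_axis('team').reset_index(name='points')
# Highest points first; ties are ordered alphabetically by team name.
table = table.sort_values(['points', 'team'], ascending=[False, True])
# Dense ranking: tied teams share a rank and the next rank is not skipped.
table['place'] = table['points'].rank(method='dense', ascending=False).astype(int)

lines = [
    f"{row.place}. {row.team}, {row.points} {'pt' if row.points == 1 else 'pts'}"
    for row in table.itertuples(index=False)
]
print('\n'.join(lines))
```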

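Answer 1: (score: 0)

There's no magic here. Just define a function that converts match scores into points, apply it, unpivot the left and right columns, group by team, and sum the points. There may be more elegant solutions.

Prepare the data using your function: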

import pandas as pd

data = '''Lions 3, Snakes 3 
Tarantulas 1, FC Awesome 0 
Lions 1, FC Awesome 1 
Tarantulas 3, Snakes 1 
Lions 4, Grouches 0'''

def splitter(row):
    left_team, right_team = row.split(',')
    return {
       'left_team': left_team[:-2].strip(),
       'left_score': int(left_team[-2:].strip()),
       'right_team': right_team[:-2].strip(),
       'right_score': int(right_team[-2:].strip())
}

data = pd.DataFrame(splitter(row) for row in data.split("\n"))
print(data)

Out:
   left_score   left_team  right_score  right_team
0           3       Lions            3      Snakes
1           1  Tarantulas            0  FC Awesome
2           1       Lions            1  FC Awesome
3           3  Tarantulas            1      Snakes
4           4       Lions            0    Grouches

Add a points column for each team based on the scores:

def points(left_score, right_score):

    win_points = 3
    draw_points = 1
    lose_points = 0

    if left_score < right_score:
        return pd.Series({'left_points': lose_points, 'right_points': win_points})
    elif left_score > right_score:
        return pd.Series({'left_points': win_points, 'right_points': lose_points})
    else:
        return pd.Series({'left_points': draw_points, 'right_points': draw_points})

data = data.merge(
    data[['left_score', 'right_score']].apply(lambda row: points(*row), axis=1),
    left_index=True, right_index=True
)
print(data)

Out:
   left_score   left_team  right_score  right_team  left_points  right_points
0           3       Lions            3      Snakes            1             1
1           1  Tarantulas            0  FC Awesome            3             0
2           1       Lions            1  FC Awesome            1             1
3           3  Tarantulas            1      Snakes            3             0
4           4       Lions            0    Grouches            3             0

Unpivot:

data = pd.concat([
    data[['left_team', 'left_points']]\
    .rename({'left_team': 'team', 'left_points': 'points'}, axis=1),
    data[['right_team', 'right_points']]\
    .rename({'right_team': 'team', 'right_points': 'points'}, axis=1)
])

print(data)

Out:
         team  points
0       Lions       1
1  Tarantulas       3
2       Lions       1
3  Tarantulas       3
4       Lions       3
0      Snakes       1
1  FC Awesome       0
2  FC Awesome       1
3      Snakes       0
4    Grouches       0

Group by team to get the final result:

result = data.groupby("team")["points"].sum()
print(result)

Out:
team
FC Awesome    1
Grouches      0
Lions         5
Snakes        1
Tarantulas    6
Name: points, dtype: int64
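
The groupby result is ordered alphabetically by team. To turn it into a league table, one option is to sort by points in descending order and break ties by name. This is a sketch: `result` is rebuilt by hand here so the block runs on its own, and `standings` is a hypothetical name.

```python
import pandas as pd

# Rebuilt by hand to mirror the groupby result above (illustrative only).
result = pd.Series(
    {'FC Awesome': 1, 'Grouches': 0, 'Lions': 5, 'Snakes': 1, 'Tarantulas': 6},
    name='points',
).rename_axis('team')

standings = (
    result.reset_index()
    # Highest points first; ties are ordered alphabetically by team name.
    .sort_values(['points', 'team'], ascending=[False, True])
    .reset_index(drop=True)
)
print(standings)
```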