Python: in pd.DataFrame

Time: 2018-07-24 14:13:37

Tags: python pandas dataframe

I want to iterate over the rows of a DataFrame in order to compute strength ratings for a number of sports teams.

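The DataFrame columns 'home_elo' and 'away_elo' contain the pre-match strength ratings (ELO scores) of the teams involved. After a match, the ratings are updated in the row of each team's next home or away match, using the value returned by update_elo(a,b,c). Each team therefore has two strength ratings at any point in time, one for home matches and one for away matches.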
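The post does not show update_elo itself. For readers who want something concrete to run the loop against, here is a minimal sketch assuming the standard Elo formula; the K-factor of 20 and the 400-point scale are assumptions, not values taken from the question.

```python
def update_elo(rating, opponent_rating, actual_score, k=20.0):
    """Standard Elo update (assumed; the asker's real function is not shown).

    actual_score is 1 for a win, 0.5 for a draw and 0 for a loss,
    seen from the side whose rating is being updated.
    """
    expected = 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))
    return rating + k * (actual_score - expected)


home, away = 1500.0, 1500.0
new_home = update_elo(home, away, 1.0)        # home win between equal teams
new_away = update_elo(away, home, 1.0 - 1.0)  # same match from the away side
print(new_home, new_away)
```

Swapping the first two arguments and passing 1 - actual_score mirrors how the loop in the question updates the away team.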

The relevant code snippet looks as follows:

for index in df.index:

    counter = counter + 1
    # Calculation of post-match ELO scores for home and away teams
    if df.at[index,'updated'] == 2: # Update next match ELO scores if not yet updated but pre-match ELO scores available

        try:
            all_home_fixtures = df.date_rank[df['localteam_id'] == df.at[index,'localteam_id']]
            next_home_fixture = all_home_fixtures[all_home_fixtures > df.at[index,'date_rank']].min()
            next_home_index = df[(df['date_rank'] == next_home_fixture) & (df['localteam_id'] == df.at[index,'localteam_id'])].index.item()
        except ValueError:
            print('ERROR 1 at' + str(index))
            df.at[index,'updated'] = 4

        try:
            all_away_fixtures = df.date_rank[df['visitorteam_id'] == df.at[index,'visitorteam_id']]
            next_away_fixture = all_away_fixtures[all_away_fixtures > df.at[index,'date_rank']].min()
            next_away_index = df[(df['date_rank'] == next_away_fixture) & (df['visitorteam_id'] == df.at[index,'visitorteam_id'])].index.item()
        except ValueError:
            print('ERROR 2 at' + str(index))
            df.at[index,'updated'] = 4

        # print('Current: ' + str(df.at[index,'fixture_id']) + '; Followed by: ' + str(next_home_fixture))
        # print('Current date rank: ' + str(df.at[index,'date']) + ' ' + str(df.at[index,'date_rank']) + '; Next home date rank: ' + str(df.at[next_home_index,'date_rank']) + '; Next away date rank: ' + str(df.at[next_away_index,'date_rank']))

        df.at[next_home_index, 'home_elo'] = update_elo(df.at[index,'home_elo'],df.at[index,'away_elo'],df.at[index,'actual_score'])
        df.at[next_away_index, 'away_elo'] = update_elo(df.at[index,'away_elo'],df.at[index,'home_elo'],1 - df.at[index,'actual_score']) # Swap function inputs for away team


        df.at[next_home_index, 'updated'] = df.at[next_home_index, 'updated'] + 1
        df.at[next_away_index, 'updated'] = df.at[next_away_index, 'updated'] + 1

        df.at[index,'updated'] = 3

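The code works fine for the first few rows. However, it always fails on the same rows, even though I cannot see how those rows differ from the others.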

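  1. If I do not handle the ValueError as shown above, I get the error message ValueError: can only convert an array of size 1 to a Python scalar for the first time after about 250 rows.
  2. If I do handle the ValueError as shown above, four such errors are caught, two by each error-handling block (the code otherwise works fine). However, after roughly 18% of all rows the code stops updating any further strength ratings, without raising any error message.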

I would be grateful if you could help me (a) understand what causes the error and (b) work out how to handle it.

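Since this is my first post on StackOverflow, I am not yet fully familiar with the forum's posting conventions. Please let me know if there is anything I can improve about my post.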

Thank you very much!

2 answers:

Answer 0: (score: 3)

FYI,

A similar error occurs if .item is applied to a NumPy array whose size is not 1.

In that case, you can use .tolist() to work around it.
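A small sketch of that behaviour, using made-up arrays: .item() succeeds only on a size-1 array, while .tolist() returns a list of any length that can then be inspected.

```python
import numpy as np

one = np.array([7])
many = np.array([7, 8])
empty = np.array([])

print(one.item())  # a single element converts to a Python scalar

# Arrays of any other size make .item() raise ValueError.
failed_sizes = []
for arr in (many, empty):
    try:
        arr.item()
    except ValueError as exc:
        failed_sizes.append(arr.size)
        print(f"size {arr.size}: {exc}")

# .tolist() does not depend on the size; check the length before indexing.
print(many.tolist(), empty.tolist())
```

Note that .tolist() only avoids the exception. The caller still has to decide what an empty or multi-element result means.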

Answer 1: (score: 0)

pd.Series.item (and likewise .index.item()) needs exactly one element in order to return a scalar. If

df[(df['date_rank'] == next_home_fixture) & (df['localteam_id'] == df.at[index,'localteam_id'])]

selects zero rows, its index has length 0 and .index.item() will raise a ValueError.
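One way this can arise in the question's loop: for a team's last home match, all_home_fixtures[...] is empty, so .min() returns NaN, no row has date_rank equal to NaN, and the selection is empty. The sketch below uses a made-up three-row table with the question's column names and a hypothetical helper, next_home_index, that returns None instead of raising.

```python
import pandas as pd

# Hypothetical miniature of the fixtures table; column names follow the question.
df = pd.DataFrame({
    "localteam_id": [1, 2, 1],
    "date_rank": [1, 2, 3],
})


def next_home_index(frame, index):
    """Return the index label of the team's next home fixture, or None."""
    team = frame.at[index, "localteam_id"]
    rank = frame.at[index, "date_rank"]
    later = frame[(frame["localteam_id"] == team) & (frame["date_rank"] > rank)]
    if later.empty:
        return None  # no later home match, so there is nothing to update
    # idxmin picks the earliest later fixture (the first one if ranks tie).
    return later["date_rank"].idxmin()


print(next_home_index(df, 0))  # team 1 plays at home again in row 2
print(next_home_index(df, 1))  # team 2 has no later home match
```

The caller should skip the ELO update when None comes back. In the question's loop, the lines after each except block still run, so next_home_index or next_away_index may hold a value left over from an earlier iteration. That might be related to the second symptom, though this cannot be confirmed without the data.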