Question

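I would like to loop over the rows of a DataFrame in order to calculate strength ratings for a number of sports teams.

The DataFrame columns 'home_elo' and 'away_elo' contain the pre-match strength ratings (ELO scores) of the teams involved. After each match, the ratings are updated in the row of the team's next home or away match, using the value returned by update_elo(a,b,c). Each team therefore has two strength ratings at any point in time: one for home matches and one for away matches.

The relevant code snippet is shown below: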

for index in df.index:

    counter = counter + 1
    # Calculation of post-match ELO scores for home and away teams
    if df.at[index,'updated'] == 2: # Update next match ELO scores if not yet updated but pre-match ELO scores available

        try:
            all_home_fixtures = df.date_rank[df['localteam_id'] == df.at[index,'localteam_id']]
            next_home_fixture = all_home_fixtures[all_home_fixtures > df.at[index,'date_rank']].min()
            next_home_index = df[(df['date_rank'] == next_home_fixture) & (df['localteam_id'] == df.at[index,'localteam_id'])].index.item()
        except ValueError:
            print('ERROR 1 at' + str(index))
            df.at[index,'updated'] = 4

        try:
            all_away_fixtures = df.date_rank[df['visitorteam_id'] == df.at[index,'visitorteam_id']]
            next_away_fixture = all_away_fixtures[all_away_fixtures > df.at[index,'date_rank']].min()
            next_away_index = df[(df['date_rank'] == next_away_fixture) & (df['visitorteam_id'] == df.at[index,'visitorteam_id'])].index.item()
        except ValueError:
            print('ERROR 2 at' + str(index))
            df.at[index,'updated'] = 4

        # print('Current: ' + str(df.at[index,'fixture_id']) + '; Followed by: ' + str(next_home_fixture))
        # print('Current date rank: ' + str(df.at[index,'date']) + ' ' + str(df.at[index,'date_rank']) + '; Next home date rank: ' + str(df.at[next_home_index,'date_rank']) + '; Next away date rank: ' + str(df.at[next_away_index,'date_rank']))

        df.at[next_home_index, 'home_elo'] = update_elo(df.at[index,'home_elo'],df.at[index,'away_elo'],df.at[index,'actual_score'])
        df.at[next_away_index, 'away_elo'] = update_elo(df.at[index,'away_elo'],df.at[index,'home_elo'],1 - df.at[index,'actual_score']) # Swap function inputs for away team


        df.at[next_home_index, 'updated'] = df.at[next_home_index, 'updated'] + 1
        df.at[next_away_index, 'updated'] = df.at[next_away_index, 'updated'] + 1

        df.at[index,'updated'] = 3

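The code works fine for the first few rows. However, I always run into errors on the same rows, even though I cannot see how those rows differ from the others.

If I do not handle the ValueError as shown above, I get the error message ValueError: can only convert an array of size 1 to a Python scalar for the first time after about 250 rows.
If I do handle the ValueError as shown above, four such errors are caught, two by each error-handling block (the code works fine otherwise). After that, however, the code stops updating any further strength ratings once about 18% of all rows have been processed, without raising any error message.

I would be very grateful if you could help me (a) understand what causes the error and (b) work out how to handle it.

Since this is my first post on StackOverflow, I am not yet fully familiar with the usual posting conventions of this forum. Please let me know if there is anything I can improve about my post.

Thank you very much!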

Answer 1

FYI,

a similar error occurs if you apply .item to a NumPy array.

In that case, you can work around it with .tolist().
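
To illustrate the point above, here is a minimal sketch of how .item() and .tolist() behave on NumPy arrays of different sizes. The helper name to_python is hypothetical and only used for this illustration. Note that .tolist() returns a list rather than a scalar, so the caller still has to decide what to do when the list is empty or has several entries.

```python
import numpy as np


def to_python(arr):
    """Hypothetical helper: convert an array to a plain Python list.

    Unlike .item(), this does not require the array to have exactly one element.
    """
    return np.asarray(arr).tolist()


single = np.array([7])
several = np.array([7, 8])
empty = np.array([])

# .item() only works when the array holds exactly one element.
print(single.item())

# For any other size, .item() raises ValueError.
for arr in (several, empty):
    try:
        arr.item()
    except ValueError as exc:
        print("item() failed:", exc)

# .tolist() works for every size, but yields a list instead of a scalar.
print(to_python(several))
print(to_python(empty))
```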

Answer 2

pd.Series.item requires at least one item in the Series to return a scalar. If:

df[(df['date_rank'] == next_home_fixture) & (df['localteam_id'] == df.at[index,'localteam_id'])]

is empty (length 0), then .index.item() will raise a ValueError.
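
One possible way to handle this is to check for an empty selection before asking for a scalar. The sketch below assumes the column names from the question ('localteam_id', 'date_rank'); the helper next_fixture_index and the toy data are hypothetical and not part of the original code. A likely trigger in the question's loop is a team's last fixture: with no later fixture, .min() returns NaN, the comparison with date_rank matches no rows, and .index.item() fails.

```python
import pandas as pd


def next_fixture_index(df, team_col, team_id, current_rank):
    """Hypothetical helper: index label of the team's next fixture, or None.

    Returns None when the team has no later fixture, instead of raising
    ValueError the way .index.item() does on an empty selection.
    """
    later = df[(df[team_col] == team_id) & (df["date_rank"] > current_rank)]
    if later.empty:
        return None
    # idxmin returns a single label even if several rows share the minimum.
    return later["date_rank"].idxmin()


# Toy data, made up for illustration only.
df = pd.DataFrame({
    "localteam_id": [1, 2, 1],
    "date_rank": [1, 2, 3],
})

# Reproduce the failure: an empty selection has no single index label.
empty_match = df[(df["date_rank"] == 99) & (df["localteam_id"] == 1)]
try:
    empty_match.index.item()
except ValueError as exc:
    print("index.item() failed:", exc)

print(next_fixture_index(df, "localteam_id", 1, 1))  # row holding date_rank 3
print(next_fixture_index(df, "localteam_id", 1, 3))  # no later fixture
```

With a helper like this, the loop can skip the update when None comes back. In the question's version, the except block only prints a message, so the lines after it still run with next_home_index or next_away_index left over from an earlier iteration, which may explain why later rows are updated incorrectly or not at all.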
