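I want to iterate over the rows of a DataFrame to calculate strength ratings for a number of sports teams. The DataFrame columns 'home_elo' and 'away_elo' contain the pre-match strength ratings (ELO scores) of the teams involved. After a match, they are updated in the row of each team's next home or away match (each team has two strength ratings at any point in time, one for home games and one for away games), based on the value returned by update_elo(a,b,c).

The relevant code looks like this: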
for index in df.index:
    counter = counter + 1
    # Calculation of post-match ELO scores for home and away teams
    if df.at[index,'updated'] == 2: # Update next match ELO scores if not yet updated but pre-match ELO scores available
        try:
            all_home_fixtures = df.date_rank[df['localteam_id'] == df.at[index,'localteam_id']]
            next_home_fixture = all_home_fixtures[all_home_fixtures > df.at[index,'date_rank']].min()
            next_home_index = df[(df['date_rank'] == next_home_fixture) & (df['localteam_id'] == df.at[index,'localteam_id'])].index.item()
        except ValueError:
            print('ERROR 1 at' + str(index))
            df.at[index,'updated'] = 4
        try:
            all_away_fixtures = df.date_rank[df['visitorteam_id'] == df.at[index,'visitorteam_id']]
            next_away_fixture = all_away_fixtures[all_away_fixtures > df.at[index,'date_rank']].min()
            next_away_index = df[(df['date_rank'] == next_away_fixture) & (df['visitorteam_id'] == df.at[index,'visitorteam_id'])].index.item()
        except ValueError:
            print('ERROR 2 at' + str(index))
            df.at[index,'updated'] = 4
        # print('Current: ' + str(df.at[index,'fixture_id']) + '; Followed by: ' + str(next_home_fixture))
        # print('Current date rank: ' + str(df.at[index,'date']) + ' ' + str(df.at[index,'date_rank']) + '; Next home date rank: ' + str(df.at[next_home_index,'date_rank']) + '; Next away date rank: ' + str(df.at[next_away_index,'date_rank']))
        df.at[next_home_index, 'home_elo'] = update_elo(df.at[index,'home_elo'],df.at[index,'away_elo'],df.at[index,'actual_score'])
        df.at[next_away_index, 'away_elo'] = update_elo(df.at[index,'away_elo'],df.at[index,'home_elo'],1 - df.at[index,'actual_score']) # Swap function inputs for away team
        df.at[next_home_index, 'updated'] = df.at[next_home_index, 'updated'] + 1
        df.at[next_away_index, 'updated'] = df.at[next_away_index, 'updated'] + 1
        df.at[index,'updated'] = 3
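
The code works fine for the first few rows. However, I keep running into errors, always at the same rows, even though I cannot see how those rows differ from the others.

If I do not handle the ValueError, I first receive the error message ValueError: can only convert an array of size 1 to a Python scalar after about 250 rows.

If I do handle the ValueError as shown above, four such errors are caught, two by each error-handling block (the code works fine otherwise), but the code then stops updating any further strength ratings after roughly 18% of all rows, without raising any error message.

I would be grateful for help with (a) understanding what causes the error and (b) how to handle it.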
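Since this is my first post on StackOverflow, I am not yet fully familiar with the forum's posting conventions. Please let me know if there is anything I can improve about my post.

Thank you very much!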

Answer 0 (score: 3)

FYI, you get a similar error if you apply .item to a NumPy array. In that case, you can use .tolist() to work around it.
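
As a small illustration of the point above, the following sketch compares .item() and .tolist() on arrays of different sizes. The arrays and the helper item_fails are made up for this example and do not come from the question.

```python
import numpy as np

one = np.array([42])
many = np.array([1, 2, 3])
empty = np.array([])

# .item() succeeds only when the array holds exactly one element.
single_value = one.item()

def item_fails(arr):
    """Hypothetical helper: return True if arr.item() raises ValueError."""
    try:
        arr.item()
    except ValueError:
        return True
    return False

# .tolist() does not depend on the size; it returns a (possibly empty) Python list.
many_list = many.tolist()
empty_list = empty.tolist()

print(single_value, item_fails(many), item_fails(empty), many_list, empty_list)
```

Note that .tolist() only avoids the exception; the caller still has to decide what an empty or multi-element result means.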

Answer 1 (score: 0)

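pd.Series.item requires exactly one element in the Series to return a scalar. If df[(df['date_rank'] == next_home_fixture) & (df['localteam_id'] == df.at[index,'localteam_id'])] has length 0, which happens, for instance, when the team has no later home fixture, then .index.item() raises a ValueError.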
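
One possible way to handle this is to check whether a later fixture exists before asking for its index. The sketch below uses a made-up three-row DataFrame with the question's column names; the helper next_home_index is hypothetical and is not the asker's code. Note also that if the update lines run after a caught ValueError, they reuse whatever index was left over from an earlier iteration, which may be why ratings silently stop updating; this is a guess from reading the posted loop, not something verified against the data.

```python
import pandas as pd

# Toy fixture list; column names follow the question, the values are invented.
df = pd.DataFrame({
    "localteam_id": [1, 2, 1],
    "date_rank":    [1, 2, 3],
})

# An empty selection reproduces the error from the question.
try:
    df[df["date_rank"] > 99].index.item()
    raised = False
except ValueError:
    raised = True

def next_home_index(df, index):
    """Return the label of the team's next home fixture, or None if there is none."""
    team = df.at[index, "localteam_id"]
    later = df[(df["localteam_id"] == team) & (df["date_rank"] > df.at[index, "date_rank"])]
    if later.empty:
        return None  # last home game of this team: nothing to update
    # idxmin returns the label of the first row with the smallest date_rank,
    # so duplicate ranks do not raise the way .index.item() would.
    return later["date_rank"].idxmin()

first = next_home_index(df, 0)   # team 1 plays at home again in row 2
last = next_home_index(df, 2)    # no later home game for team 1
print(raised, first, last)
```

In the loop, a None result could be used to skip the update for that side (for example with continue) instead of falling through to the assignment lines.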