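I have a dataset of sports match results that I want to annotate with each team's past performance over a period of time before each match.
The way I've thought of doing it is: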
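1. Annotate each Match with the set of matches that occurred within the relevant time period (MatchManager.matchset_within_period below).
2. Annotate each Match with statistics aggregated from the set of matches in (1) (MatchManager.annotate_with_stats).
I can do this with a somewhat convoluted query (outlined below). It involves an extra Dataset model, which I traverse up to and then back down from to get a reference to the full set of matches. I can then filter and aggregate over that set.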
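To pin down what steps (1) and (2) are meant to compute, here is a plain-Python reference sketch with no Django involved. It assumes a hypothetical in-memory MatchRecord type, where a dataset string stands in for the round__season__dataset chain and a scores dict stands in for the TeamMatchStats rows. The function names mirror the manager methods for readability, but they are illustrative only and are not the ORM query itself.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class MatchRecord:
    # Hypothetical in-memory stand-in for a Match row; not a Django model.
    pk: int
    dataset: str  # plays the role of round__season__dataset
    date: datetime
    home_team: str
    scores: Dict[str, int] = field(default_factory=dict)  # team -> score (TeamMatchStats)


def matchset_within_period(matches: List[MatchRecord], target: MatchRecord,
                           td: timedelta) -> List[MatchRecord]:
    # Step (1): matches in the same dataset that are strictly before `target`
    # (like date__lt) and at most `td` earlier (like time_before__lte).
    return [
        m for m in matches
        if m.dataset == target.dataset
        and m.date < target.date
        and target.date - m.date <= td
    ]


def annotate_with_stats(matches: List[MatchRecord],
                        for_days: int) -> Dict[int, Optional[float]]:
    # Step (2): for each match, average the home team's score over the
    # window from step (1). None mirrors SQL AVG over zero rows (NULL).
    result: Dict[int, Optional[float]] = {}
    for target in matches:
        window = matchset_within_period(matches, target, timedelta(days=for_days))
        scores = [m.scores[target.home_team] for m in window
                  if target.home_team in m.scores]
        result[target.pk] = sum(scores) / len(scores) if scores else None
    return result


# Hypothetical sample data: match 4 belongs to a different dataset,
# so it never appears in the windows of matches 1-3.
SAMPLE = [
    MatchRecord(1, 'league', datetime(2020, 1, 1), 'A', {'A': 80, 'B': 70}),
    MatchRecord(2, 'league', datetime(2020, 1, 8), 'B', {'B': 90, 'A': 100}),
    MatchRecord(3, 'league', datetime(2020, 1, 15), 'A', {'A': 60, 'C': 50}),
    MatchRecord(4, 'other', datetime(2020, 1, 10), 'A', {'A': 0, 'D': 10}),
]

print(annotate_with_stats(SAMPLE, for_days=30))
# {1: None, 2: 70.0, 3: 90.0, 4: None}
```

This loops over every match for every match (O(n²)), so it is only useful as a specification for checking the ORM query's output on small test data, not as a replacement for the query.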
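This approach seems really convoluted and is probably bad for performance. It is definitely hard for a reader to follow (or at least unintuitive).
Is it possible to get the match set needed for step (1) directly, without the extra model (e.g. by using a Subquery directly on Match)?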
In [1]
test_matches = Match.objects.filter(...)
Match.objects \
    .annotate_with_stats(for_days=300) \
    .filter(id__in=test_matches) \
    .values('pk', 'home_team_avg_score')
Out[1]
<MatchQuerySet [{'id': 287, 'home_team_avg_score': 91.04166666666667}, {'id': 288, 'home_team_avg_score': 91.21739130434783}, {'id': 289, 'home_team_avg_score': 92.45833333333333}]>
models.py (simplified)
from django.db import models

from .managers import MatchManager


class Team(models.Model):
    name = models.CharField(max_length=255, unique=True)


# This model has no semantic meaning - it's purely for the query
class Dataset(models.Model):
    name = models.CharField(max_length=255, unique=True)


class Season(models.Model):
    dataset = models.ForeignKey(
        Dataset, on_delete=models.CASCADE, related_name='seasons',
    )
    # ...


class Round(models.Model):
    season = models.ForeignKey(
        Season, on_delete=models.CASCADE, related_name='rounds',
    )
    # ...


class Match(models.Model):
    round = models.ForeignKey(
        Round, on_delete=models.CASCADE, related_name='matches',
    )
    home_team = models.ForeignKey(
        Team, on_delete=models.CASCADE, related_name='home_matches',
    )
    date = models.DateTimeField()
    # ...

    objects = MatchManager()


class TeamMatchStats(models.Model):
    match = models.ForeignKey(
        Match, on_delete=models.CASCADE, related_name='team_stats',
    )
    team = models.ForeignKey(
        Team, on_delete=models.CASCADE, related_name='match_stats',
    )
    score = models.IntegerField()
    # ...
managers.py (simplified)
from datetime import timedelta

from django.db import models
from django.db.models import Avg, DurationField, ExpressionWrapper, F, Q


def fm(x):
    '''
    Helper function for obtaining a self-referential matches set.
    '''
    if x.startswith('round'):
        raise ValueError('Cannot re-traverse upwards')
    # Up from Match to its Dataset, then back down to every Match in it
    return f'round__season__dataset__seasons__rounds__matches__{x}'


class MatchQuerySet(models.QuerySet):
    def matchset_within_period(self, td):
        # filter(date__lt): before this match
        # annotate/filter(time_before__lte): within x period
        return self \
            .filter(**{fm('date__lt'): F('date')}) \
            .annotate(
                time_before=ExpressionWrapper(
                    F('date') - F(fm('date')),
                    output_field=DurationField(),
                )
            ) \
            .filter(time_before__lte=td) \
            .values('pk')

    def annotate_with_stats(self, for_days):
        q_home_team = Q(**{fm('team_stats__team'): F('home_team')})
        team_avg_params = {
            'home_team_avg_score': Avg(
                fm('team_stats__score'), filter=q_home_team,
            )
        }  # In reality this is a dict comp getting a number of stats
        return self \
            .matchset_within_period(timedelta(days=for_days)) \
            .annotate(**team_avg_params)


# Not called here: models.py does `objects = MatchManager()`,
# which is equivalent to `MatchQuerySet.as_manager()`.
MatchManager = MatchQuerySet.as_manager