I have the following function, which is meant to return the best average score. It takes input like:
scores = [["bob",100],["bob",100],["toto",100],["frank",100]]
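How can I improve it so that it handles large inputs in reasonable time? In other words, how can I get better runtime complexity?
Edit: it should handle negative scores and empty scores.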
def maxavg(scores):
    avs=[]
    namelist=[]
    for i in range(0,len(scores)):
        name = scores[i][0]
        if name not in namelist:
            namelist.append(name)
            note = scores[i][1]
            nbnotes = 1
            for j in range(i+1,len(scores)):
                if scores[j][0]==name:
                    nbnotes+=1
                    note+=scores[j][1]
            avs.append(note/nbnotes)
    return max(avs)
Answer 0 (score: 2)
It is probably faster than your code, and it takes fewer lines:
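import pandas as pd

scores = [["bob",100],["bob",90],["toto",70],["frank",100]]
df = pd.DataFrame(scores, columns=['name', 'scores'])
# idxmax() gives the name with the highest mean; use .max() for the mean itself
print(df.groupby('name').mean().idxmax())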
Output:
scores frank
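For readers without pandas, the same "which name has the highest mean" query can be sketched in plain Python: build a dict of per-name means, then call max() with key=, which plays the role of idxmax(). This is a minimal sketch, not the answerer's code; the helper name best_name is hypothetical.

```python
from collections import defaultdict

def best_name(scores):
    """Return (name, mean) for the name with the highest mean score (hypothetical helper)."""
    totals = defaultdict(int)   # sum of scores per name
    counts = defaultdict(int)   # number of scores per name
    for name, score in scores:
        totals[name] += score
        counts[name] += 1
    means = {name: totals[name] / counts[name] for name in totals}
    # max() with key= returns the key whose mean is largest, like idxmax()
    top = max(means, key=means.get)
    return top, means[top]

print(best_name([["bob", 100], ["bob", 90], ["toto", 70], ["frank", 100]]))  # ('frank', 100.0)
```

On ties, max() keeps the first name it encounters with the largest mean.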
Answer 1 (score: 2)
Without going into the numpy array or pandas dataframe shown by @galaxyman, you're missing a lot of basic Python. You need to get acquainted with things like dictionaries. Here's an example using defaultdict, which initializes a missing key to 0 the first time you use it:
from collections import defaultdict

def maxavg(scores):
    scoredict = defaultdict(int)   # running total of grades per name
    namecount = defaultdict(int)   # number of grades per name
    for name, grade in scores:
        scoredict[name] += grade
        namecount[name] += 1
    return max(scoredict[name] / namecount[name] for name in scoredict)
A regular dictionary, mydict = {}, would fail on the first attempt to do mydict['somename'] += grade, since += first reads the existing value and raises KeyError when the key is missing. A defaultdict avoids this: when a missing key is looked up, it calls its default factory (here int, which returns 0) to create the initial value. I suggest you look all of these up. Good luck. That final line is a generator expression, though you should check out list comprehensions as well.
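To see the difference concretely, here is a small sketch: the plain dict raises KeyError on the first +=, while dict.get(name, 0) or a defaultdict(int) supplies the starting 0. The function names totals_plain and totals_default are hypothetical.

```python
from collections import defaultdict

def totals_plain(scores):
    """Plain dict: supply the missing starting value yourself with dict.get."""
    totals = {}
    for name, grade in scores:
        totals[name] = totals.get(name, 0) + grade
    return totals

def totals_default(scores):
    """defaultdict(int): a missing key is created with int() == 0 on first access."""
    totals = defaultdict(int)
    for name, grade in scores:
        totals[name] += grade
    return dict(totals)

scores = [["bob", 100], ["bob", 90], ["toto", 70]]

broken = {}
try:
    broken["bob"] += 100          # no "bob" key yet, so the read fails
except KeyError as exc:
    print("KeyError:", exc)       # prints: KeyError: 'bob'

print(totals_plain(scores) == totals_default(scores))  # True
```

On the generator point: max(x for x in ...) consumes values lazily, while max([x for x in ...]) builds the whole list first; both return the same result.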
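Answer 2 (score: 1)
How to improve it? Glad you asked. Mainly by using appropriate data types, which lets you avoid O(N) operations inside a loop. That way you avoid accidentally writing quadratic, O(N^2), code. Here, it means moving from arrays/lists to dictionaries.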
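As a sketch of what moving from lists to dictionaries buys, the version below makes a single pass and keeps a [total, count] pair per name in one dict. Each dict lookup is O(1) on average, so the whole function is O(N), versus O(N^2) for the nested loops in the question. The function name maxavg_one_pass is hypothetical.

```python
def maxavg_one_pass(scores):
    """Best average score in one pass over the input (O(N) average time)."""
    stats = {}  # name -> [running total, number of scores]
    for name, score in scores:
        # setdefault inserts [0, 0] the first time a name is seen, then returns the stored list
        entry = stats.setdefault(name, [0, 0])
        entry[0] += score
        entry[1] += 1
    return max(total / count for total, count in stats.values())

print(maxavg_one_pass([["bob", 100], ["bob", 90], ["toto", 70], ["frank", 100]]))  # 100.0
```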
The for i in range(0,len(scores)) loop is perfectly good Fortran, but Python gives us a more idiomatic option:
for name, score in scores:
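Both loop styles visit the same pairs; unpacking simply names the fields instead of indexing scores[i][0] and scores[i][1]. A quick check:

```python
scores = [["bob", 100], ["bob", 90], ["toto", 70]]

# Index-based loop, as in the question
by_index = []
for i in range(0, len(scores)):
    by_index.append((scores[i][0], scores[i][1]))

# Idiomatic loop: unpack each [name, score] pair directly
by_unpacking = []
for name, score in scores:
    by_unpacking.append((name, score))

print(by_index == by_unpacking)  # True
```

If the index is still needed, enumerate(scores) yields (i, pair) tuples.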
The if name not in namelist test hides a linear O(N) scan inside your loop. Using a dict avoids this. Moreover, the "does this name already exist?" test can be buried inside a defaultdict:
import collections

def maxavg(scores):
    total = collections.defaultdict(int)  # running sum per name
    n = collections.defaultdict(int)      # number of scores per name
    for name, score in scores:
        total[name] += score
        n[name] += 1
    avg = {name: total[name] / n[name] for name in total}
    return max(avg.values())
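The question's edit asks for negative and empty scores. Negative scores need no special handling, since averaging works the same way. An empty input list, however, makes max() raise ValueError. The sketch below assumes "empty" means either an empty input list or a score of None, skips None entries (an assumption about the desired behaviour), and returns None when there is nothing to average; max(..., default=None) requires Python 3.4+. The function name maxavg_safe is hypothetical.

```python
import collections

def maxavg_safe(scores):
    """Best average score, or None if there is nothing to average.

    Assumption: entries whose score is None are skipped rather than counted as 0.
    """
    total = collections.defaultdict(int)  # running sum per name
    n = collections.defaultdict(int)      # count of non-None scores per name
    for name, score in scores:
        if score is None:
            continue  # assumed meaning of an "empty" score: ignore it
        total[name] += score
        n[name] += 1
    # default= avoids ValueError when the input (or every score) is empty
    return max((total[name] / n[name] for name in total), default=None)

print(maxavg_safe([["bob", -10], ["bob", -30], ["toto", -15]]))  # -15.0
print(maxavg_safe([]))                                            # None
print(maxavg_safe([["bob", None], ["toto", 4]]))                  # 4.0
```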
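Answer 3 (score: 0)
You can improve the runtime by using Cython to declare static variable types. This link is a good introduction.
Because Python is dynamically typed, the interpreter has to check what type each value is (int, str, etc.) every time the loop touches it. Declaring the types with Cython lets it generate C code that skips these runtime checks, which can speed up tight loops significantly.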