How to handle large inputs and improve runtime complexity?

Posted: 2017-10-15 19:21:26

Tags: python time-complexity

I have the following function, which is meant to return the best average score. It takes input like this:

scores = [["bob",100],["bob",100],["toto",100],["frank",100]]

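How can I improve it so that it handles large inputs in a reasonable amount of time? In other words, how can I get a better runtime complexity?

Edit: it should handle negative scores and empty scores.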

def maxavg(scores):
    avs=[]
    namelist=[]
    for i in range(0,len(scores)):
        name = scores[i][0]
        if name not in namelist:  # linear scan over namelist
            namelist.append(name)
            note = scores[i][1]
            nbnotes = 1
            for j in range(i+1,len(scores)):  # rescans the rest of scores for this name
                if scores[j][0]==name:
                    nbnotes+=1
                    note+=scores[j][1]
            avs.append(note/nbnotes)
    return max(avs)
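To make the quadratic cost concrete, here is a small sketch that mirrors the loop structure of `maxavg` above but only counts how many times the inner `for j` loop body runs. The helper name `count_inner_iterations` is hypothetical. When every name is distinct, each position starts a fresh scan of the rest of the list, giving N(N-1)/2 inner iterations; the `name not in namelist` test adds yet another linear scan per element on top of that.

```python
def count_inner_iterations(scores):
    """Count inner-loop steps of the original nested-loop maxavg (hypothetical helper)."""
    namelist = []
    steps = 0
    for i in range(len(scores)):
        name = scores[i][0]
        if name not in namelist:  # this membership test is itself a linear scan
            namelist.append(name)
            for j in range(i + 1, len(scores)):  # rescans the remainder for this name
                steps += 1
    return steps


# Every name distinct: the worst case for the inner loop.
for n in (500, 1000, 2000):
    distinct = [["student%d" % k, 100] for k in range(n)]
    print(n, count_inner_iterations(distinct))
```

This prints 124750, 499500 and 1999000 for the three sizes: doubling N roughly quadruples the work, which is the O(N^2) behaviour the answers below remove.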

4 answers:

Answer 0 (score: 2)

This may well be faster than your code, and it takes fewer lines:

import pandas as pd

scores = [["bob",100],["bob",90],["toto",70],["frank",100]]
df = pd.DataFrame(scores, columns=['name', 'scores'])
# idxmax() gives the name with the highest mean, not the mean value itself
print(df.groupby('name').mean().idxmax())

Output:

scores    frank

Answer 1 (score: 2)

Without going into the numpy array or pandas dataframe shown by @galaxyman, you're missing a lot of basic Python. You should get acquainted with things like dictionaries. Here's an example using `defaultdict`, which supplies a default value (0 for `defaultdict(int)`) the first time a missing key is accessed:

from collections import defaultdict

def maxavg(scores):
    scoredict = defaultdict(int)  # name -> sum of grades
    namecount = defaultdict(int)  # name -> number of grades
    for name, grade in scores:
        scoredict[name] += grade
        namecount[name] += 1
    return max(scoredict[name] / namecount[name] for name in scoredict)

A regular dictionary, mydict = {}, would raise a KeyError on the first mydict['somename'] += grade, since += has to read the existing value before adding to it. A defaultdict avoids this by calling its default factory (here int, which returns 0) whenever a missing key is looked up, so the first initialization happens automatically. I suggest you look all of these things up. Good luck. The argument to max in the final line is a generator expression, though you should read up on list comprehensions as well.
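A minimal illustration of the difference described above, using the sample data from Answer 0 and assuming Python 3 (where / is true division): a plain dict raises KeyError on the first += for a new key, while defaultdict(int) fills the key in with int(), i.e. 0. The last lines compare the generator expression with the equivalent list comprehension.

```python
from collections import defaultdict

scores = [["bob", 100], ["bob", 90], ["toto", 70], ["frank", 100]]

plain = {}
try:
    plain["bob"] += 100  # the key does not exist yet, so the read half of += fails
except KeyError as exc:
    print("plain dict:", repr(exc))

totals = defaultdict(int)
counts = defaultdict(int)
for name, grade in scores:
    totals[name] += grade  # a missing key is first set to int() == 0
    counts[name] += 1

# Generator expression: averages are produced one at a time and consumed by max().
best_gen = max(totals[name] / counts[name] for name in totals)
# List comprehension: builds the full list of averages first, then takes the max.
best_list = max([totals[name] / counts[name] for name in totals])
print(best_gen, best_list)
```

Both forms give the same result; the generator simply avoids materialising an intermediate list.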

Answer 2 (score: 1)

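How to improve it? Glad you asked. Mainly by using the appropriate data types, which lets you avoid O(N) operations inside a loop. That way you avoid accidentally writing quadratic O(N^2) code. Here, it means moving from arrays/lists to dictionaries.

The for i in range(0,len(scores)) loop is perfectly good Fortran, but Python gives us an idiom for this: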

for name, score in scores:

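The if name not in namelist test hides a linear O(N) scan inside your loop. Using a dict avoids this, because a dict lookup takes constant time on average. Moreover, the "is this name already present?" test can be buried inside a defaultdict: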

import collections

def maxavg(scores):
    total = collections.defaultdict(int)
    n = collections.defaultdict(int)
    for name, score in scores:
        total[name] += score
        n[name] += 1
    # dict comprehension over the distinct names (the keys of total)
    avg = {name: total[name] / n[name]
           for name in total}
    return max(avg.values())
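Building on the same idea, here is a sketch of a single-pass variant that uses a plain dict with dict.get instead of defaultdict, keeps each name's running total and count together, and also reports which name has the best average. Regarding the edit in the question, it assumes "empty" means an empty input list and returns None instead of letting max() raise ValueError; negative scores need no special handling. The function name best_average is hypothetical.

```python
def best_average(scores):
    """Return (name, average) for the highest average, or None for an empty input.

    One pass over the input plus one pass over the distinct names: O(N) overall,
    given average-case O(1) dict operations.
    """
    stats = {}  # name -> [running total, count]
    for name, score in scores:
        entry = stats.get(name)
        if entry is None:
            stats[name] = [score, 1]
        else:
            entry[0] += score
            entry[1] += 1
    if not stats:
        return None
    # Ties go to the name seen first (dicts keep insertion order in Python 3.7+).
    name, (total, count) = max(stats.items(), key=lambda item: item[1][0] / item[1][1])
    return name, total / count


print(best_average([["bob", 100], ["bob", 90], ["toto", 70], ["frank", 100]]))
print(best_average([["ann", -5], ["ann", -15], ["joe", -12]]))
print(best_average([]))
```

Storing total and count in one entry means a single lookup per input row, rather than one lookup in each of two dicts.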

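Answer 3 (score: 0)

You can improve the runtime by using Cython to declare static types for your variables. This link is a good introduction.

Because Python is dynamically typed, every time the loop touches a variable the interpreter has to work out at runtime what type it holds (int, string, and so on). Declaring the types with Cython lets the compiled code skip those checks, which can speed things up significantly.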