How can I speed up string comparison in a list (millions of records)?

Asked: 2018-01-16 13:32:43

Tags: python python-3.x performance

Code that compares each record's city and sorts the records into per-city files:

for city in uniqueCity:
    file = open(city+".txt","a+")
    for data in salesData:
        if data[2] == city:
            file.write(",".join(data).replace(","," "))
            file.write("\n")
    file.close()

1 answer:

Answer 0: (score: 1)

The function is slow because the algorithm is slow.

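As written, for each city (the outer loop runs len(uniqueCity) times) it has to go through all of salesData (len(salesData) iterations), so the total number of comparisons performed is len(uniqueCity) * len(salesData). In this case (I'm guessing city is a str) you can do better, because strings are hashable: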

groupedSalesData = {city:[] for city in uniqueCity}

for data in salesData:
    city = data[2]
    if city in groupedSalesData:
        groupedSalesData[city].append(data)

for city, dataEntries in groupedSalesData.items():
    file = open(city + ".txt", "a+")
    for data in dataEntries:
        file.write(",".join(data).replace(","," "))
        file.write("\n")
    file.close()

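As you can see, the complexity of this algorithm is only len(uniqueCity) + len(salesData) (assuming each data entry takes O(1) time to copy, and because operations on a Python dict should be O(1) on average), which is much better.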
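For reference, the same single-pass grouping can be wrapped in a function that closes each file through a context manager. This is only a sketch under a few assumptions: the function name write_sales_by_city and the out_dir parameter are hypothetical and do not appear in the question, each record is assumed to be a list of strings, and the city is assumed to be the third field (data[2]), as in the original code.

```python
import os


def write_sales_by_city(sales_data, unique_cities, out_dir="."):
    """Group records by city in one pass, then append them to <city>.txt.

    Hypothetical helper: the name and the out_dir parameter are assumptions.
    """
    # One bucket per known city; dict membership tests are O(1) on average.
    grouped = {city: [] for city in unique_cities}

    # Single pass over the records instead of one pass per city.
    for data in sales_data:
        city = data[2]  # assumed: the city is the third field
        if city in grouped:
            grouped[city].append(data)

    # One file per city; "with" closes the file even if a write fails.
    for city, entries in grouped.items():
        path = os.path.join(out_dir, city + ".txt")
        with open(path, "a+") as f:
            for data in entries:
                # Same output format as the original code.
                f.write(",".join(data).replace(",", " "))
                f.write("\n")

    return grouped
```

A call would look like write_sales_by_city(salesData, uniqueCity). As in the code above, records whose city is not in uniqueCity are skipped, and a city with no matching records still gets an (empty) file.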