Question

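I'm interested in finding the fastest way to iterate over a list of lists and replace a character in the innermost lists. I'm generating the list of lists from a CSV file in Python.

The Bing Ads API sends me a giant report, but every percentage is represented as "20.00%" rather than "20.00". This means I can't insert each row into my database, because "20.00%" does not convert to a numeric value on SQL Server.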
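
As a minimal sketch of the conversion problem described here: stripping the trailing percent sign is enough to make the value parse as a number. The helper name percent_to_float is hypothetical and is not part of the Bing Ads API or of my script.

```python
def percent_to_float(value):
    """Convert a string such as '20.00%' to the float 20.0.

    Hypothetical helper for illustration only. Strings without a
    trailing '%' are parsed unchanged.
    """
    return float(value.rstrip('%'))


print(percent_to_float('20.00%'))  # 20.0
print(percent_to_float('20.00'))   # 20.0
```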

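My solution so far is a list comprehension nested inside a list comprehension. I wrote a small script to test how fast it runs compared with just reading the list (it takes about 2x as long), but I'd like to know whether there is a faster way.

Note: every record in the report has a rate, and therefore a percentage. So every record has to be visited once, and every rate has to be visited once (is that the reason for the 2x slowdown?).

Anyway, as these reports keep growing in size, I'd love a faster solution!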

import time
import csv

def getRecords1():
    # Baseline: read the report without touching any field.
    with open('report.csv', newline='', encoding='utf-8-sig') as records:
        reader = csv.reader(records)
        # Skip the report header. Its last row holds the column names and
        # starts with 'GregorianDate', so that is the last row to skip.
        for row in reader:
            if row and row[0] == 'GregorianDate':
                break
        recordList = list(reader)
    return recordList

def getRecords2():
    # Nested list comprehension: strip '%' from every field of every record.
    with open('report.csv', newline='', encoding='utf-8-sig') as records:
        reader = csv.reader(records)
        for row in reader:  # skip the header, as in getRecords1
            if row and row[0] == 'GregorianDate':
                break
        recordList = list(reader)
    data = [[field.replace('%', '') for field in record] for record in recordList]
    return data  # return the cleaned copy, not the untouched recordList

def getRecords3():
    # Modify each row as it is read; only column 10 holds the percentage.
    data = []
    with open('c:\\Users\\sflynn\\Documents\\Google API Project\\Bing\\uploadBing\\reports\\report.csv', newline='', encoding='utf-8-sig') as records:
        reader = csv.reader(records)
        for row in reader:  # skip the header, as in getRecords1
            if row and row[0] == 'GregorianDate':
                break
        for row in reader:
            row[10] = row[10].replace('%', '')
            data.append(row)
    return data

def main():
    t0 = time.perf_counter()
    for _ in range(2000):
        getRecords1()
    t1 = time.perf_counter()
    print("Get records normally takes " + str(t1 - t0))

    t0 = time.perf_counter()
    for _ in range(2000):
        getRecords2()
    t1 = time.perf_counter()
    print("Using nested list comprehension takes " + str(t1 - t0))

    t0 = time.perf_counter()
    for _ in range(2000):
        getRecords3()
    t1 = time.perf_counter()
    print("Modifying row as it's read takes " + str(t1 - t0))


if __name__ == '__main__':
    main()

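Edit: I added a third function, getRecords3(), which is the fastest implementation I've seen so far. The output of running the program is:

Get records normally takes 30.61197066307068

Using nested list comprehension takes 60.81756520271301

Modifying row as it's read takes 43.761850357055664

This means we've gone from an algorithm that takes about 2x as long as the plain read to one that takes about 1.5x as long. Thanks, everyone!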
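
For anyone adapting getRecords3(), here is a hedged sketch of the same idea that looks up the percentage column by name in the header row instead of hard-coding index 10. The sample data and the column names 'Clicks' and 'Ctr' are made up for illustration; only 'GregorianDate' comes from my report. The function name read_records is hypothetical.

```python
import csv
import io

# Made-up sample in the shape described above: a few preamble rows, then
# a column-header row that starts with 'GregorianDate'. 'Clicks' and 'Ctr'
# are assumed column names, not taken from a real report.
SAMPLE = (
    'Report Name: example\n'
    '\n'
    'GregorianDate,Clicks,Ctr\n'
    '2015-01-01,10,20.00%\n'
    '2015-01-02,4,5.50%\n'
)


def read_records(lines, percent_column='Ctr'):
    """Read rows after the header and strip '%' from one named column."""
    reader = csv.reader(lines)
    # Skip the preamble and stop on the column-header row.
    for header in reader:
        if header and header[0] == 'GregorianDate':
            break
    else:
        raise ValueError('no column-header row found')
    index = header.index(percent_column)
    data = []
    for row in reader:
        # Touch only the one field that can contain a percent sign.
        row[index] = row[index].replace('%', '')
        data.append(row)
    return data


records = read_records(io.StringIO(SAMPLE))
print(records)
```

Passing an open file object instead of io.StringIO should work the same way, since csv.reader accepts any iterable of lines.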

Answer 1

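You could check whether modifying the inner lists in place is faster than building a new list of lists with a list comprehension.

So, something like

for record in recordList:
    for index in range(len(record)):
        record[index] = record[index].replace('%', '')

Since strings are immutable, we can't modify the strings themselves in place. Each replace() call returns a new string, which is then stored back into the same slot of the existing inner list.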
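
A small self-contained sketch of that in-place approach, using made-up sample rows, shows that the inner list objects are reused while each string is replaced by a new one. Whether this is actually faster than the list comprehension would need to be measured on the real report.

```python
# Made-up sample rows standing in for the parsed CSV records.
records = [['2015-01-01', '20.00%'], ['2015-01-02', '5.50%']]
first_row = records[0]  # keep a reference to the first inner list

for record in records:
    for index, field in enumerate(record):
        # str.replace returns a new string; assign it back to the slot.
        record[index] = field.replace('%', '')

print(records)                  # [['2015-01-01', '20.00'], ['2015-01-02', '5.50']]
print(records[0] is first_row)  # True: no new inner lists were created
```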
