Question

I need to manipulate a string in Python, so I am creating a list of the string's characters, since Python strings are immutable:

s = 'abc'  # named s so the built-in str is not shadowed
list(s)    # ['a', 'b', 'c']

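The problem is that the string may contain up to a million characters, and I am not sure whether creating the list will slow my code down.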

What is the complexity of the operation above? And is there a better alternative for manipulating the string?

Answer 1

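"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified. It is often a mistake to make a priori judgments about what parts of a program are really critical, since the universal experience of programmers who have been using measurement tools has been that their intuitive guesses fail." --Donald Knuth (emphasis mine)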

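In other words, unless you have profiled your code and found that it is slow because you are converting your string to a list, don't worry about it; there are probably bigger gains to be had elsewhere.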
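
As a rough illustration of what "profiling first" could look like, here is a minimal sketch using the standard cProfile and pstats modules. The function edit_via_list is a hypothetical stand-in for the real string manipulation, not something from the question, and the report format depends on the Python version.

```python
import cProfile
import io
import pstats


def edit_via_list(text):
    # Hypothetical workload: convert to a list, change one character, join back.
    chars = list(text)
    chars[0] = "X"
    return "".join(chars)


def profile_edit(text):
    # Profile a single call and return (result, text of the profiler report).
    profiler = cProfile.Profile()
    result = profiler.runcall(edit_via_list, text)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


if __name__ == "__main__":
    # One million characters, as in the question.
    result, report = profile_edit("a" * 1_000_000)
    print(report)
```

If the conversion does not show up near the top of a report like this for your real program, it is probably not the place to optimize.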

Answer 2

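If I understand correctly, you need to read a string from a file, modify it, and then write it back to the file? If so, the most memory-efficient approach would be to use the mmap module, and you won't need to build a list. Here is the example from the module's official documentation: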

import mmap

# write a simple example file
with open("hello.txt", "wb") as f:
    f.write(b"Hello Python!\n")

with open("hello.txt", "r+b") as f:
    # memory-map the file, size 0 means whole file
    mm = mmap.mmap(f.fileno(), 0)
    # read content via standard file methods
    print(mm.readline())  # prints b"Hello Python!\n"
    # read content via slice notation
    print(mm[:5])  # prints b"Hello"
    # update content using slice notation;
    # note that new content must have same size
    mm[6:] = b" world!\n"
    # ... and read again using standard file methods
    mm.seek(0)
    print(mm.readline())  # prints b"Hello  world!\n"
    # close the map
    mm.close()

Answer 3

I got these results:

~ $ cat test.py
#!/usr/bin/python2.7
import time
import random
length = len( str(random.random()) )
longString = ""
for x in range(1000000 / length):
  longString += str( random.random() )

a = time.time()

li = list(longString)

b = time.time()

print "Time was: " + str(b - a) + " seconds"
print "Length of list" , len(li)
print "length of string " , len(longString)
print "Sample of list: " , li[:100]

~ $ ./test.py
Time was: 0.0284309387207 seconds
Length of list 999863
length of string  999863
Sample of list:  ['0', '.', '0', '5', '3', '2', '0', '9', '3', '0' ....actually longer
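
For readers on Python 3, the same measurement could be sketched roughly as follows. This is not the script that produced the numbers above; the helper names build_long_string and time_list_conversion are made up for illustration, and timings will vary by machine. Since list(s) has to visit every character once, the cost should grow linearly with the length of the string.

```python
import random
import timeit


def build_long_string(target_length=1_000_000):
    # Collect random digit strings and join them once, then trim to size.
    pieces = []
    total = 0
    while total < target_length:
        piece = str(random.random())
        pieces.append(piece)
        total += len(piece)
    return "".join(pieces)[:target_length]


def time_list_conversion(text, repeats=5):
    # Best-of-N wall time, in seconds, for a single list(text) call.
    return min(timeit.repeat(lambda: list(text), number=1, repeat=repeats))


if __name__ == "__main__":
    long_string = build_long_string()
    elapsed = time_list_conversion(long_string)
    print("Best time: %.6f seconds" % elapsed)
    print("Length of list:", len(list(long_string)))
```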

Cost of building a list from a string
