Possible duplicate:
What does python intern do, and when should it be used?
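I am working on a Python program that has to correlate many millions of string objects in an array. I have found that if they all come from the same quoted string literal, each additional "string" is just a reference to the first, master string. However, if the strings are read from a file, each one still needs its own memory allocation, even when they are all equal.
That is, this takes about 14 MB of storage: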
a = ["foo" for a in range(0,1000000)]
while this takes more than 65 MB:
a = ["foo".replace("o","1") for a in range(0,1000000)]
I can shrink the footprint considerably with this:
s = {"f11":"f11"}
a = [s["foo".replace("o","1")] for a in range(0,1000000)]
But this seems silly. Is there a simpler way to do it?
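The difference described above can be checked with a rough sketch like the following. It assumes Python 3 and CPython, uses the standard `tracemalloc` module, and uses a hypothetical helper named `traced_bytes`. The numbers will not match the 14 MB and 65 MB figures exactly, since those depend on the interpreter version and on how memory was measured.

```python
import tracemalloc

def traced_bytes(build):
    """Return the bytes still allocated for whatever build() returns."""
    tracemalloc.start()
    try:
        result = build()  # keep a reference so the objects stay alive
        current, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    del result
    return current

N = 100000  # smaller than the 1,000,000 above, to keep the run short

# Every element refers to the same literal object.
shared = traced_bytes(lambda: ["foo" for _ in range(N)])
# Every element is a separate string object built at run time.
distinct = traced_bytes(lambda: ["foo".replace("o", "1") for _ in range(N)])

print("shared literal  :", shared, "bytes")
print("built at runtime:", distinct, "bytes")
```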
Answer 0 (score: 13)
Just call intern(), which tells Python to keep a single shared copy of the string and hand back that copy for every equal string:
a = [intern("foo".replace("o","1")) for a in range(0,1000000)]
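This also comes to about 18 MB, in line with the first example.
If you are using Python 3, note that intern() has moved to sys.intern() (thanks to @Abe Karplus for pointing this out in the comments).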
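For Python 3, the same idea might look like the sketch below. The helper name `intern_all` is made up for illustration, and counting distinct `id()` values is only a quick way to see how many separate objects exist under CPython.

```python
import sys

def intern_all(strings):
    """Return a list in which equal strings share one interned object."""
    return [sys.intern(s) for s in strings]

# Build the strings at run time so they start out as separate objects,
# as in the question's second example.
raw = ["foo".replace("o", "1") for _ in range(1000)]
interned = intern_all(raw)

distinct_before = len({id(s) for s in raw})
distinct_after = len({id(s) for s in interned})
print(distinct_before, "objects before,", distinct_after, "after")
```

Note that sys.intern() accepts only plain str objects, so values read as bytes would need decoding first.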
Answer 1 (score: 0)
strs = ["this is string1", "this is string2", "this is string1", "this is string2",
        "this is string3", "this is string4", "this is string5", "this is string1",
        "this is string5"]
new_strs = []
for x in strs:
    if x in new_strs:
        # Already seen: look up the earlier equal string and append a
        # reference to it instead of the new object.
        new_strs.append(new_strs[new_strs.index(x)])
    else:
        new_strs.append(x)
print([id(y) for y in new_strs])
Equal strings now share the same id().
Output (the exact ids vary from run to run):
[18632400, 18632160, 18632400, 18632160, 18651400, 18651440, 18651360, 18632400, 18651360]
Answer 2 (score: -1)
Keeping a dictionary of the strings seen so far should work:
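new_strs = []
str_record = {}
for x in strs:
    if x not in str_record:
        # First time this value is seen: remember this object.
        str_record[x] = x
    # Always append the remembered object, so equal strings share one reference.
    new_strs.append(str_record[x])
(Untested.)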
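A compact variant of the dictionary approach, written for Python 3, might look like this. The function name `dedupe` and the sample data are made up for illustration; `dict.setdefault` stores a string the first time it is seen and returns the stored object on every later lookup.

```python
def dedupe(strings):
    """Return a list where equal strings are replaced by one shared object."""
    pool = {}
    # setdefault(s, s) inserts s if it is missing, then returns the pooled object.
    return [pool.setdefault(s, s) for s in strings]

# Strings built at run time, so equal values start out as separate objects.
lines = ["this is string%d" % (i % 3) for i in range(9)]
shared = dedupe(lines)

print(len({id(s) for s in lines}), "objects before,",
      len({id(s) for s in shared}), "after")
```

Unlike interning, the pool here is an ordinary dictionary, so the shared strings can be released as soon as the pool and the list go out of scope.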