我想为给定的字符串或给定的列表生成哈希表。哈希表将元素视为key
,将显示时间视为value
。例如:
s = 'ababcd'
s = ['a', 'b', 'a', 'b', 'c', 'd']
dict_I_want = {'a':2,'b':2, 'c':1, 'd':1}
# method 1
from collections import Counter
s = 'ababcd'
hash_table1 = Counter(s)
# method 2
s = 'ababdc'
hash_table2 = dict()
for i in s:
if hash_table2.get(i) == None:
hash_table2[i] = 1
else:
hash_table2[i] += 1
hash_table1 == hash_table2
是
通常,我使用上面的2种方法。一种来自标准库,但在某些代码练习网站中是不允许的。另一个是从头开始编写的,但我认为它太长了。如果我使用dict理解,我会提出另外两种方法:
{i:s.count(i) for i in set(s)}
{i:s.count(i) for i in s}
我想知道是否还有其他方法可以清楚或有效地从列表字符串初始化哈希表?
from collections import Counter
import random,string,numpy,perfplot
def from_set(s):
return {i:s.count(i) for i in set(s)}
def from_string(s):
return {i:s.count(i) for i in s}
def handy(s):
hash_table2 = dict()
for i in s:
if hash_table2.get(i) == None:
hash_table2[i] = 1
else:
hash_table2[i] += 1
return hash_table2
def counter(s):
return Counter(s)
perfplot.show(
setup=lambda n: ''.join(random.choices(string.ascii_uppercase + string.digits, k=n)), # or simply setup=numpy.random.rand
kernels=[from_set,from_string,handy,counter],
labels=['set','string','handy','counter'],
n_range=[2 ** k for k in range(17)],
xlabel="len(string)",
equality_check= None
# More optional arguments with their default values:
# title=None,
# logx="auto", # set to True or False to force scaling
# logy="auto",
# equality_check=numpy.allclose, # set to None to disable "correctness" assertion
# automatic_order=True,
# colors=None,
# target_time_per_measurement=1.0,
# time_unit="s", # set to one of ("auto", "s", "ms", "us", or "ns") to force plot units
# relative_to=1, # plot the timings relative to one of the measurements
# flops=lambda n: 3*n, # FLOPS plots
)
答案 0 :(得分:2)
最好的方法是使用内置计数器,否则,您可以使用defualtdict,这与您的第二次尝试非常相似
from collections import defualtdict
d = defualtdict(int) # this makes every value 0 by defualt
for letter in string:
d[letter] +=1
答案 1 :(得分:1)
我通常使用 Counter 或 defaultdict 来创建出现频率。
令人惊讶的是,发帖人的from_set方法在大多数情况下都表现出色。
观察
算法
from collections import Counter
from collections import defaultdict
import random,string,numpy,perfplot
def from_set(s):
" Use builtin count function for each item in set "
return {i:s.count(i) for i in set(s)}
def counter(s):
" Uses counter module "
return Counter(s)
def normal_dic(s):
" Update dictionary by checking if item in it or not "
d = {}
for i in s:
if i in d:
d[i] += 1
else:
d[i] = 1
return d
def setdefault_dic(s):
" Use setdefault to preset unknown keys "
d = {}
for i in s:
d.setdefault(i, 0)
d[i] += 1
return d
def default_dic(s):
" Used defaultdict from collections module "
d = defaultdict(int)
for i in s:
d[i] += 1
return d
def try_dic(s):
" Use try/except to check if item in dictionary "
d = {}
for i in s:
try:
d[i] += 1
except:
d[i] = 1
return d
测试代码
out = perfplot.bench(
setup=lambda n: ''.join(random.choices(string.ascii_uppercase + string.digits, k=n)), # or simply setup=numpy.random.rand
kernels=[from_set, counter, setdefault_dic, default_dic, try_dic],
labels=['set', 'counter', 'setdefault', 'defaultdict', 'try_dic'],
n_range=[2 ** k for k in range(17)],
xlabel="len(string)",
equality_check= None
# More optional arguments with their default values:
# title=None,
# logx="auto", # set to True or False to force scaling
# logy="auto",
# equality_check=numpy.allclose, # set to None to disable "correctness" assertion
# automatic_order=True,
# colors=None,
# target_time_per_measurement=1.0,
# time_unit="s", # set to one of ("auto", "s", "ms", "us", or "ns") to force plot units
# relative_to=1, # plot the timings relative to one of the measurements
# flops=lambda n: 3*n, # FLOPS plots
)
out.show()
#out.save("perf.png")
out
图表
绝对值
图中的from_set标签'set'。在下面的相对图中,相对性能要比此绝对图容易。
相对值
图中的from_set标签'set'。
from_set方法是水平线。对于更大的值,包括Counter和defaultdict在内的所有其他方法都在上面(耗时)。
表格
实际时间
n setdefault try_dic defaultdict counter from_set
1.0 799.0 899.0 1299.0 6099.0 1399.0
2.0 1099.0 1199.0 1599.0 6299.0 1699.0
4.0 1699.0 1699.0 2199.0 6299.0 2399.0
8.0 3199.0 3099.0 3499.0 6899.0 3699.0
16.0 6099.0 5499.0 5899.0 7899.0 5900.0
32.0 10899.0 9299.0 9899.0 8999.0 10299.0
64.0 20799.0 15599.0 15999.0 11899.0 15099.0
128.0 38499.0 25499.0 25899.0 16599.0 21899.0
256.0 73100.0 44099.0 42700.0 26299.0 30299.0
512.0 137999.0 77099.0 72699.0 43199.0 46699.0
1024.0 286599.0 154500.0 144099.0 85700.0 79699.0
2048.0 549700.0 289999.0 266799.0 157499.0 145699.0
4096.0 1103899.0 577399.0 535499.0 309399.0 278999.0
8192.0 2200099.0 1151500.0 1051799.0 606999.0 542499.0
16384.0 4658199.0 2534399.0 2295300.0 1414199.0 1087799.0
32768.0 9509200.0 5270200.0 4838000.0 3066600.0 2177200.0
65536.0 19539500.0 10806300.0 9942100.0 6503299.0 4337599.0