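I want to create an application that checks whether a word entered by the user contains a word (or several words) from a separate text file (e.g. input = 'teeth', and the separate file contains the word 'eet'). It should return True regardless of the order of the characters.
I looked at the thread "matching all characters in any order in regex", which is cool because it works using set(). The problem is that set() discards repeated characters (e.g. eeet, aaat), so it cannot tell how many times a character is required.
How can I solve this?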
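To make the limitation concrete, here is a minimal sketch of a set-based check. The function name `contains_as_set` is hypothetical and only illustrates the approach from the linked thread as described above; it is not code taken from that thread.

```python
def contains_as_set(substring, string):
    # Set-based check: only asks whether each distinct character occurs at all,
    # so the number of repetitions is ignored.
    return set(substring) <= set(string)

print(contains_as_set("eet", "teeth"))   # True
print(contains_as_set("eeet", "teeth"))  # True, although "teeth" has only two "e"s
```

The second call shows the problem: three "e"s are requested, but the set comparison cannot see that.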
Answer 0 (score: 2)
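I would create a collections.Counter object from each of the two strings to count their characters, then subtract one from the other and test whether the result is empty. Counter subtraction keeps only positive counts, so an empty result means the string contains every character of the substring at least as many times as the substring needs it.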
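import collections

def contains(substring, string):
    c1 = collections.Counter(string)     # characters available
    c2 = collections.Counter(substring)  # characters required
    # c2 - c1 keeps only the characters that are still missing
    return not (c2 - c1)

print(contains("eeh", "teeth"))
print(contains("eeh", "teth"))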
Result:
True
False
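Note that your example is not representative, because a plain substring test already succeeds for it:
>>> "eet" in "teeth"
True
That is why I changed it to "eeh".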
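To connect this back to the question, here is a minimal sketch of applying the same check to words read from a file. It assumes one word per line; `matching_words` is a hypothetical helper name, and `io.StringIO` stands in for a real file so the example runs on its own.

```python
import collections
import io

def contains(substring, string):
    # True if every character of substring occurs in string at least as often.
    return not (collections.Counter(substring) - collections.Counter(string))

def matching_words(user_input, word_lines):
    # word_lines: any iterable of lines, e.g. an open text file
    # (assumption: one word per line).
    words = (line.strip() for line in word_lines)
    return [w for w in words if w and contains(w, user_input)]

# io.StringIO stands in for a real file such as open("words.txt")
# (the file name is hypothetical).
fake_file = io.StringIO("eet\neeh\neeet\nhat\n")
print(matching_words("teeth", fake_file))  # ['eet', 'eeh']
```

"eeet" is rejected because it needs three "e"s, and "hat" is rejected because "teeth" has no "a".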
Answer 1 (score: 2)
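I know it's unlikely, but if performance really matters for very large inputs, you can avoid creating the second Counter and iterate directly over the characters of the substring, which lets you terminate early as soon as you run out of a given character. Note that the argument order here is (string, substring), the reverse of the answer above. The session below assumes that `import collections` and `from collections import Counter` were run beforehand.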
In [26]: def contains2(string, substring):
    ...:     c = Counter(string)
    ...:     for char in substring:
    ...:         if c[char] > 0:
    ...:             c[char] -= 1
    ...:         else:
    ...:             return False
    ...:     return True
    ...:
In [27]: contains2("teeth", "eeh")
Out[27]: True
In [28]: contains2("teeth", "ehe")
Out[28]: True
In [29]: contains2("teth", "ehe")
Out[29]: False
In [30]: contains2("teth", "eeh")
Out[30]: False
In [31]: def contains(string, substring):
    ...:     c1 = collections.Counter(string)
    ...:     c2 = collections.Counter(substring)
    ...:     return not (c2 - c1)
    ...:
In [32]: contains("teth", "ehe")
Out[32]: False
In [33]: contains("teeth", "ehe")
Out[33]: True
In [34]: contains("teeth", "eeh")
Out[34]: True
In [35]: %timeit contains("teeth", "eeh")
19.6 µs ± 94.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [36]: %timeit contains2("teeth", "eeh")
9.59 µs ± 29.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [37]: %timeit contains("friday is a good day", "ss a")
22.9 µs ± 121 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [38]: %timeit contains2("friday is a good day", "ss a")
9.52 µs ± 10.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
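For anyone who wants to repeat the comparison outside IPython, here is a minimal sketch using the standard `timeit` module. The test cases and the repetition count are arbitrary choices, and the timings will differ from the figures above depending on the machine and Python version.

```python
import timeit
from collections import Counter

def contains(string, substring):
    # Two Counters, then subtraction: empty result means nothing is missing.
    return not (Counter(substring) - Counter(string))

def contains2(string, substring):
    # One Counter, consumed character by character, with early termination.
    c = Counter(string)
    for char in substring:
        if c[char] > 0:
            c[char] -= 1
        else:
            return False
    return True

# Sanity check: both versions should agree before timing them.
cases = [("teeth", "eeh"), ("teeth", "ehe"), ("teth", "ehe"),
         ("friday is a good day", "ss a")]
for string, substring in cases:
    assert contains(string, substring) == contains2(string, substring)

runs = 10_000  # arbitrary repetition count
for fn in (contains, contains2):
    seconds = timeit.timeit(lambda: fn("teeth", "eeh"), number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e6:.2f} µs per call")
```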