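I have a string, 'four', made up of 4 characters. I'm trying to find the words in the list 'folder' that can be built from those 4 characters (a subset). So 'paul' would match, but 'pauls' would not. My only problem is that sets can't handle repeated characters. The code below prints 'aa', but the string 'four' contains only one 'a'. Is there an 'issubset' operation that works on lists?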
four = "laup"
four = set(four)
folder = ["paul","joshua","other","asdf","joshua","aa","hello"]
for word in folder:
    wordstrings = set(word)
    if wordstrings.issubset(four):
        print(word)
Answer 0 (score: 2)
If you want the match to respect repeated characters, use a Counter dict to count the characters in each word:
from collections import Counter

four = "laup"
four = Counter(four)
folder = ["paul","joshua","other","asdf","joshua","aa","hello"]
for word in folder:
    wordstrings = Counter(word)
    # an empty Counter is falsy, so this is true when nothing is left over
    if not wordstrings - four:
        print(word)
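Counter subtraction keeps only positive counts, so if A - B gives an empty Counter, every letter in A appears in B at least as many times as it does in A: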
In [14]: Counter("foos") - Counter("foo")
Out[14]: Counter({'s': 1})
In [15]: Counter("foo") - Counter("foos")
Out[15]: Counter()
In [16]: Counter("pauls") - Counter("paul")
Out[16]: Counter({'s': 1})
In [17]: Counter("paul") - Counter("paul")
Out[17]: Counter()
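You can also use all to check that every character in wordstrings occurs no more often than it does in four; all short-circuits as soon as one character fails: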
for word in folder:
    wordstrings = Counter(word)
    if all(wordstrings[k] - four[k] <= 0 for k in wordstrings):
        print(word)
Sets won't work here at all: every element of a set is unique, so a repeated character is only counted once.
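To make the difference concrete, here is a small side-by-side sketch for the word 'aa' from the question. The variable names (`set_match`, `leftover`, `counter_match`) are made up for this example.

```python
from collections import Counter

word = "aa"
letters = "laup"

# A set keeps each character once, so the repeated 'a' is lost.
set_match = set(word).issubset(set(letters))

# A Counter keeps the multiplicity, so the extra 'a' survives the subtraction.
leftover = Counter(word) - Counter(letters)
counter_match = not leftover

print(set(word))      # {'a'}
print(set_match)      # True
print(leftover)       # Counter({'a': 1})
print(counter_match)  # False
```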
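Answer 1 (score: 0)
As far as I know, lists have nothing like issubset. Sets always strip duplicates, because you don't need to know that a value is repeated in order to know that it is in the set. The problem here is that when you iterate in the for loop, word = 'aa' becomes wordstrings = {'a'}, which is a subset of four. Do you have to use sets? I would just keep a counter instead of converting the words into sets.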
four = list("laup")  # keep it as a workable list instead of a set
folder = ["paul","joshua","other","asdf","joshua","aa","hello"]
for word in folder:
    n = 0
    for letter in word:
        # count the letters that occur more often in word than in four
        # (> rather than != so a word may use fewer copies of a letter)
        if word.count(letter) > four.count(letter):
            n += 1
    if n == 0:
        print(word)
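If you want something that reads like issubset for lists, one possible sketch is to cross letters off a copy of the list as they are used. The helper name `is_sub_multiset` is hypothetical, not a built-in, and this is only one way to do it.

```python
def is_sub_multiset(word, letters):
    """Return True if word can be built from letters, using each letter at most once."""
    remaining = list(letters)  # work on a copy so the caller's data is untouched
    for ch in word:
        if ch in remaining:
            remaining.remove(ch)  # use up one occurrence of this letter
        else:
            return False          # letter missing or already used up
    return True

folder = ["paul", "joshua", "other", "asdf", "joshua", "aa", "hello"]
matches = [word for word in folder if is_sub_multiset(word, "laup")]
print(matches)  # ['paul']
```

Each `in` and `remove` call scans the list, so this is fine for short strings like these; for longer inputs a Counter is likely the better fit.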