Set operations on lists

Time: 2016-03-28 19:05:51

Tags: python list set

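My idea is that I have a string, four, made up of 4 characters. I am trying to find the words in the list folder whose characters are a subset of those 4 characters. So 'paul' would match, but a word such as 'pauls' would not. My only problem is that sets cannot handle repeated characters. The code below prints 'aa', even though the string four contains only one 'a'. Is there an operation like issubset that works on lists?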

four = "laup"

four = set(four)

folder = ["paul","joshua","other","asdf","joshua","aa","hello"]

for word in folder:
    wordstrings = set(word)
    if wordstrings.issubset(four):
        print(word)

2 answers:

Answer 0 (score: 2)

If you want to match the same characters including repeats, use a Counter dict to count the characters in each word:

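from collections import Counter

# Count how many times each character occurs in the reference string.
four = Counter("laup")

folder = ["paul","joshua","other","asdf","joshua","aa","hello"]

for word in folder:
    wordstrings = Counter(word)
    # Counter subtraction drops non-positive counts, so an empty result
    # means no character in word occurs more often than it does in four.
    if not wordstrings - four:
        print(word)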

If A - B leaves you with an empty Counter, every letter in A appears in B at least as many times as it appears in A:

In [14]: Counter("foos") - Counter("foo")
Out[14]: Counter({'s': 1})

In [15]: Counter("foo") - Counter("foos")
Out[15]: Counter()

In [16]: Counter("pauls") - Counter("paul")
Out[16]: Counter({'s': 1})

In [17]: Counter("paul") - Counter("paul")
Out[17]: Counter()

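You can also use all to check that every character in wordstrings appears at least as many times in four; it short-circuits as soon as one character fails: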

for word in folder:
    wordstrings = Counter(word)
    if all(wordstrings[k] - four[k] <= 0 for k in wordstrings):
        print(word)
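
The same check can be wrapped in a small helper so it is easy to reuse and test. This is only a sketch: the name is_multiset_subset and its parameters are made up for illustration and do not come from any library.

```python
from collections import Counter

def is_multiset_subset(word, letters):
    """Return True if word can be built from letters, respecting repeats.

    Hypothetical helper, not part of any library.
    """
    available = Counter(letters)
    needed = Counter(word)
    # A missing key in a Counter reads as 0, so unknown characters fail.
    return all(count <= available[ch] for ch, count in needed.items())

folder = ["paul", "joshua", "other", "asdf", "joshua", "aa", "hello"]
matches = [word for word in folder if is_multiset_subset(word, "laup")]
print(matches)  # ['paul']
```

Shorter words such as "pal" also pass, because the test is for a sub-multiset rather than an exact anagram.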

Sets will not work here at all: every element of a set is unique, so a repeated character is only counted once.
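
A short side-by-side check makes the difference visible. It reuses the strings from the question; the variable names are chosen just for this illustration.

```python
from collections import Counter

four = "laup"
word = "aa"

# A set keeps one copy of each character, so "aa" collapses to {"a"}.
set_says_subset = set(word).issubset(set(four))

# A Counter keeps the multiplicity: "aa" needs two a's, "laup" has one.
leftover = Counter(word) - Counter(four)
counter_says_subset = not leftover

print(set_says_subset)      # True
print(leftover)             # Counter({'a': 1})
print(counter_says_subset)  # False
```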

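Answer 1 (score: 0)

As far as I know, lists have nothing like issubset. Sets always strip duplicates, because you do not need to know that a value is repeated in order to know that it is in the set. The problem here is that, as you iterate in the for loop, word = 'aa' becomes wordstrings = {'a'}, which is a subset of four. Do you have to use sets? I would just keep a counter instead of converting the words to sets.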

four = list(four)  # keep it as a workable list instead?

for word in folder:
    n = 0
    for letter in word:
        # count the letters that word uses more often than four provides
        if word.count(letter) > four.count(letter):
            n += 1
    if n == 0:
        print(word)
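
A list-only variant of the same idea is to use up letters from a copy of the list as they are matched. This is a different technique from the count comparison above, sketched under the assumption that a sub-multiset test is what is wanted; fits_in_pool is a hypothetical name.

```python
def fits_in_pool(word, letters):
    """Return True if every letter of word can be taken from letters."""
    pool = list(letters)  # work on a copy so the caller's data is untouched
    for letter in word:
        try:
            pool.remove(letter)  # use up one occurrence of this letter
        except ValueError:       # list.remove raises if the letter is gone
            return False
    return True

folder = ["paul", "joshua", "other", "asdf", "joshua", "aa", "hello"]
for word in folder:
    if fits_in_pool(word, "laup"):
        print(word)  # prints only: paul
```

Each list.remove call scans the list, which is fine for short strings like these. For long inputs a Counter is likely the better fit.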