Question

I want to find out which words can be spelled using note names.

This question is very similar: Python code that will find words made out of specific letters. Any subset of the letters could be used. However, my letters also include "fis", "cis", etc.

letters = ["c","d","e","f","g","a","h","c","fis","cis","dis"]

My word list is long, with one word per line, and I want to use

with open(...) as f:
    for line in f:
        if ...:  # does this word consist only of note names?

to check whether each word belongs to this "language", and then save it to another file.

My problem is how to change

>>> import re
>>> m = re.compile('^[abilrstu]+$')
>>> m.match('australia') is not None
True
>>> m.match('dummy') is not None
False
>>> m.match('australian') is not None
False

so that it also matches "fis", "cis", etc.

For example, "fish" is a match, but "ifsh" is not.

Edit: my solution, which opens a tk window for choosing the file:

import re
from tkinter import filedialog as fd

# Sharps/flats as alternatives, single-letter notes (including d and e) in the class
m = re.compile('^(fis|ges|gis|as|ais|cis|des|es|dis|[abcdefgh])+$')
matches = []
filename = fd.askopenfilename()


with open(filename) as f:
    for line in f:
        word = line.strip()  # also safe for a last line without "\n"
        if m.match(word.lower()) is not None:
            matches.append(word)


print(matches)
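The question also asks to save the matches to another file, which the snippet above only prints. A minimal sketch without the tk dialog, assuming a UTF-8 input file with one word per line and counting d and e as notes (they are in the letters list); filter_note_words and the file names are hypothetical:

```python
import re

# Sharps/flats from the question as alternatives, plus the single-letter notes a-h.
NOTE_WORD = re.compile(r"(fis|ges|gis|as|ais|cis|des|es|dis|[abcdefgh])+")


def filter_note_words(in_path, out_path):
    """Copy every word in in_path that is spelled only with note names to out_path.

    Assumes one word per line. Returns the list of matching words.
    """
    matches = []
    with open(in_path, encoding="utf-8") as src:
        for line in src:
            word = line.strip()
            # fullmatch requires the whole word to match, like ^...$
            if NOTE_WORD.fullmatch(word.lower()):
                matches.append(word)
    with open(out_path, "w", encoding="utf-8") as dst:
        for word in matches:
            dst.write(word + "\n")
    return matches
```

Usage would be something like filter_note_words("words.txt", "note_words.txt").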

Answer 1

This function works, and it does not use any external libraries:

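def func(word, letters):
    # Remove the longest note names first, so "fis" is removed before "f"
    for l in sorted(letters, key=len, reverse=True):
        word = word.replace(l, "")
    # Nothing left over means the word was made entirely of notes
    return not word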

It works because if word == "" at the end, the word has been completely broken down into your letters.
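A quick check of the corrected function against the question's letters list might look like this. One caveat worth testing for: removing a note can glue its neighbours together into a new note name, so a string such as "dcisis" (d + cis leaves "is", which is not a note) is reported as True:

```python
def func(word, letters):
    # Remove the longest note names first, so "fis" is removed before "f"
    for l in sorted(letters, key=len, reverse=True):
        word = word.replace(l, "")
    # Nothing left over means every character was removed as part of a note
    return not word


letters = ["c", "d", "e", "f", "g", "a", "h", "c", "fis", "cis", "dis"]

for w in ["fish", "ifsh", "disc", "dcisis"]:
    print(w, func(w, letters))
# fish True
# ifsh False
# disc True
# dcisis True   (false positive: removing "cis" turns "dcisis" into "dis")
```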

Update:

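It seems my explanation was not clear. WORD.replace(LETTER, "") replaces every occurrence of LETTER in WORD with the empty string, i.e. it deletes the note/letter rather than leaving a blank in its place. Here is an example: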

func("banana", {'na'})

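It replaces every 'na' in "banana" with ''.

The result is "ba", which is not made of notes.

not "" is True, while not "ba" is False: an empty string is falsy, so not word is just a shorter way of writing word == "".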

Here is another example:

func("banana", {'na', 'chicken', 'b', 'ba'})

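It replaces every 'chicken' in "banana" with ''.

The result is "banana".

It replaces every 'ba' in "banana" with ''.

The result is "nana".

It replaces every 'na' in "nana" with ''.

The result is "".

It replaces every 'b' in "" with ''.

The result is "".

not "" is True ==> hooray!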

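Note: the letters are sorted by length because otherwise the second example would not necessarily work. If 'b' were removed first, the result would be "a", which cannot be broken down into notes.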
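To see the effect of the ordering, a hypothetical helper trace can print each removal step for the second example, once longest-first and once with 'b' first:

```python
def trace(word, letters):
    """Remove each letter in the given order, printing the intermediate word."""
    for l in letters:
        word = word.replace(l, "")
        print(f"removed {l!r} -> {word!r}")
    return not word


# Longest first, as func does (sorted is stable, so "na" stays before "ba")
trace("banana", sorted(["b", "na", "chicken", "ba"], key=len, reverse=True))
# removed 'chicken' -> 'banana'
# removed 'na' -> 'ba'
# removed 'ba' -> ''
# removed 'b' -> ''

# Unsorted, with "b" first
trace("banana", ["b", "na", "chicken", "ba"])
# removed 'b' -> 'anana'
# removed 'na' -> 'a'
# removed 'chicken' -> 'a'
# removed 'ba' -> 'a'
```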

Answer 2

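I believe ^(fis|cis|dis|[abcdefgh])+$ will do it.

Here is a breakdown of what is going on:

| works like OR
[...] means "any one of the characters inside the brackets"
^ and $ mark the start and the end of the line, respectively
+ means "one or more times"
( ... ) is a group, which is needed to apply a + / * / {} modifier to the whole alternation. Without the group, a modifier applies only to the nearest expression on its left

Overall, this reads: "the whole string is one or more repetitions of fis / cis / dis or of a single letter from abcdefgh".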
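Instead of typing the alternation by hand, it could also be built from the question's letters list. A sketch, assuming re.fullmatch in place of the ^...$ anchors and a non-capturing group (?:...) in place of (...):

```python
import re

letters = ["c", "d", "e", "f", "g", "a", "h", "c", "fis", "cis", "dis"]

# Deduplicate and list longer names first. The order does not affect correctness
# here, because the regex engine backtracks, but it keeps the pattern predictable.
alternatives = sorted(set(letters), key=lambda s: (-len(s), s))
# re.escape guards against a note name that contains a regex metacharacter
pattern = re.compile("(?:" + "|".join(map(re.escape, alternatives)) + ")+")

for word in ["fish", "ifsh", "disc", "dcisis", "deaf"]:
    print(word, pattern.fullmatch(word) is not None)
# fish True
# ifsh False
# disc True
# dcisis False
# deaf True
```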

Answer 3

You can count how many letters of the word are covered by units (note names) and compare that number with the length of the word.

from collections import Counter

units = {"c","d","e","f","g","a","h", "fis","cis","dis"}

def func(word, units=units):
    letters_count = Counter()
    for unit in units:
        num_of_units = word.count(unit)
        letters_count[unit] += num_of_units * len(unit) 
        if len(unit) == 1:
            continue
        # if the unit consists of more than 1 letter (e.g. dis)
        # check if these letters are in one letter units
        # if yes, subtract them, since word.count(letter) already counted them
        for letter in unit:
            if letter in units:
                letters_count[letter] -= num_of_units
    return len(word) == sum(letters_count.values())

print(func('disc'))
print(func('disco'))    
# True
# False
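To see why the subtraction is needed, a hypothetical variant letter_tally does the same counting as func but returns the Counter instead of the comparison. For "fish", the "f" inside "fis" is first counted as a single-letter unit and then subtracted again:

```python
from collections import Counter

units = {"c", "d", "e", "f", "g", "a", "h", "fis", "cis", "dis"}


def letter_tally(word, units=units):
    """Same counting as func, but returns the per-unit Counter."""
    letters_count = Counter()
    for unit in units:
        n = word.count(unit)
        letters_count[unit] += n * len(unit)
        if len(unit) > 1:
            # undo the double count of single-letter units inside this unit
            for letter in unit:
                if letter in units:
                    letters_count[letter] -= n
    return letters_count


tally = letter_tally("fish")
print(tally["fis"], tally["f"], tally["h"])  # 3 0 1
print(sum(tally.values()), len("fish"))      # 4 4
```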
