Question

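I need to create a word list from a text file. The list will be used in hangman code and must exclude the following:

Duplicate words
Words with fewer than 5 letters
Words that contain 'xx' as a substring
Words that contain uppercase letters

The word list then needs to be written to a file so that each word appears on its own line. The program also needs to output the number of words in the final list.

This is what I have, but it doesn't work properly.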

def MakeWordList():
    infile=open(('possible.rtf'),'r')
    whole = infile.readlines()
    infile.close()

    L=[]
    for line in whole:
        word= line.split(' ')
        if word not in L:
            L.append(word)
            if len(word) in range(5,100):
                L.append(word)
                if not word.endswith('xx'):
                    L.append(word)
                    if word == word.lower():
                        L.append(word)
    print L

MakeWordList()

Answer 1

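With this code you append the same word several times.
You aren't actually filtering any words out; you just add each one a different number of times depending on how many of the ifs it passes.

You should combine all the ifs: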

if word not in L and len(word) >= 5 and not 'xx' in word and word.islower():
    L.append(word)

Or, if you want it to be more readable, you can split them up:

    if word not in L and len(word) >= 5:
        if not 'xx' in word and word.islower():
            L.append(word)

But don't append after every single one.
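
To make the combined check concrete, here is a minimal sketch in Python 3 (the question's code uses Python 2's print statement). The function name filter_words and the helper set seen are made up for illustration, and the sketch assumes words is already a list of strings. Note that line.split(' ') in the question returns a list rather than a string, so each line would first need something like line.strip() to get a single word.

```python
def filter_words(words):
    """Return the words that pass all four rules, in their original order."""
    result = []
    seen = set()  # hypothetical helper: remembers words already kept
    for word in words:
        # One combined test, so each word is appended at most once
        if (word not in seen and len(word) >= 5
                and 'xx' not in word and word.islower()):
            seen.add(word)
            result.append(word)
    return result


print(filter_words(['apple', 'apple', 'Apple', 'cat', 'boxxes', 'zebra']))
# prints ['apple', 'zebra']
```

Checking membership in a set instead of in the list itself is optional, but it avoids rescanning the whole list for every word.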

Answer 2

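Think about it: in your nested if statements, any word that is not already in the list gets added at the first line. Then, if it has 5 or more characters, it gets added again (I bet), and then again, and so on. You need to rethink the logic in your if statements.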
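
The effect can be seen with a small experiment. This is only a sketch in Python 3: nested_appends is a hypothetical helper that copies the nested ifs from the question and applies them to a single string, starting from an empty list, then reports how many copies of the word ended up in it.

```python
def nested_appends(word):
    """Count how many times the question's nested ifs would append `word`."""
    L = []
    if word not in L:
        L.append(word)                      # appended for being new
        if len(word) in range(5, 100):
            L.append(word)                  # appended again for its length
            if not word.endswith('xx'):
                L.append(word)              # and again for not ending in 'xx'
                if word == word.lower():
                    L.append(word)          # and again for being lowercase
    return len(L)


for w in ['cat', 'Hello', 'hello']:
    print(w, nested_appends(w))
# prints:
# cat 1
# Hello 3
# hello 4
```

So a word that should be rejected ('cat', 'Hello') still lands in the list, and a word that should be kept ('hello') lands there four times.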

Answer 3

Improved code:

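def MakeWordList():
    with open('possible.rtf','r') as f:
        # split() breaks the text on whitespace, so data is a list of words rather than one long string
        data = f.read().split()
    return set([word for word in data if len(word) >= 5 and word.islower() and not 'xx' in word])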

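set(iterable) returns a set object with no duplicates (every item in a set must be unique). [word for word...] is a list comprehension, which is a short way of building simple lists. It iterates over each word in 'data' (this assumes data is a sequence of words, for example the file contents split on whitespace, and not one raw string, which would be iterated character by character). if len(word) >= 5 and word.islower() and not 'xx' in word takes care of the last three requirements (at least 5 letters, lowercase letters only, and no 'xx').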
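
The question also asks for the list to be written to a file, one word per line, and for the number of words to be reported. A possible sketch of that last step in Python 3 follows. The function name make_word_list and its two path parameters are assumptions for illustration, and the input is assumed to be plain text (a real .rtf file would also contain formatting control codes that would need to be stripped first).

```python
def make_word_list(in_path, out_path):
    """Filter the words in in_path, write one per line to out_path, return the count."""
    with open(in_path, 'r') as f:
        # split() with no arguments breaks on any whitespace, including newlines
        candidates = f.read().split()

    # The set comprehension drops duplicates; sorted() gives a predictable order
    words = sorted({
        w for w in candidates
        if len(w) >= 5 and w.islower() and 'xx' not in w
    })

    with open(out_path, 'w') as out:
        for w in words:
            out.write(w + '\n')

    return len(words)
```

Calling, say, print(make_word_list('possible.txt', 'wordlist.txt')) would then write the filtered list and print how many words it contains; both file names here are placeholders.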
