Question

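What I want to do is sort word objects (each consisting of the scanned word, its alphabetically sorted version, and its length) into lists by length. So I initialize a list of length 0 and extend it as I go through my input file. What I want is a list of lists, so that results[5] contains a list of the words of length 5. How do I do this?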

I first initialize my list as follows:

results = []

Then I scan the input file line by line, creating temporary objects that I want to put into the appropriate list:

try:    #check if there exists an array for that length
    results[lineLength]
except IndexError:  #if it doesn't, create it up to that length
    # Grow the list so that the new highest index is len(word)
    difference = len(results) - lineLength
    results.extend([] for _ in range(difference))
finally:
    results[lineLength].append(tempWordObject)

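I think at least one of the following needs to change:

(1) the way I initialize the results list, (2) the way I append objects to the list, (3) the way I extend the list (although I think that part is correct).

I am using Python 3.4.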

Edit:

from sys import argv
main, filename = argv
file = open(filename)
for line in file:           #go through the file
    if line == '\n':        #if the line is empty (aka end of file), exit loop
        break
    lineLength = (len(line)-1)  #get the line length 
    line= line.strip('\r\n')

    if lineLength > maxL:       #keeps track of length of longest word encountered
        maxL = lineLength

    #note: I've written a mergesort algorithm in a separate area in the code and it works 
    tempAZ = mergesort(line)    #mergesort the word into alphabetical order
    tempAZ = ''.join(tempAZ)    #merges the chars back together to form a string

    tempWordObject = word(line,tempAZ,lineLength) #creates a new word object

    try:    #check if there exists an array for that length
        results[lineLength]
    except IndexError:  #if it doesn't, create it up to that length
        # Grow the list so that the new highest index is len(word)
        difference = len(results) - lineLength
        results.extend([] for _ in range(difference))
        print("lineLength: ", lineLength, "    difference:", difference)
    finally:
        results[lineLength].append(tempWordObject)

Edit:

Here is my word class:

class word(object): #object class

    def __init__(self, originalWord=None, azWord=None, wLength=None):
        self.originalWord = originalWord
        self.azWord = azWord
        self.wLength = wLength

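Edit:

Here is a clarification of what I am trying to achieve: while iterating over a list (of unknown length) of words (also of unknown length), I create word objects that hold the word, its alphabetically sorted version, and its length (e.g. dog, dgo, 3). As I go through that list, I want all the objects to go into lists inside another list (results[]), indexed by word length. If results[] does not have such an index (e.g. 3), I want to extend results[] and start a list at results[3] containing the word object (dog, dgo, 3). In the end, results[] should contain lists of words indexed by their length.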

Answer 1

You can use a dictionary instead of a list:

d = {}

Here the key is the length and the value is a list of words:

if lineLength not in d:
    d[lineLength] = []
d[lineLength].append(tempWordObject)

You can simplify this further with d = collections.defaultdict(list).
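A minimal sketch of the dictionary approach, assuming plain strings stand in for the word objects; the sample words and the name `by_length` are made up for illustration. It also shows one way to read the groups back from shortest to longest, since a dictionary simply has no entry for lengths that never occur.

```python
from collections import defaultdict

# Hypothetical sample input; in the question these would be word objects.
words = ["dog", "cat", "horse", "ox", "mouse"]

by_length = defaultdict(list)  # a missing key starts out as an empty list
for w in words:
    by_length[len(w)].append(w)

# Visit the groups from shortest to longest.
for length in sorted(by_length):
    print(length, by_length[length])
```

Sorting the keys when reading is only needed if the order of lengths matters to you.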

Answer 2

Your difference is negative. You need to subtract the other way around. You also need to add one extra, because indexing starts at 0:

difference = lineLength - len(results) + 1
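With the original formula the difference is negative, `range()` of a negative number is empty, so `extend` adds nothing and the append in the `finally` clause raises IndexError again. Below is a small sketch of the corrected growth step, assuming plain strings in place of the word objects; the helper name `add_by_length` is hypothetical, and it uses an explicit length check rather than the try/except/finally from the question.

```python
def add_by_length(results, length, item):
    """Append item to results[length], growing results with empty lists first."""
    missing = length - len(results) + 1  # positive when results is too short
    if missing > 0:
        # A generator yields a fresh list each time, so no lists are shared.
        results.extend([] for _ in range(missing))
    results[length].append(item)


results = []
for w in ["dog", "ox", "horse", "cat"]:  # hypothetical sample words
    add_by_length(results, len(w), w)

print(results)
# [[], [], ['ox'], ['dog', 'cat'], [], ['horse']]
```

Lengths that never occur (0, 1 and 4 here) stay as empty lists, which is the price of indexing by length in a list.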

As it turns out, it is usually easier to use a defaultdict for this.

For example:

from collections import defaultdict
D = defaultdict(list)
for tempWordObject in the_file:
    D[len(tempWordObject)].append(tempWordObject)

Answer 3

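If you are set on using a list (which may not be the best choice), I think it is easier and clearer to create the list at its full size up front. That is, if the longest word is 5 characters long, start by creating this list: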

output = [None, [], [], [], [], []]

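The advantage is that you do not have to worry about catching exceptions, but it does require you to know all the words before you start. Since you created an object class to store all of this, I assume you are in fact storing all of them, so it should not be a problem.

You always need the None so that the indices match. Once that is done, you can loop over the word list and simply append each word to the appropriate list.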

for word in wordlist:
    output[len(word)].append(word)

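For your case specifically, instead of storing only tempWordObject, I would build a list of these objects (wordObjList) while processing the file. Once you are done with the file, close the handle, then carry on with the rest of the processing.

Generate the template list: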

output = [None]
for i in range(maxLen):
    output.append([])

Fill the list of lists with the word objects:

for wordObj in wordObjList:
    output[wordObj.wLength].append(wordObj.originalWord)
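The template can also be built in one line with a list comprehension. The sketch below assumes the maximum length is already known (the name `max_len` and the sample word are made up), and it shows why multiplying `[[]]` is not a safe shortcut.

```python
max_len = 5  # assumed to be known after a first pass over the words

# One independent empty list per length; index 0 is just a placeholder.
output = [None] + [[] for _ in range(max_len)]

# Pitfall: [[]] * max_len repeats a reference to one single list object.
shared = [[]] * max_len
shared[0].append("dog")
print(shared[1])  # the same list as shared[0], so "dog" appears here too

output[3].append("dog")
print(output)
# [None, [], [], ['dog'], [], []]
```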

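A few other notes:

You do not need to handle hitting the end of the file. When Python reaches the end of the file in a for loop, it stops iterating automatically.
Always make sure you close the file. You can use the with construct to do this (with open("file.txt", 'r') as f: for line in f:).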
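Both notes can be sketched together as follows. To keep the example runnable on its own, it first writes a small temporary file; the file contents and the names `groups` and `path` are assumptions for illustration, and a defaultdict stands in for the list of lists.

```python
import os
import tempfile
from collections import defaultdict

# Write a small sample file so the sketch runs on its own (hypothetical data).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("dog\nhorse\ncat\n")
    path = tmp.name

groups = defaultdict(list)
with open(path) as f:        # the file is closed when the with block exits
    for line in f:           # iteration stops by itself at end of file
        word = line.strip()  # drop the trailing newline before measuring
        if word:             # skip blank lines instead of breaking out
            groups[len(word)].append(word)

os.remove(path)              # clean up the sample file
print(dict(groups))
# {3: ['dog', 'cat'], 5: ['horse']}
```

Stripping the line before taking its length avoids the `len(line) - 1` adjustment, which is off by one when the last line has no trailing newline.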

Answer 4

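Three notes on your question.

Nested list initialization

You mention it in the question title, although in the end you may not need it. A simple way is to use two nested list comprehensions: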

import pprint

m, n = 3, 4  # 2D: 3 rows, 4 columns
lol = [[(j, i) for i in range(n)] for j in range(m)]

pprint.pprint(lol)
# [[(0, 0), (0, 1), (0, 2), (0, 3)],
#  [(1, 0), (1, 1), (1, 2), (1, 3)],
#  [(2, 0), (2, 1), (2, 2), (2, 3)]]

Using a data structure with defaults

As others have pointed out, you can use a dictionary. In particular, collections.defaultdict gives you initialization on demand:

import collections

dd = collections.defaultdict(list)

for value in range(10):
    dd[value % 3].append(value)

pprint.pprint(dd)
# defaultdict(<class 'list'>, {0: [0, 3, 6, 9], 1: [1, 4, 7], 2: [2, 5, 8]})

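Comparing custom objects

The built-in sorted function accepts a keyword argument key, which can be used to compare custom objects that do not provide sorting hooks themselves: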

import operator

class Thing:
    def __init__(self, word):
        self.word = word
        self.length = len(word)

    def __repr__(self):
        return '<Word %s>' % self.word

things = [Thing('the'), Thing('me'), Thing('them'), Thing('anybody')]
print(sorted(things, key=lambda obj: obj.length))
# [<Word me>, <Word the>, <Word them>, <Word anybody>]
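As a variation on the lambda, `operator.attrgetter` can build the key function, and passing it two attribute names breaks ties between words of equal length. This is only a sketch reusing the `Thing` class from above, with a slightly different, made-up word list.

```python
import operator


class Thing:
    def __init__(self, word):
        self.word = word
        self.length = len(word)

    def __repr__(self):
        return '<Word %s>' % self.word


things = [Thing('the'), Thing('me'), Thing('them'), Thing('cat')]

# Sort by length first, then alphabetically among words of equal length.
by_length = sorted(things, key=operator.attrgetter('length', 'word'))
print(by_length)
# [<Word me>, <Word cat>, <Word the>, <Word them>]
```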

Answer 5

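You have declined to accept the answers that suggest storing your objects in a dictionary. Your real problem, however, is that you want to hold 6 million words, together with their scanned images, in memory. Use indexes (or some other simple reference), track those in your structure, and look up the data through them. Use iterators to retrieve the information you need.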

How do I initialize and populate a list of lists in Python?

5 answers: