Question

How do I iterate over a defaultdict(list) in Python? Is there a better way to work with a dictionary of lists in Python? I tried the usual iter(dict), but I got this error:

>>> import para
>>> para.print_doc('./sentseg_en/essentials.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "para.py", line 31, in print_doc
    for para in iter(doc):
TypeError: iteration over non-sequence

Main class:

import para
para.print_doc('./foo/bar/para-lines.txt')

para.py:

# -*- coding: utf-8 -*-
## Modified paragraph into a defaultdict(list) structure
## Original code from http://code.activestate.com/recipes/66063/
from collections import defaultdict
class Paragraphs:
    import sys
    reload(sys)
    sys.setdefaultencoding('utf-8')
    # Separator here refers to the paragraph separator,
    #  the default separator is '\n'.
    def __init__(self, filename, separator=None):
        # Set separator if passed into object's parameter,
        #  else set default separator as '\n'
        if separator is None:
            def separator(line): return line == '\n'
        elif not callable(separator):
            raise TypeError("separator argument must be callable")
        self.separator = separator
        # Reading lines from files into a dictionary of lists
        self.doc = defaultdict(list)
        paraIndex = 0
        with open(filename) as readFile:
            for line in readFile:
                if separator(line):
                    paraIndex+=1
                else:
                    self.doc[paraIndex].append(line)

# Prints out populated doc from txtfile
def print_doc(filename):
    text = Paragraphs(filename)
    for para in iter(text.doc):
        for sent in text.doc[para]:
            print "Para#%d, Sent#%d: %s" % (
                para, text.doc[para].index(sent), sent)

For example, the contents of ./foo/bar/para-lines.txt look like this:

This is a start of a paragraph.
foo barr
bar foo
foo foo
This is the end.

This is the start of next para.
foo boo bar bar
this is the end.

The output of the main class should look like this:

Para#1,Sent#1: This is a start of a paragraph.
Para#1,Sent#2: foo barr
Para#1,Sent#3: bar foo
Para#1,Sent#4: foo foo
Para#1,Sent#5: This is the end.

Para#2,Sent#1: This is the start of next para.
Para#2,Sent#2: foo boo bar bar
Para#2,Sent#3: this is the end.

Answer 1

The problem is with this line:

for para in iter(doc):

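doc is an instance of Paragraphs, not a defaultdict. The defaultdict you use in the __init__ method goes out of scope and is lost. So you need to do two things:

Save the doc created in the __init__ method as an instance variable (e.g. self.doc).
Either make Paragraphs itself iterable (by adding an __iter__ method), or give callers access to the doc object it created.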
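
The two steps above can be sketched as follows. This is a minimal illustration written for Python 3, not the original recipe: it assumes the paragraphs are built from an in-memory list of lines rather than a file (the `lines` parameter is hypothetical), and it numbers paragraphs from 1 to match the expected output.

```python
from collections import defaultdict


class Paragraphs:
    """Group lines into paragraphs, keyed by paragraph number."""

    def __init__(self, lines, separator=None):
        if separator is None:
            # Default: a blank line ends the current paragraph.
            def separator(line):
                return line == '\n'
        elif not callable(separator):
            raise TypeError("separator argument must be callable")
        # Keep the dictionary on the instance so it outlives __init__.
        self.doc = defaultdict(list)
        para_index = 1  # assumption: start at 1, as in the expected output
        for line in lines:
            if separator(line):
                para_index += 1
            else:
                self.doc[para_index].append(line)

    def __iter__(self):
        # Yield (paragraph number, list of lines) pairs in key order.
        return iter(sorted(self.doc.items()))


sample = ["first\n", "second\n", "\n", "third\n"]
paragraphs = Paragraphs(sample)
for number, sentences in paragraphs:
    for position, sentence in enumerate(sentences, start=1):
        print("Para#%d,Sent#%d: %s" % (number, position, sentence), end="")
```

With `__iter__` defined, `for ... in paragraphs` works directly, and `paragraphs.doc` remains available to callers that want the dictionary itself.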

Answer 2

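The recipe you linked to is rather old. It was written in 2001, before Python had more modern tools such as itertools.groupby (introduced in Python 2.4, released in late 2004). Here is what your code could look like using groupby: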

import itertools
import sys

with open('para-lines.txt', 'r') as f:
    paranum = 0
    for is_separator, paragraph in itertools.groupby(f, lambda line: line == '\n'):
        if is_separator:
            # we've reached paragraph separator
            print
        else:
            paranum += 1
            for n, sentence in enumerate(paragraph, start = 1):
                sys.stdout.write(
                    'Para#{i:d},Sent#{n:d}: {s}'.format(
                        i = paranum, n = n, s = sentence))
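
For readers on Python 3, the same groupby idea can be sketched as a function that works on any iterable of lines and returns the labelled strings instead of printing them. The function name `number_sentences` and the in-memory sample are assumptions made for this illustration.

```python
import itertools


def number_sentences(lines):
    """Return 'Para#i,Sent#n: text' strings for blank-line separated paragraphs."""
    labelled = []
    paranum = 0
    # groupby yields runs of consecutive lines that share the same key:
    # True for separator lines, False for paragraph lines.
    for is_separator, paragraph in itertools.groupby(lines, lambda line: line == '\n'):
        if is_separator:
            continue
        paranum += 1
        for n, sentence in enumerate(paragraph, start=1):
            labelled.append('Para#{i:d},Sent#{n:d}: {s}'.format(
                i=paranum, n=n, s=sentence.rstrip('\n')))
    return labelled


sample = ["This is a start.\n", "foo bar\n", "\n", "Next para.\n"]
for entry in number_sentences(sample):
    print(entry)
```

Because groupby collapses a run of consecutive blank lines into a single separator group, paragraph numbers stay consecutive even when paragraphs are separated by several empty lines.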

Answer 3

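The problem seems to be that you are iterating over your Paragraphs class, not the dictionary. Also, instead of iterating over the keys and then looking up each dictionary entry, consider using: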

for (key, value) in d.items():
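
Applied to a defaultdict(list), that could look like the sketch below (Python 3). The sample sentences are made up, and the sorted() call is an assumption added so the paragraphs come out in key order.

```python
from collections import defaultdict

doc = defaultdict(list)
doc[1].append("This is a start of a paragraph.")
doc[1].append("This is the end.")
doc[2].append("This is the start of next para.")

# items() yields (key, value) pairs, so no second lookup is needed.
for para, sentences in sorted(doc.items()):
    # enumerate gives the sentence position without calling list.index().
    for position, sentence in enumerate(sentences, start=1):
        print("Para#%d,Sent#%d: %s" % (para, position, sentence))
```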

Answer 4

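It fails because you have not defined __iter__() in the Paragraphs class, yet you call iter(doc) (where doc is a Paragraphs instance).

To be iterable, a class must have an __iter__() method that returns an iterator. See the Python documentation on iterator types.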
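
A small Python 3 sketch of the difference, using two toy classes (`WithoutIter` and `WithIter` are hypothetical names, not part of the question's code):

```python
class WithoutIter:
    """Holds a dictionary but defines no iteration protocol."""

    def __init__(self):
        self.doc = {1: ["a"], 2: ["b"]}


class WithIter(WithoutIter):
    """Same data, but iterable because __iter__ returns an iterator."""

    def __iter__(self):
        # Delegate to the dictionary, which iterates over its keys.
        return iter(self.doc)


try:
    iter(WithoutIter())
except TypeError as error:
    print("not iterable:", error)

# Iterating the second class walks the dictionary keys.
print(list(WithIter()))
```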

Answer 5

I can't think of any reason for you to use a dictionary here, let alone a defaultdict. A list of lists would be much simpler.

doc = []
with open(filename) as readFile:
    para = []
    for line in readFile:
        if line == separator:
            doc.append(para)
            para = []
        else:
            para.append(line)
    doc.append(para)
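
Wrapped in a function and combined with enumerate for the numbering, the list-of-lists approach might look like this in Python 3. The name `read_paragraphs`, the in-memory sample, and the choice of a plain '\n' string as separator are assumptions for illustration.

```python
def read_paragraphs(lines, separator='\n'):
    """Split an iterable of lines into a list of paragraphs (lists of lines)."""
    doc = []
    para = []
    for line in lines:
        if line == separator:
            # A separator line closes the current paragraph.
            doc.append(para)
            para = []
        else:
            para.append(line)
    # Keep the final paragraph, which has no trailing separator.
    doc.append(para)
    return doc


sample = ["one\n", "two\n", "\n", "three\n"]
for para_num, para in enumerate(read_paragraphs(sample), start=1):
    for sent_num, sentence in enumerate(para, start=1):
        print("Para#%d,Sent#%d: %s" % (para_num, sent_num, sentence), end="")
```

The list positions replace the dictionary keys, so the paragraph and sentence numbers come straight from enumerate.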
