How do I iterate over a defaultdict(list) in Python?
Is there a better way to work with a dictionary of lists in Python?
I tried a plain iter(dict), but I got this error:
>>> import para
>>> para.print_doc('./sentseg_en/essentials.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "para.py", line 31, in print_doc
    for para in iter(doc):
TypeError: iteration over non-sequence
The main script:
import para
para.print_doc('./foo/bar/para-lines.txt')
para.py:
# -*- coding: utf-8 -*-
## Modified paragraph into a defaultdict(list) structure
## Original code from http://code.activestate.com/recipes/66063/
from collections import defaultdict

class Paragraphs:
    import sys
    reload(sys)
    sys.setdefaultencoding('utf-8')

    # Separator here refers to the paragraph separator,
    # the default separator is '\n'.
    def __init__(self, filename, separator=None):
        # Set separator if passed into object's parameter,
        # else set default separator as '\n'
        if separator is None:
            def separator(line): return line == '\n'
        elif not callable(separator):
            raise TypeError, "separator argument must be callable"
        self.separator = separator
        # Reading lines from files into a dictionary of lists
        self.doc = defaultdict(list)
        paraIndex = 0
        with open(filename) as readFile:
            for line in readFile:
                if line == separator:
                    paraIndex+=1
                else:
                    self.doc[paraIndex].append(line)

# Prints out populated doc from txtfile
def print_doc(filename):
    text = Paragraphs(filename)
    for para in iter(text.doc):
        for sent in text.doc[para]:
            print "Para#%d, Sent#%d: %s" % (
                para, text.doc[para].index(sent), sent)
For example, the contents of ./foo/bar/para-lines.txt look like this:
This is a start of a paragraph.
foo barr
bar foo
foo foo
This is the end.

This is the start of next para.
foo boo bar bar
this is the end.
The output of the main script should look like this:
Para#1,Sent#1: This is a start of a paragraph.
Para#1,Sent#2: foo barr
Para#1,Sent#3: bar foo
Para#1,Sent#4: foo foo
Para#1,Sent#5: This is the end.
Para#2,Sent#1: This is the start of next para.
Para#2,Sent#2: foo boo bar bar
Para#2,Sent#3: this is the end.
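Answer 0 (score: 4)
The problem with the line
    for para in iter(doc):
is that doc is an instance of Paragraphs, not a defaultdict. The defaultdict you create in the __init__ method goes out of scope and is lost. So you need to do two things:
1. Save the doc created in the __init__ method as an instance variable (for example, self.doc).
2. Either make Paragraphs itself iterable (by adding an __iter__ method), or let callers access the doc object it created.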
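The two steps above can be sketched roughly as follows. This is a minimal illustration written for Python 3 (the code in the question is Python 2), not the asker's actual class: the constructor takes any iterable of lines instead of a filename, and the is_separator parameter name is an assumption made for the example.

```python
from collections import defaultdict


class Paragraphs:
    """Groups lines into paragraphs (hypothetical rewrite for illustration)."""

    def __init__(self, lines, is_separator=lambda line: line == "\n"):
        # Step 1: keep the dictionary on the instance so it outlives __init__.
        self.doc = defaultdict(list)
        para_index = 0
        for line in lines:
            if is_separator(line):
                para_index += 1
            else:
                self.doc[para_index].append(line)

    def __iter__(self):
        # Step 2: make the object iterable by yielding
        # (paragraph number, list of lines) pairs in paragraph order.
        return iter(sorted(self.doc.items()))


sample = ["first line\n", "second line\n", "\n", "next paragraph\n"]
for para, sentences in Paragraphs(sample):
    for n, sent in enumerate(sentences):
        print("Para#%d, Sent#%d: %s" % (para, n, sent.rstrip("\n")))
```

An open file object is itself an iterable of lines, so Paragraphs(open(path)) would work the same way under these assumptions.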
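Answer 1 (score: 2)
The recipe you linked to is rather old. It was written in 2001, before Python had more modern tools such as itertools.groupby (introduced in Python 2.4, released in late 2004). Here is what your code could look like using groupby: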
import itertools
import sys

with open('para-lines.txt', 'r') as f:
    paranum = 0
    for is_separator, paragraph in itertools.groupby(f, lambda line: line == '\n'):
        if is_separator:
            # we've reached paragraph separator
            print
        else:
            paranum += 1
            for n, sentence in enumerate(paragraph, start=1):
                sys.stdout.write(
                    'Para#{i:d},Sent#{n:d}: {s}'.format(
                        i=paranum, n=n, s=sentence))
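If the paragraphs are needed as data rather than printed straight away, the same groupby idea can collect them into a list of lists. The sketch below is written for Python 3 and uses an in-memory list of lines so it runs without a file; split_paragraphs is a hypothetical helper name, not part of itertools.

```python
import itertools


def split_paragraphs(lines):
    """Return a list of paragraphs, each a list of lines without newlines."""
    paragraphs = []
    for is_separator, group in itertools.groupby(lines, lambda line: line == "\n"):
        if not is_separator:
            # 'group' is a one-shot iterator, so copy it into a list right away.
            paragraphs.append([line.rstrip("\n") for line in group])
    return paragraphs


sample = ["foo barr\n", "bar foo\n", "\n", "foo boo bar bar\n", "this is the end.\n"]
for i, paragraph in enumerate(split_paragraphs(sample), start=1):
    for n, sentence in enumerate(paragraph, start=1):
        print("Para#{:d},Sent#{:d}: {}".format(i, n, sentence))
```

Because groupby merges consecutive items with the same key, several blank lines in a row count as a single separator here.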
Answer 2 (score: 0)
The problem seems to be that you are iterating over your Paragraphs instance rather than the dictionary. Also, consider using
for (key, value) in d.items():
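A complete loop in that style might look like the sketch below (Python 3; the sample data is made up for illustration). It also uses enumerate instead of list.index to number the sentences, because index returns the position of the first match and would give repeated sentences the same number.

```python
from collections import defaultdict

# Hypothetical sample data standing in for the parsed file.
doc = defaultdict(list)
doc[0].append("This is a start of a paragraph.")
doc[0].append("foo barr")
doc[1].append("This is the start of next para.")

lines_out = []
# items() yields (key, value) pairs, so no second lookup by key is needed.
for para, sentences in sorted(doc.items()):
    for n, sent in enumerate(sentences, start=1):
        lines_out.append("Para#%d,Sent#%d: %s" % (para + 1, n, sent))

print("\n".join(lines_out))
```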
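Answer 3 (score: 0)
It fails because you have not defined __iter__() in your Paragraphs class, yet you call iter(doc), where doc is a Paragraphs instance.
To be iterable, a class must define an __iter__() method that returns an iterator (or support the older __getitem__ sequence protocol). See the Python documentation on iterator types.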
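The difference can be seen in a small Python 3 sketch. The class names WithoutIter and WithIter and the is_iterable helper are invented for this example; the second class simply delegates iteration to the dictionary it wraps.

```python
class WithoutIter:
    """Holds a dictionary but defines neither __iter__ nor __getitem__."""

    def __init__(self, doc):
        self.doc = doc


class WithIter(WithoutIter):
    def __iter__(self):
        # Delegate to the wrapped dictionary, which yields its keys.
        return iter(self.doc)


def is_iterable(obj):
    """Return True if iter(obj) succeeds, False if it raises TypeError."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


data = {0: ["first\n"], 1: ["second\n"]}
print(is_iterable(WithoutIter(data)))
print(is_iterable(WithIter(data)))
print(list(WithIter(data)))
```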
Answer 4 (score: 0)
I can't think of any reason for you to use a dictionary here, let alone a defaultdict. A list of lists would be much simpler.
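doc = []
with open(filename) as readFile:
    para = []
    for line in readFile:
        # separator is assumed to be a string such as '\n' here
        if line == separator:
            doc.append(para)
            para = []
        else:
            para.append(line)
    # append the final paragraph, which has no trailing separator
    doc.append(para)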
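A self-contained variant of this list-of-lists approach, written for Python 3, could look like the sketch below. The read_paragraphs function name and the in-memory sample are assumptions for the example; it also skips empty paragraphs, which the snippet above would append when separators repeat or the file ends with a blank line.

```python
def read_paragraphs(lines, separator="\n"):
    """Split an iterable of lines into a list of paragraphs (lists of lines)."""
    doc = []
    para = []
    for line in lines:
        if line == separator:
            if para:  # ignore empty paragraphs from repeated separators
                doc.append(para)
            para = []
        else:
            para.append(line.rstrip("\n"))
    if para:  # flush the last paragraph
        doc.append(para)
    return doc


sample = [
    "This is a start of a paragraph.\n",
    "This is the end.\n",
    "\n",
    "This is the start of next para.\n",
]
for i, para in enumerate(read_paragraphs(sample), start=1):
    for n, sent in enumerate(para, start=1):
        print("Para#%d,Sent#%d: %s" % (i, n, sent))
```

With a real file, read_paragraphs(open(filename)) would work the same way, since a file object iterates over its lines.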