I am trying to read a passage of text and build a dictionary from it. The keys are words and the values are line numbers. Here is my code:
import string
def build_word_index():
    input_file=(input('file name: '))
    input_file_open=open(input_file,'r')
    word_map = {}
    line_no = 0
    w=[]
    for line in input_file_open:
        word_lst = line.strip().split()
        word_lst = [w.lower().strip(string.punctuation) for w in word_lst]
        w.append(word_lst)
        for word in w[line_no]:
            if word!="":
                word_map[word]=line_no
        line_no+=1
    print(word_map)
    index_lst = sorted(list(word_map.items()))
    print(index_lst)
    for word, line_set in index_lst:
        line_lst = sorted(list(line_set))
        line_str = str( line_lst[0] )
        for line_no in line_lst[1:]:
            line_str += ", {}".format( line_no )
        print("{:14s}:".format(word), line_str )
    input_file_open.close()
build_word_index()
The error I get is:
Traceback (most recent call last):
  File "C:/Users/Dasinator/Documents/Books IX/Python Examples/textbook examples/lab10/lab10d.py", line 39, in <module>
    build_word_index()
  File "C:/Users/Dasinator/Documents/Books IX/Python Examples/textbook examples/lab10/lab10d.py", line 29, in build_word_index
    line_lst = sorted(list(line_set))
TypeError: 'int' object is not iterable
I was wondering if someone could look at my code and give me some hints on fixing this error. Thanks.
Answer 0 (score: 0)
Your list index_lst is built from a call to the dict's items method, which yields tuples that pair each key with its value.
>>> d = {'a': 1, 'b': 2}
>>> d.items()
dict_items([('b', 2), ('a', 1)])
When you iterate over it the way you do, the first identifier names the current key and the second identifier names the current value:
>>> for a, b in d.items():
... print("a: {}, b: {}".format(a, b))
...
a: b, b: 2
a: a, b: 1
>>> # Notice the keys are unsorted!
On the next line of the loop, you pass the second identifier, line_set, to the list constructor, which builds a list from anything that supports iteration.
line_lst = sorted(list(line_set))
# Hint: this is referenced in your error message
But line_set is not an iterable object! It is just a plain integer (int), so Python gives up:
TypeError: 'int' object is not iterable
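To see the failure in isolation, along with one possible way around it, here is a minimal sketch. It assumes you want each word mapped to a set of line numbers, which is what the name line_set suggests. The helper add_occurrence is hypothetical and not part of the original code.

```python
def add_occurrence(word_map, word, line_no):
    """Record line_no for word, keeping a set of line numbers per word."""
    # setdefault returns the existing set, or stores and returns a new empty one.
    word_map.setdefault(word, set()).add(line_no)


# Reproduce the original failure: list() cannot iterate over an int.
try:
    list(3)
except TypeError as exc:
    error_message = str(exc)

# With sets as values, sorted(list(line_set)) works as the loop expects.
word_map = {}
add_occurrence(word_map, "one", 0)
add_occurrence(word_map, "one", 2)
add_occurrence(word_map, "two", 0)

for word, line_set in sorted(word_map.items()):
    line_lst = sorted(list(line_set))
    print("{:14s}:".format(word), ", ".join(str(n) for n in line_lst))
```

Because every value is now a set, the printing loop from the question can stay as it is.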
Answer 1 (score: 0)
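As far as I can tell, you want every line on which each word appears, not just the last line where you encountered it. If so, word_map should map each word to a list of line numbers rather than to a single number. The line that records a line number for a word then becomes word_map[word]+=[line_no]. Use a defaultdict instead of a plain dictionary to avoid writing the if word not in word_map: word_map[word] = [] part.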
Here is a working version:
import string, collections
def build_word_index():
    input_file=(input('file name: '))
    input_file_open=open(input_file,'r')
    word_map = collections.defaultdict (list)
    line_no = 0
    w=[]
    for line in input_file_open:
        word_lst = line.strip().split()
        word_lst = [w.lower().strip(string.punctuation) for w in word_lst]
        w.append(word_lst)
        for word in word_lst:
            word_map[word]+=[line_no]
        line_no+=1
    print(word_map)
    index_lst = sorted(list(word_map.items()))
    print(index_lst)
    for word, line_set in index_lst:
        line_lst = sorted(list(line_set))
        line_str = str( line_lst[0] )
        for line_no in line_lst[1:]:
            line_str += ", {}".format( line_no )
        print("{:14s}:".format(word), line_str )
    input_file_open.close()
build_word_index()
Sample input:
one two
three three four
one two four
four three
Sample output:
file name: defaultdict(<class 'list'>, {'one': [0, 2], 'three': [1, 1, 3], 'two': [0, 2], 'four': [1, 2, 3]})
[('four', [1, 2, 3]), ('one', [0, 2]), ('three', [1, 1, 3]), ('two', [0, 2])]
four          : 1, 2, 3
one           : 0, 2
three         : 1, 1, 3
two           : 0, 2
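Note that the sample output lists line 1 twice for three, because the word occurs twice on that line, and that line numbers start at 0. If you would rather have each line listed once and numbered from 1, one option is to collect the numbers in a set and count with enumerate. The sketch below is only an illustration under those assumptions. The function name index_lines is hypothetical, and it takes a list of lines instead of a file name so it can be run without a file.

```python
import collections
import string


def index_lines(lines):
    """Map each word to the sorted list of 1-based line numbers containing it."""
    word_map = collections.defaultdict(set)
    for line_no, line in enumerate(lines, start=1):
        for token in line.split():
            word = token.lower().strip(string.punctuation)
            if word:  # skip tokens that consisted only of punctuation
                word_map[word].add(line_no)
    # Convert each set to a sorted list so the result is easy to print.
    return {word: sorted(nums) for word, nums in word_map.items()}


sample = ["one two", "three three four", "one two four", "four three"]
index = index_lines(sample)
for word in sorted(index):
    print("{:14s}:".format(word), ", ".join(map(str, index[word])))
```

An open file object can be passed in place of the list, since iterating over a file also yields one line at a time.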
Answer 2 (score: 0)
This has already been answered, but here is my take. I haven't tried the code, but I think it should work.
from collections import defaultdict

def build_word_index(filename):
    # Map each word to the list of line numbers on which it appears.
    word_index = defaultdict(list)
    # Open in text mode so each line is a str (Python 3).
    with open(filename, 'r') as word_file:
        for i, line in enumerate(word_file):
            line = line.strip().lower()
            for word in line.split():
                word_index[word].append(i)
    # Print the index in alphabetical order.
    for word in sorted(word_index):
        print(word + ': ' + ', '.join(map(str, word_index[word])))
    return dict(word_index)
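The function above splits on whitespace only, so punctuation stays attached to the words: "four," and "four" would be indexed as two different keys. One possible way to handle that is to extract the words with a regular expression. This is a rough sketch, not part of the answer's code. The name tokenize and the pattern, which treats a word as a run of lowercase letters, digits, or apostrophes, are assumptions you may need to adjust.

```python
import re

# Assumed definition of a word: a run of letters, digits, or apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")


def tokenize(line):
    """Return the lowercase words in line, ignoring surrounding punctuation."""
    return WORD_RE.findall(line.lower())


print(tokenize("One, two; THREE... don't stop!"))
```

To use it, replace line.split() in the inner loop with tokenize(line).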