I need this to print the corresponding line numbers from a text file.
def index (filename, lst):
    infile = open('raven.txt', 'r')
    lines = infile.readlines()
    words = []
    dic = {}
    for line in lines:
        line_words = line.split(' ')
        words.append(line_words)
    for i in range(len(words)):
        for j in range(len(words[i])):
            if words[i][j] in lst:
                dic[words[i][j]] = i
    return dic
Result:
In: index('raven.txt',['raven', 'mortal', 'dying', 'ghost', 'ghastly', 'evil', 'demon'])
Out: {'dying': 8, 'mortal': 29, 'raven': 77, 'ghost': 8}
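(The words above appear on several lines, but it only prints one line for each, and for some words it prints nothing at all. It also does not count the blank lines in the text file, so 8 should actually be 9, because there is a blank line that it did not count.)
Please tell me how to fix this.
Answer 0 (score: 2)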
def index(filename, lst):
    # Use the filename argument; the with block closes the file automatically.
    with open(filename, 'r') as infile:
        lines = infile.readlines()
    words = []
    dic = {}
    for line in lines:
        line_words = line.split()  # split() also drops the trailing newline
        words.append(line_words)
    for i in range(len(words)):
        for j in range(len(words[i])):
            if words[i][j] in lst:
                if words[i][j] not in dic:
                    dic[words[i][j]] = set()
                dic[words[i][j]].add(i + 1)  # range starts from 0
    return dic
Using a set instead of a list is useful when a word appears more than once on the same line.
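To make the difference concrete, here is a minimal sketch. The sample line and line number are made up for illustration; they are not taken from raven.txt.

```python
# Hypothetical sample line in which the target word occurs twice.
line_number = 3
line = "nevermore quoth the raven nevermore"

as_list = []
as_set = set()
for word in line.split():
    if word == "nevermore":
        as_list.append(line_number)  # records the line once per occurrence
        as_set.add(line_number)      # records the line only once

print(as_list)  # [3, 3]
print(as_set)   # {3}
```

Keep in mind that a set has no guaranteed order, so you may want sorted(dic[word]) when printing the line numbers.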
Answer 1 (score: 1)
Use defaultdict to build a list of line numbers for each word:
from collections import defaultdict

def index(filename, lst):
    with open(filename, 'r') as infile:
        lines = [line.split() for line in infile]
    word2linenumbers = defaultdict(list)
    for linenumber, line in enumerate(lines, 1):
        for word in line:
            if word in lst:
                word2linenumbers[word].append(linenumber)
    return word2linenumbers
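One thing to be aware of with this approach: the function returns a defaultdict, and merely looking up a missing key on a defaultdict inserts it. The sketch below shows that behaviour in isolation; the words and line numbers are invented for the example.

```python
from collections import defaultdict

word2linenumbers = defaultdict(list)
word2linenumbers["raven"].append(1)  # missing key: an empty list is created first
word2linenumbers["raven"].append(4)

# A plain lookup of a word that was never found silently adds an empty entry.
_ = word2linenumbers["ghost"]

plain = dict(word2linenumbers)  # convert if you want ordinary dict behaviour
print(plain)  # {'raven': [1, 4], 'ghost': []}
```

If that side effect is unwanted, test with `word in word2linenumbers` or use `.get(word)` instead of indexing, or return `dict(word2linenumbers)` from the function.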
Answer 2 (score: 1)
You can also use dict.setdefault to start a new list for each word, or to append to the existing list if the word has already been found:
def index(filename, lst):
    # For larger lists, checking membership will be asymptotically faster using a set.
    lst = set(lst)
    dic = {}
    with open(filename, 'r') as fobj:
        for lineno, line in enumerate(fobj, 1):
            words = line.split()
            for word in words:
                if word in lst:
                    dic.setdefault(word, []).append(lineno)
    return dic
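As a quick check that blank lines are counted, here is a self-contained sketch that runs the same logic on a small temporary file. The sample text, the temporary file and the names targets and found are assumptions made for this example; they stand in for raven.txt and the names used above.

```python
import os
import tempfile


def index(filename, lst):
    """Map each target word to the 1-based line numbers it appears on."""
    targets = set(lst)
    found = {}
    with open(filename, 'r') as fobj:
        for lineno, line in enumerate(fobj, 1):
            for word in line.split():
                if word in targets:
                    found.setdefault(word, []).append(lineno)
    return found


# Hypothetical sample text; line 2 is intentionally blank.
sample = "once upon a midnight dreary\n\nquoth the raven\nthe raven sat\n"

with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    result = index(path, ['raven', 'ghost'])
finally:
    os.remove(path)  # clean up the temporary file

print(result)  # {'raven': [3, 4]}
```

The blank line still advances the counter because enumerate numbers every line the file yields, including empty ones. Words that never occur, like 'ghost' here, simply get no key.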
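Answer 3 (score: 0)
There are two main problems for you to fix:
1.) Multiple indices: you need to initialize a list as the dict value instead of a single int. Otherwise, each word is reassigned a new index every time another line containing that word is found.
2.) Blank lines should already be read as lines, so I think this is just an indexing problem. Your first line has index 0 because the first number in a range starts at 0.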
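Both problems can be reproduced in a few lines. This is only a sketch on made-up data (the three lines below are hypothetical, with the middle one blank), not your actual file:

```python
lines = ["the raven", "", "the raven again"]  # hypothetical; the middle line is blank

overwritten = {}
collected = {}
for i in range(len(lines)):  # i starts at 0, so the first line is reported as 0
    for word in lines[i].split():
        if word == "raven":
            overwritten[word] = i  # a single int: only the last match survives
            if word not in collected:
                collected[word] = []
            collected[word].append(i + 1)  # a list: every match is kept, 1-based

print(overwritten)  # {'raven': 2}
print(collected)    # {'raven': [1, 3]}
```

The blank line is counted in both cases; the only reason the numbers look one too small in the first dict is the zero-based index.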
You can simplify your program as follows:
def index(filename, lst):
    wordinds = {key: [] for key in lst}  # initiates an empty list for each word
    with open(filename, 'r') as infile:  # why use filename param if you hardcoded the open....
        # the with statement is useful. trust.
        for linenum, line in enumerate(infile):
            for word in line.rstrip().split():  # strip new line and split into words
                if word in wordinds:
                    wordinds[word].append(linenum)
    # filters empty lists (dict comprehension; .items() works on Python 3)
    return {word: nums for word, nums in wordinds.items() if nums}
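This reduces everything to for loops nested inside a single enumerate. If you want the first line to be 1 and the second line to be 2, you have to change wordinds[word].append(linenum) to ....append(linenum + 1).
EDIT: Someone made a good point in another answer: let enumerate(infile, 1) start the enumeration at index 1. That is cleaner.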
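One more detail, which may explain why some of your words were never found: the later answers call line.split() with no argument, while your code calls line.split(' '). The two behave differently, as this small sketch on a made-up line shows:

```python
line = "quoth the  raven\n"  # hypothetical line with a double space and a trailing newline

by_space = line.split(' ')    # splits on single spaces only
by_whitespace = line.split()  # splits on any run of whitespace, including the newline

print(by_space)       # ['quoth', 'the', '', 'raven\n']
print(by_whitespace)  # ['quoth', 'the', 'raven']
print('raven' in by_space, 'raven' in by_whitespace)  # False True
```

With split(' ') the last word on every line keeps its newline, so it never equals an entry in lst. Note that neither form removes punctuation, so a token like 'raven,' would still not match 'raven'; stripping punctuation would be a separate step.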