How can I print the line numbers where words appear in a text file in Python?

Date: 2015-11-16 17:38:24

Tags: python python-3.x

I need this to print the corresponding line numbers from the text file.

def index (filename, lst):
    infile = open('raven.txt', 'r')
    lines =  infile.readlines()
    words = []
    dic = {}

    for line in lines:
        line_words = line.split(' ')
        words.append(line_words)
    for i in range(len(words)):
        for j in range(len(words[i])):
            if words[i][j] in lst:

                dic[words[i][j]] = i

    return dic

Result:

In: index('raven.txt',['raven', 'mortal', 'dying', 'ghost', 'ghastly', 'evil', 'demon'])

Out: {'dying': 8, 'mortal': 29, 'raven': 77, 'ghost': 8}

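(The words above appear on several lines, but only one line number is printed for each, and some words print nothing at all. Also, it does not count the blank lines in the text file, so the 8 should actually be 9, because there is a blank line it did not count.)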

Please tell me how to fix this.
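A likely cause of the missing words, assuming the poem contains punctuation and capitalized words (raven.txt itself is not shown): `line.split(' ')` splits only on single spaces, so the newline stays attached to the last word of each line, and tokens such as `Raven,` never compare equal to `raven`. The check below uses a made-up line; `normalize` is a hypothetical helper, not taken from any of the answers.

```python
import string

# A made-up line in the style of the poem (not taken from raven.txt).
line = "Quoth the Raven, ghastly grim and ancient raven\n"

# split(' ') keeps the trailing newline and punctuation attached to words.
raw_tokens = line.split(' ')
print(repr(raw_tokens[-1]))   # prints 'raven\n', which is not equal to 'raven'
print('raven' in raw_tokens)  # False: the tokens are 'Raven,' and 'raven\n'

def normalize(token):
    """Hypothetical helper: strip surrounding punctuation/whitespace, lowercase."""
    return token.strip(string.punctuation + string.whitespace).lower()

# split() with no argument splits on any whitespace and drops the newline.
clean_tokens = [normalize(t) for t in line.split()]
print('raven' in clean_tokens)      # True
print(clean_tokens.count('raven'))  # 2
```

Whether to lowercase or strip punctuation depends on what should count as a match; the answers below only address the newline (by using `split()`) and the indexing.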

4 answers:

Answer 0 (score: 2)

def index(filename, lst):
    # Open the file named by the parameter instead of a hardcoded 'raven.txt'
    with open(filename, 'r') as infile:
        lines = infile.readlines()
    words = []
    dic = {}

    for line in lines:
        # split() with no argument also drops the trailing newline
        line_words = line.split()
        words.append(line_words)
    for i in range(len(words)):
        for j in range(len(words[i])):
            if words[i][j] in lst:
                if words[i][j] not in dic:
                    dic[words[i][j]] = set()
                dic[words[i][j]].add(i + 1)  # range starts from 0
    return dic

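Using a set instead of a list is useful if a word appears several times on the same line: that line number is then stored only once.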
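To make the set-versus-list difference concrete, here is a small sketch that indexes an in-memory list of lines (a stand-in for the file, assumed for illustration) once with sets and once with lists. `index_lines` and its `dedupe` flag are hypothetical names, not part of the answer.

```python
def index_lines(lines, targets, dedupe):
    """Map each target word to the 1-based numbers of the lines containing it.

    With dedupe=True each word maps to a set, so repeats on one line collapse;
    with dedupe=False it maps to a list that records every occurrence.
    """
    result = {}
    for lineno, line in enumerate(lines, 1):
        for word in line.split():
            if word in targets:
                if dedupe:
                    result.setdefault(word, set()).add(lineno)
                else:
                    result.setdefault(word, []).append(lineno)
    return result

lines = ["ghost ghost ghost", "", "demon ghost"]
as_set = index_lines(lines, {"ghost", "demon"}, dedupe=True)
as_list = index_lines(lines, {"ghost", "demon"}, dedupe=False)
print(sorted(as_set["ghost"]))  # [1, 3]
print(as_list["ghost"])         # [1, 1, 1, 3]
```

A set is unordered, so `sorted(...)` is the simplest way to show its line numbers in file order.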

Answer 1 (score: 1)

Use a defaultdict to build a list of line numbers for each word:

from collections import defaultdict
def index(filename, lst):
    with open(filename, 'r') as infile:
        lines = [line.split() for line in infile]
    word2linenumbers = defaultdict(list)

    for linenumber, line in enumerate(lines, 1):
        for word in line:
            if word in lst:
                word2linenumbers[word].append(linenumber)
    return word2linenumbers
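One way to try this function is on a throwaway file; the sample text below is made up. Because the file is iterated line by line, the blank line still gets its own number, which covers the asker's "8 should be 9" concern. The function returns a `defaultdict`; wrapping it in `dict()` gives a plain dictionary for printing.

```python
import os
import tempfile
from collections import defaultdict

def index(filename, lst):
    # Same logic as Answer 1: every line, blank ones included, gets a number.
    with open(filename, 'r') as infile:
        lines = [line.split() for line in infile]
    word2linenumbers = defaultdict(list)
    for linenumber, line in enumerate(lines, 1):
        for word in line:
            if word in lst:
                word2linenumbers[word].append(linenumber)
    return word2linenumbers

# Made-up sample text whose second line is blank.
sample = "once upon a midnight dreary\n\nghost of evil demon\nghost again\n"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'sample.txt')
    with open(path, 'w') as f:
        f.write(sample)
    result = dict(index(path, ['ghost', 'evil', 'raven']))

print(result)  # {'ghost': [3, 4], 'evil': [3]}
```

Words that never occur (here `raven`) simply do not appear as keys, since the defaultdict only creates an entry when a match is appended.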

Answer 2 (score: 1)

You can also use dict.setdefault to start a new list for each word, or to append to the existing list if the word has already been found:

def index(filename, lst):
    # For larger lists, checking membership will be asymptotically faster using a set.
    lst = set(lst) 
    dic = {}

    with open(filename, 'r') as fobj:
        for lineno, line in enumerate(fobj, 1):
            words = line.split()
            for word in words:
                if word in lst:
                    dic.setdefault(word, []).append(lineno)

    return dic
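`dict.setdefault(key, default)` returns `d[key]` when the key exists; otherwise it stores `default` under that key and returns it. That is why the chained `.append(lineno)` always modifies the list held in the dict. A tiny illustration with made-up line numbers:

```python
dic = {}
# First call: 'ghost' is missing, so the new empty list is stored and returned.
dic.setdefault('ghost', []).append(8)
# Second call: the existing list is returned and the [] argument is ignored.
dic.setdefault('ghost', []).append(12)
print(dic)  # {'ghost': [8, 12]}

# The returned object is the very list that lives in the dict.
assert dic.setdefault('ghost', []) is dic['ghost']
```

One design note: the `[]` argument is evaluated on every call, even when the key already exists, whereas `defaultdict(list)` (Answer 1) only creates a list for missing keys. For a poem-sized file the difference is negligible.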

Answer 3 (score: 0)

There are two main problems to fix:

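1.) Multiple indices: you need to initialize a list as the dict value instead of a single int. Otherwise, every time a new line containing the word is found, the word's entry is overwritten with that new line index.

2.) Blank lines are already read as lines, so I think this is just an indexing problem. Your first line has index 0, because range starts counting from 0.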
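Both problems can be reproduced on a short in-memory list (made up for illustration; `lines` stands in for `infile.readlines()`, which returns a blank line as `'\n'`, so blank lines are counted). Assigning an int keeps only the last occurrence, and an index taken from `range`/`enumerate` without an offset is one lower than the human line number.

```python
# Made-up lines; index 1 is a blank line, as readlines() would return it.
lines = ["the ghost\n", "\n", "a ghost again\n"]

# Problem 1: assigning an int overwrites earlier occurrences.
overwritten = {}
for i, line in enumerate(lines):
    for word in line.split():
        if word == 'ghost':
            overwritten[word] = i
print(overwritten)  # {'ghost': 2}  (the occurrence at index 0 was lost)

# Fix for both: collect a list per word and add 1 for human line numbers.
collected = {}
for i, line in enumerate(lines):
    for word in line.split():
        if word == 'ghost':
            collected.setdefault(word, []).append(i + 1)
print(collected)  # {'ghost': [1, 3]}
```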

You can simplify the program as follows:

def index(filename, lst):
    wordinds = {key: [] for key in lst}  # initiates an empty list for each word
    with open(filename, 'r') as infile:  # why use a filename param if you hardcoded the open?
        # the with statement is useful (it closes the file for you). trust.
        for linenum, line in enumerate(infile):
            for word in line.rstrip().split():  # strip the newline and split into words
                if word in wordinds:
                    wordinds[word].append(linenum)

    # Python 3 has items(), not iteritems(); build a dict (a set cannot hold
    # tuples containing lists) and filter out words with empty lists.
    return {word: linenums for word, linenums in wordinds.items() if linenums}

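This reduces everything to for loops nested inside a single enumerate. If you want the first line to be 1 and the second to be 2, change wordinds[word].append(linenum) to wordinds[word].append(linenum + 1).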

Edit: someone made a good point in another answer: enumerate(infile, 1) starts the enumeration at 1. That is cleaner.
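The second argument of the built-in `enumerate` (its `start` parameter) only shifts the counter; the items themselves are unchanged. A minimal comparison on made-up lines:

```python
lines = ["first\n", "\n", "third\n"]

print(list(enumerate(lines)))     # [(0, 'first\n'), (1, '\n'), (2, 'third\n')]
print(list(enumerate(lines, 1)))  # [(1, 'first\n'), (2, '\n'), (3, 'third\n')]

# enumerate(lines, 1) gives the same numbers as adding 1 to each index by hand.
assert [n for n, _ in enumerate(lines, 1)] == [n + 1 for n, _ in enumerate(lines)]
```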