Question

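I'm trying to print the words that occur in a file, along with their line numbers, in Python. Currently I get the correct numbers for the second word, but the first word I look up doesn't print the correct line numbers. I have to iterate over infile, use a dictionary to store the line numbers, strip the newline characters, remove any punctuation, and skip blank lines when pulling the numbers. I need each dictionary value to actually be a list, so that if a word occurs on multiple lines I can append every line number to that list.

Adjusted code: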

def index(filename, word_list):

    infile = open(filename, 'r')
    dct = {}
    count = 0
    for line in infile:
        count += 1  # current line number
        newLine = line.replace('\n', ' ')
        if newLine == ' ':
            continue  # skip blank lines
        newLine2 = removePunctuation(newLine)  # helper defined elsewhere (not shown)
        split_line = newLine2.split()
        for word in word_list:
            if word in split_line:
                if word in dct:
                    dct[word] += 1
                else:
                    dct[word] = 1

    for word in word_list:
        print('{:12} {},'.format(word,dct[word]))
    infile.close()

Current output:

>>> index('leaves.txt',['pines','counter'])
pines        [9469, 9835, 10848, 10883],
counter      [792, 2092, 2374],

Desired output:

>>> index2('f.txt',['pines','counter','venison'])
pines       [530, 9469, 9835, 10848, 10883]
counter     [792, 2092, 2374]

Answer 1

There's some ambiguity in how your file is set up, but I think I understand. Try this:

import numpy as np # add this import
...

    for word in word_f:
        if word in split_line:

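            np_array = np.array(split_line)
            # np.where returns a tuple of index arrays; [0] takes the single array of matches
            # (these are word positions within the current line)
            item_index_list = np.where(np_array == word)[0]

            dct[word] = item_index_list # note, you might want the 'index + 1' instead of the 'index'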

for word in word_f:
    print('{:12} {},'.format(word,dct[word]))
...

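Also, as far as I can tell, you never actually use your incrementing counter variable.

I think this will work; if it doesn't, let me know and I'll fix it.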

Answer 2

As requested, here is an additional answer (which I believe works) that doesn't import another library.

def index2(f,word_f):

    infile = open(f, 'r')
    dct = {}
    # deleted line
    for line in infile:
        newLine = line.replace('\n', ' ')
        if newLine == ' ':
            continue
        # deleted line
        newLine2 = removePunctuation(newLine)
        split_line = newLine2.split()
        for word in word_f:
            count = 0 # you might want to start at 1 instead, if you're going for 'word number'
            # important note: you need to have 'word2', not 'word' here, and on the next line
            for word2 in split_line: # changed to looping through data
                if word2 == word:                    
                    if word2 in dct:
                        temp = dct[word]
                        temp.append(count)
                        dct[word] = temp
                    else:
                        temp = []
                        temp.append(count)
                        dct[word] = temp
                count += 1
    for word in word_f:
        print('{:12} {},'.format(word,dct[word]))
    infile.close()

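Note that I don't think this code handles a word that isn't in the file. I'm not sure what file you're reading, so I can't be certain, but I expect it to fail if you pass in a word that never appears: the final loop looks up dct[word], which raises a KeyError for a word that was never added.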
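One possible way around that error, sketched under the assumption that a missing word should simply print an empty list: look each word up with dict.get and a default. The helper name format_index and the sample data are illustrative and not part of the answer's code.

```python
def format_index(dct, word_list):
    """Return one formatted line per requested word; missing words get []."""
    lines = []
    for word in word_list:
        # dict.get returns the default instead of raising KeyError
        positions = dct.get(word, [])
        lines.append('{:12} {}'.format(word, positions))
    return lines


# hypothetical data: 'venison' was never found, so it has no key
dct = {'pines': [530, 9469], 'counter': [792]}
for line in format_index(dct, ['pines', 'counter', 'venison']):
    print(line)
```

Whether a missing word should print an empty list, be skipped, or be reported as an error depends on what the assignment expects.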

Answer 3

Note: I took this code from my other post and ran it to see whether it works, and it seems to:

def index2():

    word_list = ["work", "many", "lots", "words"]
    infile = ["lots of words","many many work words","how come this picture lots work","poem poem more words that rhyme"]
    dct = {}
    # deleted line
    for line in infile:
        newLine = line.replace('\n', ' ') # shouldn't do anything, because I have no newlines
        if newLine == ' ':
            continue
        # deleted line
        newLine2 = newLine # ignoring punctuation
        split_line = newLine2.split()
        for word in word_list:
            count = 0 # you might want to start at 1 instead, if you're going for 'word number'
            # important note: you need to have 'word2', not 'word' here, and on the next line
            for word2 in split_line: # changed to looping through data
                if word2 == word:
                    if word2 in dct:
                        temp = dct[word]
                        temp.append(count)
                        dct[word] = temp
                    else:
                        temp = []
                        temp.append(count)
                        dct[word] = temp
                count += 1
    for word in word_list:
        print('{:12} {}'.format(word, ", ".join(map(str, dct[word])))) # edited output so it's comma separated list without a trailing comma


def main():
    index2()


if __name__ == "__main__":main()

And the output:

work         2, 5
many         0, 1
lots         0, 4
words        2, 3, 3

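Explanation: because the positions are appended in the order they are encountered, each word ends up with its correct word positions.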

Answer 4

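My biggest mistake was that I wasn't adding the line number to the dictionary entry correctly. I was using the wrong operation entirely, so nothing recorded the line number when a word was found in the file. The correct form is dct[word] += [count] rather than dct[word] += 1.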

def index(filename,word_list):

    infile = open(filename, 'r')
    dct = {}
    count = 0
    for line in infile:
        count += 1
        newLine = line.replace('\n', ' ')
        if newLine == ' ':
            continue
        newLine2 = removePunctuation(newLine)
        split_line = newLine2.split()
        for word in word_list:
            if word in split_line:
                if word in dct:
                    dct[word] += [count]
                else:
                    dct[word] = [count]
    for word in word_list:
        print('{:12} {}'.format(word,dct[word]))
    infile.close()
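For comparison, here is a minimal self-contained sketch of the same idea. removePunctuation is never shown in this thread, so the version below (built on str.translate and string.punctuation) is only an assumption about what it does; build_index and the use of collections.defaultdict are likewise illustrative choices rather than the asker's code. The function takes any iterable of lines, so an open file object should work as well as the sample list.

```python
import string
from collections import defaultdict


def remove_punctuation(text):
    # assumed behaviour of the thread's removePunctuation helper:
    # drop every ASCII punctuation character
    return text.translate(str.maketrans('', '', string.punctuation))


def build_index(lines, word_list):
    """Map each word in word_list to the 1-based line numbers it appears on."""
    index = defaultdict(list)
    for line_number, line in enumerate(lines, start=1):
        words = remove_punctuation(line).split()
        if not words:  # blank line: still counted, nothing to record
            continue
        for word in word_list:
            if word in words:
                index[word].append(line_number)
    return index


# hypothetical sample standing in for the file's lines
sample = ["Tall pines, tall cedars.\n", "\n", "The counter was bare.\n", "More pines!\n"]
index = build_index(sample, ['pines', 'counter', 'venison'])
for word in ['pines', 'counter', 'venison']:
    # .get avoids a KeyError (and avoids creating a key) for words never found
    print('{:12} {}'.format(word, index.get(word, [])))
```

Numbering the lines with enumerate before the blank-line check keeps the line numbers aligned with the file, which is the same effect as incrementing count at the top of the loop.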

Print words and the line numbers where they occur in a file (Python)
