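I'm trying to print the words that occur in a file, together with the line numbers they occur on, in Python. Currently I get the correct numbers for the second word, but the first word I look up does not print the correct line numbers. I have to iterate over infile, use a dictionary to store the line numbers, strip the newline characters, remove any punctuation, and skip blank lines when pulling the numbers. I need each dictionary value to be a list, so that if a word occurs on multiple lines I can append each line number to the list.
Adjusted code: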
def index(filename, word_list):
    infile = open(filename, 'r')
    dct = {}
    count = 0
    for line in infile:
        count += 1
        newLine = line.replace('\n', ' ')
        if newLine == ' ':
            continue
        newLine2 = removePunctuation(newLine)
        split_line = newLine2.split()
        for word in word_list:
            if word in split_line:
                if word in dct:
                    dct[word] += 1
                else:
                    dct[word] = 1
    for word in word_list:
        print('{:12} {},'.format(word,dct[word]))
    infile.close()
Current output:
>>> index('leaves.txt',['cedars','countenance'])
pines [9469, 9835, 10848, 10883],
counter [792, 2092, 2374],
Expected output:
>>> index2('f.txt',['pines','counter','venison'])
pines [530, 9469, 9835, 10848, 10883]
counter [792, 2092, 2374]
Answer 0 (score: 0)
There is some ambiguity in how your file is set up, but I think I understand. Try this:
import numpy as np # add this import
...
        for word in word_f:
            if word in split_line:
                np_array = np.array(split_line)
                # np.where returns a tuple of arrays; [0] holds the matching positions within this line
                item_index_list = np.where(np_array == word)[0]
                dct[word] = item_index_list # note, you might want the 'index + 1' instead of the 'index'
    for word in word_f:
        print('{:12} {},'.format(word,dct[word]))
...
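Also, as far as I can tell, you never use your increment variable.
I think this will work; if it doesn't, let me know and I'll fix it.
Answer 1 (score: 0)
As requested, I've written an additional answer (which I think works) that doesn't import another library.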
def index2(f,word_f):
    infile = open(f, 'r')
    dct = {}
    # deleted line
    for line in infile:
        newLine = line.replace('\n', ' ')
        if newLine == ' ':
            continue
        # deleted line
        newLine2 = removePunctuation(newLine)
        split_line = newLine2.split()
        for word in word_f:
            count = 0 # you might want to start at 1 instead, if you're going for 'word number'
            # important note: you need to have 'word2', not 'word' here, and on the next line
            for word2 in split_line: # changed to looping through data
                if word2 == word:
                    if word2 in dct:
                        temp = dct[word]
                        temp.append(count)
                        dct[word] = temp
                    else:
                        temp = []
                        temp.append(count)
                        dct[word] = temp
                count += 1
    for word in word_f:
        print('{:12} {},'.format(word,dct[word]))
    infile.close()
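Note that I don't think this code handles the case where a word you pass in isn't in the file. I'm not sure what file you're reading, so I can't be certain, but I think it will raise an error if you pass in a word that doesn't occur in the file.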
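One way to avoid that error, sketched here under the assumption that a missing word should simply be shown with an empty list, is to look each word up with dict.get and a default value. The function name format_index and the sample data are made up for illustration.

```python
def format_index(dct, words):
    """Return one formatted line per word; words missing from dct get an empty list."""
    lines = []
    for word in words:
        # dict.get returns the default ([]) instead of raising KeyError
        positions = dct.get(word, [])
        lines.append('{:12} {}'.format(word, positions))
    return lines


# hypothetical data: 'venison' was never found in the file
sample = {'pines': [530, 9469], 'counter': [792]}
for line in format_index(sample, ['pines', 'counter', 'venison']):
    print(line)
```

If missing words should be left out of the output instead, an `if word in dct:` check before formatting would do that.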
Answer 2 (score: 0)
Note: I took the code from my other post to check whether it works, and it seems that it does.
The test code:
def index2():
    word_list = ["work", "many", "lots", "words"]
    infile = ["lots of words","many many work words","how come this picture lots work","poem poem more words that rhyme"]
    dct = {}
    # deleted line
    for line in infile:
        newLine = line.replace('\n', ' ') # shouldn't do anything, because I have no newlines
        if newLine == ' ':
            continue
        # deleted line
        newLine2 = newLine # ignoring punctuation
        split_line = newLine2.split()
        for word in word_list:
            count = 0 # you might want to start at 1 instead, if you're going for 'word number'
            # important note: you need to have 'word2', not 'word' here, and on the next line
            for word2 in split_line: # changed to looping through data
                if word2 == word:
                    if word2 in dct:
                        temp = dct[word]
                        temp.append(count)
                        dct[word] = temp
                    else:
                        temp = []
                        temp.append(count)
                        dct[word] = temp
                count += 1
    for word in word_list:
        print('{:12} {}'.format(word, ", ".join(map(str, dct[word])))) # edited output so it's comma separated list without a trailing comma

def main():
    index2()

if __name__ == "__main__":
    main()
And the output:
work         2, 5
many         0, 1
lots         0, 4
words        2, 3, 3
Because the positions are appended in that order, each word ends up with the correct word positions.
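The same word positions can be collected a little more compactly with enumerate and dict.setdefault. This is only a sketch of the idea, not the code from the answer: the function name word_positions is made up for illustration, and it returns the dictionary instead of printing it.

```python
def word_positions(lines, word_list):
    """Map each word in word_list to the 0-based positions at which it occurs within each line."""
    dct = {}
    for line in lines:
        # enumerate supplies the position of each token within the current line
        for position, token in enumerate(line.split()):
            if token in word_list:
                # setdefault creates the list the first time a word is seen
                dct.setdefault(token, []).append(position)
    return dct


lines = ["lots of words", "many many work words",
         "how come this picture lots work", "poem poem more words that rhyme"]
result = word_positions(lines, ["work", "many", "lots", "words"])
for word in ["work", "many", "lots", "words"]:
    print('{:12} {}'.format(word, ", ".join(map(str, result[word]))))
```

Like the code above, this records positions within a line rather than line numbers.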
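Answer 3 (score: 0)
My biggest mistake was that I wasn't recording the line numbers correctly. I was using the wrong operation entirely, so nothing stored the line number when a word was found in the file. The correct form is dct[word] += [count] rather than dct[word] += 1.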
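def index(filename,word_list):
    infile = open(filename, 'r')
    dct = {}
    count = 0
    for line in infile:
        count += 1  # count every line, including blank ones, so line numbers stay correct
        newLine = line.replace('\n', ' ')
        if newLine == ' ':
            continue
        newLine2 = removePunctuation(newLine)
        split_line = newLine2.split()
        for word in word_list:
            if word in split_line:
                if word in dct:
                    dct[word] += [count]  # append this line number to the existing list
                else:
                    dct[word] = [count]   # first occurrence: start a new list
    for word in word_list:
        if word in dct:  # skip words that never occur, instead of raising KeyError
            print('{:12} {}'.format(word,dct[word]))
    infile.close()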
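For anyone who wants to try the idea without the original file, here is a self-contained sketch of the same approach. It makes several assumptions: remove_punctuation is a hypothetical stand-in for the removePunctuation helper (which is not shown in the question), build_index is a made-up name, the function takes any iterable of lines (an open file works too) and returns the dictionary rather than printing it, and matching is case-sensitive.

```python
import string
from collections import defaultdict


def remove_punctuation(text):
    """Hypothetical stand-in for the asker's removePunctuation helper."""
    return text.translate(str.maketrans('', '', string.punctuation))


def build_index(lines, word_list):
    """Return {word: [line numbers]} for the words in word_list, counting lines from 1."""
    wanted = set(word_list)
    index = defaultdict(list)
    for line_number, line in enumerate(lines, start=1):
        # blank lines still advance line_number but contribute no words
        tokens = set(remove_punctuation(line).split())
        for word in wanted & tokens:
            index[word].append(line_number)
    return dict(index)


# made-up sample lines, including a blank line
sample = ["The pines, tall.\n", "\n", "A counter; more pines!\n"]
print(build_index(sample, ['pines', 'counter', 'venison']))
```

Words that never occur (here 'venison') are simply absent from the returned dictionary, so the caller can decide how to report them.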