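Question:
I want to build an index in Python by creating a data structure that stores every word from a given text file, together with the line numbers of all the lines where the word appears and the word's position (column number) within each of those lines.
So far I can store the words in a dictionary by appending all the line numbers to a list, but I cannot store each word's position within its line.
I need this data structure to make searching the text file faster.
Here is my code so far: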
from collections import defaultdict
thetextfile = open('file.txt','r')
thetextfile = thetextfile.read()
file_s = thetextfile.split("\n")
wordlist = defaultdict(list)
lineNumber = 0
for (i,line) in enumerate(file_s):
    lineNumber = i
    for word in line.split(" "):
        wordlist[word].append(lineNumber)
print(wordlist)
Answer 0 (score: 0)
Here is some code that stores the line number and column of each word in a text document:
from collections import defaultdict, namedtuple

# build a named tuple for the word locations
Location = namedtuple('Location', 'line col')

# dict keyed by word in document
word_locations = defaultdict(list)

# go through each line in the document
with open('my_words.txt', 'r') as f:
    for line_num, line in enumerate(f):
        # drop the trailing newline so it never ends up inside a word
        line = line.rstrip('\n')
        column = -1
        prev_col = 0

        # process the line, one word at a time
        while True:
            if prev_col < column:
                word = line[prev_col:column]
                word_locations[word].append(Location(line_num, prev_col))
            prev_col = column + 1

            # find the next space
            column = line.find(' ', prev_col)

            # check for more spaces on the line
            if column == -1:
                # no more spaces on the line: store the last word, if any
                word = line[prev_col:]
                if word:
                    word_locations[word].append(Location(line_num, prev_col))
                # go on to the next line
                break

print(word_locations)
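As an alternative to scanning for spaces by hand, the same index could probably be built with `re.finditer`, which reports where each match starts. This is only a minimal sketch and not part of the original answer: the function name `build_index` and the sample lines are made up for illustration, and it assumes a "word" is any run of non-whitespace characters.

```python
import re
from collections import defaultdict, namedtuple

Location = namedtuple('Location', 'line col')


def build_index(lines):
    """Map each word to the list of Location(line, col) where it starts.

    `build_index` is a hypothetical helper name; `lines` can be any
    iterable of strings, such as an open file object.
    """
    index = defaultdict(list)
    for line_num, line in enumerate(lines):
        # \S+ matches each run of non-whitespace characters, so repeated
        # spaces and trailing newlines never produce empty "words"
        for match in re.finditer(r'\S+', line):
            index[match.group()].append(Location(line_num, match.start()))
    return index


# hypothetical sample input, standing in for the lines of a file
sample = ["the quick fox", "jumps over  the fox"]
index = build_index(sample)
print(index["fox"])
```

Line and column numbers here are zero-based, as in the code above. Looking up a word is then a single dictionary access, for example `index.get("fox", [])`, which returns an empty list for a word that never occurs.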