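For each occurrence of a given word, I need to show its context by displaying roughly the 5 words before and after it.
Example output for the word 'stranger' in the contents of the text file when you call occurs('stranger', 'movie.txt'):
My code so far: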
def occurs(word, filename):
    infile = open(filename,'r')
    lines = infile.read().splitlines()
    infile.close()
    wordsString = ''.join(lines)
    words = wordsString.split()
    print(words)
    for i in range(len(words)):
        if words[i].find(word):
            #stuck here
Answer 0 (score: 4)
I suggest slicing words based on i:
print(words[i-5:i+6])
(This goes where your comment is.)
Or, to print it as shown in your example:
print("...", " ".join(words[i-5:i+6]), "...")
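To account for a match within the first 5 words, where i-5 would be negative and the slice would count from the end of the list: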
if i > 5:
    print("...", " ".join(words[i-5:i+6]), "...")
else:
    print("...", " ".join(words[0:i+6]), "...")
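The guard matters because a negative slice start counts from the end of the list, so the unguarded slice no longer begins at the first word (and is empty for long lists). A minimal sketch on a made-up word list, which also shows max(0, i - 5) as an equivalent way to clamp the start without an if/else:

```python
words = "a stranger came to the village late one night".split()  # made-up sample
i = words.index("stranger")  # i == 1, so i - 5 == -4

# A negative start counts from the end of the list: here this is words[5:7],
# which does not even contain the matched word.
wrapped = words[i - 5:i + 6]

# Clamping the start at 0 keeps the words from the beginning of the list.
clamped = words[max(0, i - 5):i + 6]

print(wrapped)
print(clamped)
```

With the clamp in place the same expression works for every value of i, so the two branches can collapse into one print call.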
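Also, find does not do what you think. find() returns -1 when it cannot find the string, and -1 evaluates to True in an if statement; a match at the very start of the word returns 0, which evaluates to False. Try: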
if word in words[i].lower():
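Putting the pieces of this answer together, a minimal sketch might look like the following. The helper name contexts, its width parameter, and the sample sentence are assumptions made for illustration; it takes a list of words rather than a filename so it can be tried without a file.

```python
def contexts(words, word, width=5):
    """Return one context string per word that contains `word` (case-insensitive)."""
    target = word.lower()
    found = []
    for i, w in enumerate(words):
        if target in w.lower():
            start = max(0, i - width)  # never let the slice start go negative
            found.append(" ".join(words[start:i + width + 1]))
    return found


# str.find returns an index, not a boolean:
print("stranger".find("stranger"))  # 0, which is falsy
print("village".find("stranger"))   # -1, which is truthy

sample = "he was a stranger and we were not used to strangers".split()
for line in contexts(sample, "stranger"):
    print("...", line, "...")
```

Note that the substring test also matches "strangers"; compare with == instead if only exact matches are wanted.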
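Answer 1 (score: 0)
This retrieves the index of every occurrence of the word in words, which is the list of all the words in the file. Slicing is then used to get the matching word together with the 5 words before and after it.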
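def occurs(word, filename):
    infile = open(filename,'r')
    lines = infile.read().splitlines()
    infile.close()
    # join with a space so the last word of one line does not fuse with the first word of the next
    wordsString = ' '.join(lines)
    words = wordsString.split()
    # indices of every word that contains the search term, ignoring case
    matches = [i for i, w in enumerate(words) if w.lower().find(word.lower()) != -1]
    for m in matches:
        # clamp the start at 0 so a negative index does not wrap around to the end of the list
        l = " ".join(words[max(0, m-5):m+6])
        print(f"... {l} ...")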
Answer 2 (score: 0)
Consider using the more_itertools.adjacent tool.
Given
import more_itertools as mit
s = """\
But we did not answer him, for he was a stranger and we were not used to, strangers and were shy of them.
We were simple folk, in our village, and when a stranger was a pleasant person we were soon friends.
"""
word, distance = "stranger", 5
words = s.splitlines()[0].split()
Demo
neighbors = list(mit.adjacent(lambda x: x == word, words, distance))
" ".join(word for bool_, word in neighbors if bool_)
# 'him, for he was a stranger and we were not used'
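Details
more_itertools.adjacent returns an iterable of tuples, e.g. (bool, item) pairs. The boolean is True for words that satisfy the predicate and for words within distance of one that does. For example: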
>>> neighbors
[(False, 'But'),
...
(True, 'a'),
(True, 'stranger'),
(True, 'and'),
...
(False, 'to,')]
Neighboring words are filtered from the results given a distance from the target word.
Note: more_itertools is a third-party library. Install it with pip install more_itertools.
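If installing a third-party package is not an option, the same marking idea can be sketched with the standard library alone. This is not the library's implementation; mark_adjacent is a hypothetical helper written only to illustrate what the (bool, item) pairs mean.

```python
def mark_adjacent(predicate, items, distance=1):
    """Return (bool, item) pairs; bool is True for matches and their neighbors."""
    items = list(items)
    hits = [i for i, item in enumerate(items) if predicate(item)]
    marked = set()
    for i in hits:
        # mark the match itself plus `distance` positions on each side,
        # clipped to the bounds of the list
        marked.update(range(max(0, i - distance), min(len(items), i + distance + 1)))
    return [(i in marked, item) for i, item in enumerate(items)]


words = "But we did not answer him, for he was a stranger and we were not used to, strangers".split()
pairs = mark_adjacent(lambda w: w == "stranger", words, distance=5)
context = " ".join(w for flag, w in pairs if flag)
print(context)
```

As with the filtering shown in the demo, two matches closer together than the distance produce one merged run of words rather than two separate windows.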
Answer 3 (score: 0)
Whenever I see a rolling view of a file, I think of collections.deque:
import collections

def occurs(needle, fname):
    with open(fname) as f:
        lines = f.readlines()
    words = iter(''.join(lines).split())
    view = collections.deque(maxlen=11)
    # prime the deque
    for _ in range(10):  # leaves an 11-length deque with 10 elements
        view.append(next(words, ""))
    for w in words:
        view.append(w)
        if view[5] == needle:
            yield list(view.copy())
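Note that this approach deliberately does not handle any edge cases where needle occurs in the first 5 or last 5 words of the file. The question is ambiguous about whether a match on the third word should give the first through ninth words, or something different.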
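One possible reading of that ambiguity is to truncate the window at the file boundaries. The sketch below does so by padding both ends of the word stream with a sentinel and dropping the padding before yielding. The name occurs_padded, the width parameter, and the choice to accept an iterable of words instead of a filename are all assumptions made for illustration.

```python
import collections
import itertools

_PAD = object()  # sentinel that can never equal a real word


def occurs_padded(needle, words, width=5):
    """Yield up to `width` words on each side of every exact match of `needle`."""
    padding = [_PAD] * width
    view = collections.deque(maxlen=2 * width + 1)
    for w in itertools.chain(padding, words, padding):
        view.append(w)
        # Only inspect the centre once the window is full.
        if len(view) == view.maxlen and view[width] == needle:
            yield [x for x in view if x is not _PAD]


sample = "a stranger came to the village late one night".split()  # made-up sample
for window in occurs_padded("stranger", sample):
    print(" ".join(window))
```

Because of the padding, every real word passes through the centre slot exactly once, so matches near the start or end of the input are reported with a shorter window instead of being skipped.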