Question

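For each occurrence of a given word, I need to show its context by displaying roughly the 5 words before and after it.

Example output for the word 'stranger' in a text file's contents when you call occurs('stranger', 'movie.txt'):

My code so far: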

def occurs(word, filename):

    infile = open(filename,'r')
    lines = infile.read().splitlines()
    infile.close()

    wordsString = ''.join(lines)
    words = wordsString.split()
    print(words)

    for i in range(len(words)):
        if words[i].find(word):
            #stuck here

Answer 1

I suggest slicing words around index i:

print(words[i-5:i+6])

(This goes where your comment is.)

Or, to print it as in your example:

print("...", " ".join(words[i-5:i+6]), "...")

To account for matches within the first 5 words:

if i > 5:
    print("...", " ".join(words[i-5:i+6]), "...")
else:
    print("...", " ".join(words[0:i+6]), "...")
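
The guard matters because a negative start index in a Python slice counts from the end of the list, so on a list of any real length words[i-5:i+6] comes back empty when i is small. A minimal sketch of the same check written with max(); the helper name context_window and the sample list are hypothetical, not from the question:

```python
def context_window(words, i, distance=5):
    """Return words[i] plus up to `distance` words on each side."""
    # Clamp the start at 0: a negative start would count from the end of the list.
    start = max(0, i - distance)
    # The end needs no clamp: slicing past the end of a list just stops at the end.
    return words[start:i + distance + 1]


words = [str(n) for n in range(20)]
print(words[2 - 5:2 + 6])        # [] because the start index is -3
print(context_window(words, 2))  # ['0', '1', '2', '3', '4', '5', '6', '7']
```

Only the start needs special handling; the end of the slice is already safe.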

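Also, find does not do what you think. find() returns -1 when it does not find the string, and -1 evaluates to True in an if statement; a match at the very start of the word returns 0, which evaluates to False. Try: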

if word in words[i].lower():
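
To see why the original condition misfires, here is a short sketch of what str.find returns and how those values behave in a boolean context. The sample string is made up for illustration:

```python
token = "stranger"

print(token.find("stranger"))  # 0: match at the very start of the string
print(token.find("range"))     # 2: match found further in
print(token.find("movie"))     # -1: no match

# Truthiness is the reverse of what the original `if` expects at 0 and -1.
print(bool(token.find("stranger")))  # False, even though it matched
print(bool(token.find("movie")))     # True, even though it did not match

# Membership testing says what is meant.
print("stranger" in token)  # True
print("movie" in token)     # False
```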

Answer 2

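This retrieves the index of every occurrence of the word in words, which is the list of all words in the file. It then uses slicing to get the matching word together with the 5 words before and after it.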

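def occurs(word, filename):
    # Read the whole file; split() with no arguments breaks on any whitespace,
    # including newlines, so words on adjacent lines are not glued together.
    with open(filename, 'r') as infile:
        words = infile.read().split()

    # Indices of every word that contains the search term, ignoring case.
    matches = [i for i, w in enumerate(words) if word.lower() in w.lower()]

    for m in matches:
        # Clamp the start at 0; a negative start would count from the end.
        context = " ".join(words[max(0, m - 5):m + 6])
        print(f"... {context} ...")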

Answer 3

Consider using the more_itertools.adjacent tool.

Given

import more_itertools as mit


s = """\
But we did not answer him, for he was a stranger and we were not used to, strangers and were shy of them.
We were simple folk, in our village, and when a stranger was a pleasant person we were soon friends.
"""

word, distance = "stranger", 5
words = s.splitlines()[0].split()

Demo

neighbors = list(mit.adjacent(lambda x: x == word, words, distance))

" ".join(word for bool_, word in neighbors if bool_)
# 'him, for he was a stranger and we were not used'

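Details

more_itertools.adjacent returns an iterable of tuples, i.e. (bool, item) pairs. The boolean is True for words that satisfy the predicate and for words within distance of such a word. For example: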

>>> neighbors
[(False, 'But'),
 ...
 (True, 'a'),
 (True, 'stranger'),
 (True, 'and'),
 ...
 (False, 'to,')]

The neighboring words, those within distance of the target word, are then filtered out of this result.

Note: more_itertools is a third-party library. Install it with pip install more_itertools.
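
If installing a third-party package is not an option, the same (bool, item) flagging can be approximated with the standard library alone. This is a rough sketch of the idea, not the library's implementation; the function name flag_neighbors is hypothetical:

```python
def flag_neighbors(predicate, items, distance=1):
    """Pair each item with True if it, or an item within `distance`, matches."""
    items = list(items)
    hits = [i for i, item in enumerate(items) if predicate(item)]
    flagged = set()
    for i in hits:
        # Mark the match and its neighbors, staying inside the list bounds.
        flagged.update(range(max(0, i - distance), min(len(items), i + distance + 1)))
    return [(i in flagged, item) for i, item in enumerate(items)]


line = ("But we did not answer him, for he was a stranger and we were "
        "not used to, strangers and were shy of them.")
pairs = flag_neighbors(lambda w: w == "stranger", line.split(), 5)
print(" ".join(w for keep, w in pairs if keep))
# him, for he was a stranger and we were not used
```

Because the predicate is an exact comparison, 'strangers' is not treated as a match here, just as in the demo above.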

Answer 4

Whenever I see a rolling view over a file, I think of collections.deque:

import collections

def occurs(needle, fname):
    with open(fname) as f:
        lines = f.readlines()

    words = iter(''.join(lines).split())

    view = collections.deque(maxlen=11)
    # prime the deque
    for _ in range(10):  # leaves an 11-length deque with 10 elements
        view.append(next(words, ""))
    for w in words:
        view.append(w)
        if view[5] == needle:
            yield list(view.copy())

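Note that this approach deliberately does not handle any edge cases where needle appears within the first 5 or last 5 words of the file. The question is ambiguous about whether a match on the third word should give the first through ninth words, or something different.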
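
If matches near either end of the file should be reported with whatever context exists, one possible reading is to pad the word stream on both sides. The sketch below assumes that interpretation; occurs_padded is a hypothetical name, and it takes a list of words rather than a filename to keep the example self-contained:

```python
import collections
import itertools


def occurs_padded(needle, words, distance=5):
    """Yield the window around each exact match, including matches near either end."""
    size = 2 * distance + 1
    pad = [None] * distance  # placeholders so edge words can reach the middle slot
    view = collections.deque(maxlen=size)
    for w in itertools.chain(pad, words, pad):
        view.append(w)
        # Once the deque is full, its middle slot is the candidate word.
        if len(view) == size and view[distance] == needle:
            yield [x for x in view if x is not None]


text = "a stranger came to town and nobody spoke to the stranger"
for window in occurs_padded("stranger", text.split(), distance=2):
    print(window)
# ['a', 'stranger', 'came', 'to']
# ['to', 'the', 'stranger']
```

The padding entries are dropped before each window is yielded, so windows at the edges are simply shorter.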

For each occurrence of a word in a text file, extract the 5 surrounding words
