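I have been working on a program that finds the words that appear only once in a text. However, when the program finds such a word, I want it to print some context for that word.
Here is my code.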
from collections import Counter
from string import punctuation
text = str("bible.txt")
with open(text) as f:
    word_counts = Counter(word.strip(punctuation) for line in f for word in line.split())
unique = [word.lower() for word, count in word_counts.items() if count == 1]
with open(text, 'r') as myfile:
    wordlist = myfile.read().lower()
print(unique)
print(len(unique), " unique words found.")
for word in unique:
    first = 1
    second = 1
    index = wordlist.index(word)
    if wordlist[index - first:index] is not int():
        first += 1
    if wordlist[index:index + second] is not ".":
        second += 1
    print(" ")
    first_part = wordlist[index - first:index]
    second_part = wordlist[index:index + second]
    print(word)
    print("%s %s" % ("".join(first_part), "".join(second_part)))
This is the input text.
Ideally, it would display
sojournings
1 Jacob lived in the land of his father's sojournings, in the land of
Canaan.
generations
2 These are the generations of Jacob.
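Basically, I want it to display the sentence the word appears in, starting with the verse number. I know I will have to do something with the index, but honestly I don't know how to go about it.
Any help would be greatly appreciated.
Thanks, Ben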
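Answer 0 (score: 1)
I would retrieve the index of the first letter of the chosen word (within the whole string, which for the Bible is going to be long ;')) and then find the first "." before that letter. I would also find the next "." after it, but possibly enforce a minimum length so that short sentences still give enough context. That gives you the range to include/print/display.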
def stringer():
    mystring = """ the quick brown fox. Which jumped over the lazy dog and died a horrible death. ad ipsum valorem"""
    word_posn = mystring.find("lazy")
    # Start just after the last "." before the word (rfind gives -1 if there is none, so this becomes 0).
    start_posn = mystring[:word_posn].rfind(".") + 1
    # End just after the first "." that follows the word.
    end_posn = mystring[word_posn:].find(".") + word_posn + 1
    return '"' + mystring[start_posn:end_posn].strip() + '"'
This code was written very quickly, so apologies for any errors.
Answer 1 (score: 1)
I'll leave the complete code here for anyone who comes across this later.
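The minimum-length idea mentioned above is not part of `stringer()`. A minimal sketch of how it might look, assuming "." is the only sentence delimiter; the function name `sentence_context` and the `min_length` parameter are made up for illustration:

```python
def sentence_context(text, word, min_length=40):
    """Return the sentence containing `word`, extended forward when it is
    shorter than `min_length` characters (hypothetical helper)."""
    word_posn = text.find(word)
    if word_posn == -1:
        return None
    # Start just after the last "." before the word (0 if there is none).
    start = text.rfind(".", 0, word_posn) + 1
    # End just after the first "." at or after the word, or at the end of the text.
    end = text.find(".", word_posn)
    end = len(text) if end == -1 else end + 1
    # Pull in following sentences one at a time while the excerpt is too short.
    while end - start < min_length and end < len(text):
        next_end = text.find(".", end)
        end = len(text) if next_end == -1 else next_end + 1
    return text[start:end].strip()


sample = ("The quick brown fox. Which jumped over the lazy dog "
          "and died a horrible death. Ad ipsum valorem.")
print(sentence_context(sample, "lazy"))
print(sentence_context(sample, "fox"))
```

With the sample string, "fox" sits in a 20-character sentence, so the excerpt grows to include the sentence after it. Extending backwards instead, or in both directions, would be an equally reasonable choice.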
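from collections import Counter
from string import punctuation

path = input("Path to file: ")
with open(path) as f:
    text = f.read()

# Count every word with its surrounding punctuation stripped.
word_counts = Counter(word.strip(punctuation) for word in text.split())
# Join lines with spaces so that words at line breaks do not run together.
wordlist = text.replace('\n', ' ')
unique = [word for word, count in word_counts.items() if word and count == 1]
print(unique)
print(len(unique), "unique words found.")
for word in unique:
    print(" ")
    # Note: find() also matches substrings, so this can land inside a longer word.
    word_posn = wordlist.find(word)
    # '"." or "," or "!" or "?"' evaluates to just ".", so check each delimiter
    # separately and keep the one closest to the word.
    start_posn = max(wordlist.rfind(p, 0, word_posn) for p in ".,!?") + 1
    ends = [i for i in (wordlist.find(p, word_posn) for p in ".,!?") if i != -1]
    end_posn = min(ends) + 1 if ends else len(wordlist)
    print(word)
    print(wordlist[start_posn:end_posn].strip())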
Also, a big shout-out to @lb_so for the help!
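The question asks for the whole verse, starting with its number, rather than a span cut at punctuation. One possible way to get that is sketched below. It assumes every verse starts with its number and that the verse text itself contains no digits, which may not hold for every edition of the input file; `unique_words_with_verses` is a hypothetical name. Words are lowercased before counting, as in the question's code.

```python
import re
from collections import Counter
from string import punctuation


def unique_words_with_verses(text):
    """Map each word that occurs exactly once to the verse containing it."""
    # Assumption: a verse is a number followed by text with no digits in it.
    # " ".join(...split()) collapses line breaks inside a verse into single spaces.
    verses = [" ".join(m.group().split()) for m in re.finditer(r"\d+\D+", text)]
    counts = Counter()
    verse_of = {}
    for verse in verses:
        for token in verse.split()[1:]:  # skip the leading verse number
            word = token.strip(punctuation).lower()
            if word:
                counts[word] += 1
                # For a word seen only once, this is its only verse.
                verse_of[word] = verse
    return {word: verse_of[word] for word, n in counts.items() if n == 1}


sample = (
    "1 Jacob lived in the land of his father's sojournings, in the land of\n"
    "Canaan.\n"
    "2 These are the generations of Jacob.\n"
)
for word, verse in unique_words_with_verses(sample).items():
    print(word)
    print(verse)
```

Because words are counted verse by verse, there is no second search through the full text, which also avoids matching a word inside a longer one.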