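I have written a program that correctly identifies repeated words, but the way I did it doesn't let me tell which line a duplicate came from. I did create a list of lines (linelist), then took all the words from those lines and put them into a list of their own. I have been looking for a way to show which line each duplicate comes from.
The text that is run through the program is below, followed by the program itself. Ignore the blank line after each line of the quote, since it does not appear in the input text file. Also, for reference, the "XXX" mark is where I would like the line number to appear.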
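He that would make his own liberty liberty secure,
must guard even his enemy from oppression;
for for if he violates this duty, he
he establishes a precedent that will reach to himself. - Thomas Paine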
import math
file = open(str(input("Enter file name: ")), "r")
linelist = []
file_cont = file.readlines()
for lines in file_cont:
    linelist.append(lines)
wordlist = []
# function that splits file into lines, then into words
def split_words(string):
    lines = string
    for line in lines:
        for word in line.split():
            yield word
# loop to add each word from prior function into a single list
for word in split_words(file_cont):
    wordlist.append(word)
# variables declared
x = 0
y = 1
z = len(wordlist)
# loop that prints the first and following word next to each other
while z > x:
    #print(wordlist[x], wordlist[y])
    if wordlist[x] == wordlist[y]:
        print("Found word: ",'"',wordlist[x],'"'," on line {}.".format(XXX), sep="")
    x += 1
    y += 1
    if y == z:
        break
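Any help is greatly appreciated. Thanks!
Answer 0: (score: 0)
I suggest building a dictionary in which each key is a word's index in the word list and each value is the index of the line that word is on.
You can generate it from linelist.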
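A minimal sketch of that idea, assuming the lines have already been read into linelist as in the question. The function names build_word_line_map and find_repeats are hypothetical and do not come from the question, and the sample list stands in for the file contents. A repeat is reported on the line of its second occurrence.

```python
def build_word_line_map(linelist):
    """Return (wordlist, line_of), where line_of[i] is the 1-based line of wordlist[i]."""
    wordlist = []
    line_of = {}
    for line_no, line in enumerate(linelist, start=1):
        for word in line.split():
            # the next free index in wordlist is the key for this word
            line_of[len(wordlist)] = line_no
            wordlist.append(word)
    return wordlist, line_of


def find_repeats(linelist):
    """Return (word, line number) for every word equal to the word right before it."""
    wordlist, line_of = build_word_line_map(linelist)
    return [
        (wordlist[i], line_of[i])
        for i in range(1, len(wordlist))
        if wordlist[i] == wordlist[i - 1]
    ]


# stand-in for file.readlines() on the quote from the question
sample = [
    "He that would make his own liberty liberty secure,\n",
    "must guard even his enemy from oppression;\n",
    "for for if he violates this duty, he\n",
    "he establishes a precedent that will reach to himself.\n",
]
for word, line_no in find_repeats(sample):
    print('Found word: "{}" on line {}.'.format(word, line_no))
```

Because the keys are consecutive integers, a plain list of line numbers parallel to wordlist would work just as well; the dictionary simply follows the wording of the suggestion.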
Answer 1: (score: 0)
This is simple with enumerate:
with open('data.txt') as data:
    lines = [i.split() for i in data]
for i, j in enumerate(lines):
    # compare each word with the one after it on the same line
    if any(j[h] == j[h + 1] for h in range(len(j) - 1)):
        print(i + 1)  # add one because counting starts at 0
Answer 2: (score: 0)
Instead of looking for duplicates in one long list of words, keep the words in a nested list, one inner list per line.
# why import math?
with open(input("Enter file name: "), "r") as f: # input() already returns a str
    linelist = [line.split() for line in f.readlines()] # don't need to duplicate this with file_cont
for l in range(len(linelist)-1): # -1 to avoid index out of range
    for w in range(len(linelist[l])-1): # -1 to avoid index out of range
        if linelist[l][w] == linelist[l][w+1]:
            print("Found word: ",'"',linelist[l][w],'"'," on line {}.".format(l+1), sep="")
    if linelist[l][-1] == linelist[l+1][0]: # check repetition between lines
        print("Found word: ",'"',linelist[l][-1],'"'," on line {}.".format(l+2), sep="")
for w in range(len(linelist[-1])-1): # check last line
    if linelist[-1][w] == linelist[-1][w+1]:
        print("Found word: ",'"',linelist[-1][w],'"'," on line {}.".format(len(linelist)), sep="")
File (with an extra guard added to show that only consecutive repeats are checked):
He that would make his own liberty liberty secure,
must guard even his enemy from guard oppression;
for for if he violates this duty, he
he establishes a precedent that will reach to himself.
-- Thomas Paine
Result:
Found word: "liberty" on line 1.
Found word: "for" on line 3.
Found word: "he" on line 4.
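For comparison, the same check can be done in a single pass by remembering the previous word, which avoids indexing into the lines and so cannot raise an IndexError on a blank line. This is only a sketch under the same assumptions as the code above (exact, case-sensitive matching with punctuation left attached); the name consecutive_repeats is hypothetical, and the text list stands in for the file shown above.

```python
def consecutive_repeats(lines):
    """Yield (word, line_no) whenever a word equals the word immediately before it."""
    previous = None
    for line_no, line in enumerate(lines, start=1):
        for word in line.split():
            if word == previous:
                # report the line of the second occurrence
                yield word, line_no
            previous = word


# stand-in for the file contents, including the extra "guard"
text = [
    "He that would make his own liberty liberty secure,\n",
    "must guard even his enemy from guard oppression;\n",
    "for for if he violates this duty, he\n",
    "he establishes a precedent that will reach to himself.\n",
    "-- Thomas Paine\n",
]
for word, line_no in consecutive_repeats(text):
    print('Found word: "{}" on line {}.'.format(word, line_no))
```

Since iterating over an open file yields its lines, the function can also be given the file object directly. Note that a blank line contributes no words, so a word repeated across a blank line still counts as a consecutive repeat in this version.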