Printing the line number of an identified repeated word

Time: 2015-04-06 20:34:53

Tags: python

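I have written a program that correctly identifies repeated words, but the way I did it does not let me tell which line a duplicate came from. I did create a list of lines (linelist), and then took all the words from those lines and put them into a list of their own. I have been looking for a way to show which line each duplicate came from.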

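The text that is run through the program can be found below, followed by the program itself. Ignore the blank line after each line of the quote, as it does not appear in the input text file. Also, for reference, the "XXX" mark is where I would like the line number to appear.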

  

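He that would make his own liberty liberty secure,

must guard even his enemy from oppression;

for for if he violates this duty, he

he establishes a precedent that will reach to himself.

-- Thomas Paine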

import math
file = open(str(input("Enter file name: ")), "r")

linelist = []

file_cont = file.readlines()
for lines in file_cont:
    linelist.append(lines)

wordlist = []
# function that splits file into lines, then into words

def split_words(string):
    lines = string
    for line in lines:
        for word in line.split():
            yield word

# loop to add each word from prior function into a single list

for word in split_words(file_cont):
    wordlist.append(word)

# variables declared
x = 0
y = 1
z = len(wordlist)

# loop that prints the first and following word next to each other
while z > x:
    #print(wordlist[x], wordlist[y])

    if wordlist[x] == wordlist[y]:
        print("Found word: ",'"',wordlist[x],'"'," on line {}.".format(XXX), sep="")

    x += 1
    y += 1

    if y == z:
        break

Any help is greatly appreciated. Thanks!

3 Answers:

Answer 0 (score: 0)

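I would suggest creating a dictionary in which each key is a word's index and each value is the index of the line that word is on.

You can generate it from linelist.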
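One way to read this suggestion, as a minimal sketch: build the flat word list as before, and alongside it a dictionary that maps each word's position in that list to its 1-based line number. The names `build_word_index`, `find_repeats`, and `line_of` are made up for illustration, and reporting a repeat that spans two lines on the line of the second word is an assumption about the desired behaviour.

```python
def build_word_index(linelist):
    """Return (wordlist, line_of); line_of[i] is the 1-based line of wordlist[i]."""
    wordlist = []
    line_of = {}
    for line_number, line in enumerate(linelist, start=1):
        for word in line.split():
            # The next free position in wordlist is the index of this word.
            line_of[len(wordlist)] = line_number
            wordlist.append(word)
    return wordlist, line_of


def find_repeats(linelist):
    """Return (word, line_number) for each word equal to the one just before it."""
    wordlist, line_of = build_word_index(linelist)
    repeats = []
    for i in range(1, len(wordlist)):
        if wordlist[i] == wordlist[i - 1]:
            # Assumption: report the line of the second occurrence.
            repeats.append((wordlist[i], line_of[i]))
    return repeats


sample = [
    "He that would make his own liberty liberty secure,\n",
    "must guard even his enemy from oppression;\n",
    "for for if he violates this duty, he\n",
    "he establishes a precedent that will reach to himself.\n",
    "-- Thomas Paine\n",
]
for word, line_number in find_repeats(sample):
    print('Found word: "{}" on line {}.'.format(word, line_number))
```

Because the dictionary is keyed by the same index used to walk `wordlist`, the lookup `line_of[i]` could replace the `XXX` placeholder in the question's loop with little other change.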

Answer 1 (score: 0)

This is simple with enumerate:

with open('data.txt') as data:
    lines = [i.split() for i in data]  # one list of words per line

for i, j in enumerate(lines):
    # compare each word with the one that follows it on the same line
    if any(j[h] == j[h + 1] for h in range(len(j) - 1)):
        print(i + 1)  # add one because counting starts at 0
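The snippet above prints only the line number and only looks within a single line. As a hedged variation (not the answerer's code), pairing each word with its successor via `zip` makes it possible to report the word as well. `repeats_within_lines` is a hypothetical helper name, and repeats that span a line break are still not detected by this version.

```python
def repeats_within_lines(lines):
    """Yield (word, line_number) for adjacent equal words on the same line."""
    for line_number, line in enumerate(lines, start=1):
        words = line.split()
        # zip(words, words[1:]) pairs each word with the one after it.
        for first, second in zip(words, words[1:]):
            if first == second:
                yield first, line_number


text = ["a a b\n", "b c\n", "d d d\n"]
for word, line_number in repeats_within_lines(text):
    print('Found word: "{}" on line {}.'.format(word, line_number))
```

Note that a word repeated three times in a row is reported twice, once per adjacent pair.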

Answer 2 (score: 0)

Instead of looking for duplicates in one long list of words, keep the words in a nested list.

# why import math?

with open(input("Enter file name: "), "r") as f: # input() already returns a str
    linelist = [line.split() for line in f.readlines()] # don't need to duplicate this with file_cont

for l in range(len(linelist)-1): # -1 to avoid index out of range
    for w in range(len(linelist[l])-1): # -1 to avoid index out of range
        if linelist[l][w] == linelist[l][w+1]:
            print("Found word: ",'"',linelist[l][w],'"'," on line {}.".format(l+1), sep="")

    if linelist[l] and linelist[l+1] and linelist[l][-1] == linelist[l+1][0]: # check repetition between lines; skip empty lines to avoid IndexError
        print("Found word: ",'"',linelist[l][-1],'"'," on line {}.".format(l+2), sep="")

for w in range(len(linelist[-1])-1): # check last line
    if linelist[-1][w] == linelist[-1][w+1]:
        print("Found word: ",'"',linelist[-1][w],'"'," on line {}.".format(len(linelist)), sep="")

File (with an extra "guard" added to show that only consecutive repetitions are checked):

He that would make his own liberty liberty secure, 
must guard even his enemy from guard oppression;
for for if he violates this duty, he
he establishes a precedent that will reach to himself.
-- Thomas Paine

Result:

Found word: "liberty" on line 1.
Found word: "for" on line 3.
Found word: "he" on line 4.
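
For comparison, the same result can be obtained in a single pass that remembers only the previous word, which avoids a separate loop for the last line and never indexes into a possibly empty line. This is a sketch, not part of the answer above. `find_adjacent_repeats` is an illustrative name, and it accepts any iterable of lines, so an open file object should work as well as the list used here. One assumption to be aware of: blank lines are skipped, so a word repeated across a blank line would also be reported.

```python
def find_adjacent_repeats(lines):
    """Yield (word, line_number) whenever a word equals the word before it."""
    previous = None
    for line_number, line in enumerate(lines, start=1):
        for word in line.split():
            if word == previous:
                # The repeat is attributed to the line of the second word.
                yield word, line_number
            previous = word


lines = [
    "He that would make his own liberty liberty secure, \n",
    "must guard even his enemy from guard oppression;\n",
    "for for if he violates this duty, he\n",
    "he establishes a precedent that will reach to himself.\n",
    "-- Thomas Paine\n",
]
for word, line_number in find_adjacent_repeats(lines):
    print('Found word: "{}" on line {}.'.format(word, line_number))
```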