Question

I am looking for a better solution to this problem:

I want to automatically rejoin words in a book that were split across line breaks. The code I have tried is:

import re

import nltk
from nltk.tokenize import word_tokenize

with open('Fr-dictionary.txt') as fr:  # opens the dictionary
    dic = set(word_tokenize(fr.read().lower()))  # stores the dictionary words (a set makes lookups fast)

with open('fr-text.txt') as tx2:  # opens the text containing the split words
    text_input = word_tokenize(tx2.read().lower())  # stores the tokenized input text

# pattern for punctuation, digits and words joined by hyphens (-)
pat = re.compile(r'[.?\-",:;.?!»’()quls\d]+|\w+(?:-\w+)+')
reg = set(filter(pat.match, text_input))  # must be built after text_input exists

words_it = iter(text_input)

valid_words1 = []    # existing (valid) words
invalid_words1 = []  # invalid (non-existing) words

for w in words_it:  # looping through the tokenized text
    if w in dic or w in reg:
        valid_words1.append(w)  # appending the valid items
    else:
        try:
            concatenated = w + next(words_it)  # joins w with the NEXT token, which is consumed
        except StopIteration:
            pass  # w was the last token; there is nothing to join it with
        else:
            if concatenated in dic:
                valid_words1.append(concatenated)  # append if valid
            else:
                invalid_words1.append(w)  # only reached when the concatenation also fails

a1 = ' '.join(valid_words1)  # converting list into a string

with open("finaltext.txt", "w") as out_file1:  # defining name of output file
    out_file1.write(a1)  # writing the output to a file

print(a1)  # print list converted into text
print(invalid_words1)
print(len(invalid_words1))

With this code, I:

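a) tokenize the text (put it into a list) and loop over the whole list, checking whether each item exists in the dictionary (including punctuation);
b) if it doesn't, try to concatenate the two parts of the word;
c) check whether the concatenated output exists in the dictionary;
d) if it does, append it to the same list of valid words;
e) if it doesn't, add it to a separate list of invalid words.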

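Problem: sometimes the first part of a word that should be joined is itself an existing/valid word (it is in the dictionary), so the program accepts it as is and never joins it with its second part, which leaves these errors in the text. Any ideas how to solve this? I think the solution could be: loop and append all the existing words, and when a non-existing word comes up, go back to the previous word, concatenate the two, check the result in dic, and then continue... How can I do that?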
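The look-back idea described above can be sketched roughly as follows. This is a minimal, untested-on-real-data sketch: the names `join_split_words`, `is_known` and `PUNCT_OR_DIGITS`, the punctuation regex, and the toy dictionary are hypothetical stand-ins for the question's `dic` and `pat`/`reg`. Instead of reading one token ahead, it keeps an output list and, when the previous output token or the current token is unknown and their concatenation is a dictionary word, replaces the previous token with the joined word. This covers both the case the original code handles (unknown first half) and the failing case (valid first half, unknown second half).

```python
import re

# Hypothetical stand-in for the question's `pat`/`reg` check: a token made only
# of punctuation, or only of digits, is always accepted.
PUNCT_OR_DIGITS = re.compile(r"[^\w\s]+|\d+")


def is_known(token, dictionary):
    """Return True if token is a dictionary word or pure punctuation/digits."""
    return token in dictionary or PUNCT_OR_DIGITS.fullmatch(token) is not None


def join_split_words(tokens, dictionary):
    """Rejoin words split across line breaks by looking back one token.

    Returns (output_tokens, invalid_tokens).
    """
    out = []
    for tok in tokens:
        if out:
            prev = out[-1]
            # Only try to join when at least one piece is unknown; two known
            # words next to each other are assumed to be separate words.
            one_piece_unknown = (not is_known(prev, dictionary)
                                 or not is_known(tok, dictionary))
            if one_piece_unknown and (prev + tok) in dictionary:
                out[-1] = prev + tok  # step back: replace the previous token
                continue
        out.append(tok)
    # Whatever is still unknown after joining could not be repaired.
    invalid = [t for t in out if not is_known(t, dictionary)]
    return out, invalid


if __name__ == "__main__":
    # Toy dictionary, for illustration only.
    toy_dictionary = {"il", "man", "mange", "une", "pomme", "un", "exemple", "voici"}
    tokens = ["il", "man", "ge", "une", "pomme", ".", "voici", "un", "exem", "ple", "."]
    print(join_split_words(tokens, toy_dictionary))
```

With the question's variables this would be called as `valid, invalid = join_split_words(text_input, set(dic))`. One limitation remains: if both halves of a split word are dictionary words themselves, dictionary lookups alone cannot tell a split word from two adjacent words, so the sketch leaves such pairs apart.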

Answer 1

I'm not sure I understand your problem, but one way to "step back" in a Python loop is simply to save the loop's last state, i.e.:

last = None
for i in list_:
    #do stuff
    last = i
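The "save the last state" pattern above can be wrapped in a small generator. This is a sketch; the name `with_previous` is hypothetical. Because it only remembers the previous item rather than indexing backwards, it also works on a one-shot iterator such as the question's `words_it = iter(text_input)`.

```python
def with_previous(iterable):
    """Yield (previous, current) pairs; previous is None for the first item."""
    last = None
    for item in iterable:
        yield last, item
        last = item  # save this iteration's state for the next one


for previous, current in with_previous(iter(["il", "man", "ge"])):
    print(previous, current)
```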

Or you can use enumerate:

for index, i in enumerate(list_):
    #do stuff
    previous = list_[index - 1] if index > 0 else None  # at index 0, list_[-1] would wrap around to the last item
