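Check whether a specific string matches a string in a text file

Date: 2016-09-03 18:15:22

Tags: python string python-2.7

I have a text file containing many words (one word per line). I have to read each word, modify it, and then check whether the modified word matches any word in the file. I'm having trouble with the last part (the hasMatch method in my code). It sounds simple and I know what I should be doing, but nothing I try works.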

#read in textfile 
myFile = open('good_words.txt')


#function to remove first and last character in string, and reverse string
def modifyString(str):
    rmFirstLast = str[1:len(str)-2] #slicing first and last char
    reverseStr = rmFirstLast[::-1] #reverse string 
    return reverseStr

#go through list of words to determine if any string match modified string
def hasMatch(modifiedStr):
    for line in myFile:
        if line == modifiedStr:
            print(modifiedStr + " found")
        else:
            print(modifiedStr + "not found")

for line in myFile:
    word = str(line) #save string in line to a variable

    #only modify strings that are greater than length 3
    if len(word) >= 4:
        #global modifiedStr #make variable global
        modifiedStr = modifyString(word) #do string modification
        hasMatch(modifiedStr)

myFile.close()

2 Answers:

Answer 0 (score: 2)

There are several problems here:

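  1. You have to strip the lines; otherwise each one keeps its linefeed/CR characters and the comparison fails.
  2. You have to read the file once and for all; otherwise the file iterator is exhausted after the first pass.
  3. Speed is poor: use a set instead of a list to speed up the search.
  4. The slicing is overcomplicated and wrong: str[1:-1] does the job (thanks to those who commented on my answer).
  5. The whole code is really too long and complex. I condensed it into a few lines.
  6. Code: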

    #read in textfile
    myFile = open('good_words.txt')
    # make a set (faster search), remove linefeeds
    lines = set(x.strip() for x in myFile)
    myFile.close()
    
    # iterate on the lines
    for word in lines:
        #only consider strings that are greater than length 3
        if len(word) >= 4:
            modifiedStr = word[1:-1][::-1] #do string modification
            if modifiedStr in lines:
                print(modifiedStr + " found (was "+word+")")
            else:
                print(modifiedStr + " not found")
    

    I tested the program on a list of common English words and got these matches:

    so found (was most)
    or found (was from)
    no found (was long)
    on found (was know)
    to found (was both)
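
Points 1 and 2 of the list above can be seen in isolation with a small sketch. It uses an in-memory `io.StringIO` object as a stand-in for `good_words.txt` (an assumption, so that the snippet runs without the file), and the three words are only sample data.

```python
import io

# Stand-in for open('good_words.txt'); the contents are sample data.
fake_file = io.StringIO(u"most\nso\nfrom\n")

first_pass = [line for line in fake_file]
second_pass = [line for line in fake_file]  # the iterator is already exhausted

print(first_pass)   # every line still carries its trailing newline
print(second_pass)  # nothing is left to read on the second pass

# Comparing a raw line with a bare word fails because of the newline.
raw_match = first_pass[1] == "so"
stripped_match = first_pass[1].strip() == "so"
print(raw_match)
print(stripped_match)
```

This is why the original hasMatch only ever sees an empty file: the outer loop is already consuming the same iterator.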
    

    Edit: another version that drops the set and uses bisect on a sorted list, to avoid hashing and hash collisions.

    import bisect
    
    #read in textfile
    myFile = open("good_words.txt")
    lines = sorted(x.strip() for x in myFile) # make a sorted list, remove linefeeds
    myFile.close()
    
    for word in lines:
    
        #only modify strings that are greater than length 3
        if len(word) >= 4:
            modifiedStr = word[1:-1][::-1] #do string modification
            # search where to insert the modified word
            i=bisect.bisect_left(lines,modifiedStr)
            # if can be inserted and word is actually at this position: found
            if i<len(lines) and lines[i]==modifiedStr:
                print(modifiedStr + " found (was "+word+")")
            else:
                print(modifiedStr + " not found")
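
The bisect lookup can also be wrapped in a small helper, which makes it easy to check against set membership. This is only a sketch: `contains_sorted` is a hypothetical helper name, and the word list is sample data standing in for the stripped lines of `good_words.txt`.

```python
import bisect

def contains_sorted(sorted_words, target):
    """Return True if target occurs in sorted_words (binary search)."""
    i = bisect.bisect_left(sorted_words, target)
    return i < len(sorted_words) and sorted_words[i] == target

# Sample data standing in for the stripped lines of good_words.txt.
words = sorted(["most", "so", "from", "or", "long", "word"])
word_set = set(words)

for word in words:
    if len(word) >= 4:
        modified = word[1:-1][::-1]
        # Both lookups must agree: the set is O(1) on average,
        # the binary search is O(log n).
        assert contains_sorted(words, modified) == (modified in word_set)

# Words whose modified form is itself present in the list.
matches = [w for w in words
           if len(w) >= 4 and contains_sorted(words, w[1:-1][::-1])]
print(matches)
```

Both approaches find the same words. For a small word list the set version is probably the simpler choice, since the sorted list has to be built before any lookup.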
    

Answer 1 (score: 0)

In your code, you are not slicing off just the first and last characters; you are slicing off the first character and the last two.

rmFirstLast = str[1:len(str)-2] 

Change it to:

rmFirstLast = str[1:len(str)-1]
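
A quick check of the two slices on a sample word, together with the shorter negative-index form, might look like the sketch below. The word "most" is only an example, and the variable is called `word` rather than `str` so that it does not shadow the built-in type.

```python
word = "most"  # sample word, already stripped of its newline

original = word[1:len(word) - 2]   # drops the first char and the last two
corrected = word[1:len(word) - 1]  # drops only the first and the last char
shorthand = word[1:-1]             # same result with a negative index

print(original)
print(corrected)
print(shorthand)

# On an unstripped line the original slice happens to look right,
# because the newline counts as one of the "last two" characters.
line = "most\n"
accidental = line[1:len(line) - 2]
print(accidental)
```

The last case shows why the original slice can appear to work on lines read straight from the file, yet it breaks as soon as the lines are stripped or the final line has no newline.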