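Similar string from a pickled list

Time: 2014-12-04 20:48:35

Tags: python search pickle

I am looking for a way to check whether a user's input is similar to a string currently stored in a pickle file.

For example, suppose I have a user enter their name...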

userInput = input("What's your name? ")

...is there a way to check whether it is similar to an existing string in the pickle file, something like this:

store = pickle.load(open("save.p", "rb"))
simResult = similar(userInput, store)
if userInput not in store:
   print("This user doesn't exist in our database")
   print("How about", simResult, "?")

2 answers:

Answer 0 (score: 0)

You can use difflib.SequenceMatcher:

>>> from difflib import SequenceMatcher
>>> 
>>> def similar(a, b):
...     return SequenceMatcher(None, a, b).ratio()
... 
>>> similar('word','words')
0.8888888888888888
>>> similar('word','word')
1.0

>>> for i in word_list:
...     if similar(test_word, i) > 0.7:  # the 0.7 threshold is adjustable
...         print(i)  # i is a stored word that resembles test_word
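
To connect this with the `similar(userInput, store)` call in the question, here is a minimal sketch that returns the single best match from a list instead of printing every hit. The helper name `closest_match` and the sample names are made up for illustration, and the 0.7 threshold is just the value used above, not a recommended setting.

```python
from difflib import SequenceMatcher


def similar(a, b):
    """Return a similarity ratio between 0.0 and 1.0 for two strings."""
    return SequenceMatcher(None, a, b).ratio()


def closest_match(query, candidates, threshold=0.7):
    """Return the candidate most similar to query, or None if no
    candidate reaches the threshold. (Hypothetical helper.)"""
    best, best_score = None, 0.0
    for candidate in candidates:
        score = similar(query, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None


# Sample data, standing in for the list loaded from the pickle file.
names = ["John Cleese", "Graham Chapman", "Michael Palin"]

print(closest_match("Jon Cleese", names))  # John Cleese
print(closest_match("Zaphod", names))      # None
```

The comparison is case-sensitive; lowercasing both strings before comparing is one option if that matters for your names.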

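Answer 1 (score: 0)

If you want to search a pickle file for a pickled string (and suppose, as a complication, that you have a huge "database" file of username strings to search through), then loading and therefore unpickling the entire file just to check whether it contains that string can be very costly. Unpickling the contents of a large pickled file is expensive.

Since you are looking for a string inside a pickled string... here is a very simple approach that doesn't require you to unpickle a large file of stored names.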

>>> # let's make a pickled file of name strings (pretend it's a huge list)
>>> import pickle
>>> names = ["John Cleese", "Graham Chapman", "Michael Palin", \
...          "Eric Idle", "Terry Gilliam", "Terry Jones", "Guido van Rossum"]
>>> f = open('names.pik', 'wb')
>>> pickle.dump(names, f, -1)
>>> f.close()
>>> 
>>> # we can unpickle the file, with load, and compare
>>> f = open('names.pik', 'rb')
>>> people = pickle.load(f)
>>> f.close()
>>> "Guido van Rossum" in people
True
>>> # for big files, this is slow.
>>> # you are searching for strings, so you can just search the raw file contents
>>> lines = open('names.pik', 'rb').read()
>>> "Guido van Rossum" in lines
True
>>> # the reason this works is that pickle dumps strings transparently
>>> pickle.dumps("Guido van Rossum")             
"S'Guido van Rossum'\np0\n."
>>> 
>>> pickle.dumps("Guido van Rossum", -1)
'\x80\x02U\x10Guido van Rossumq\x00.'
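
The transcript above shows Python 2 output. In Python 3 the file contents are `bytes`, so the name has to be encoded before the `in` test. A rough sketch of the same idea follows; the helper name `raw_contains` and the temporary file location are assumptions for illustration. It relies on the pickled text being stored as UTF-8 bytes, which holds for the binary protocols, and for plain ASCII names with protocol 0 as well.

```python
import os
import pickle
import tempfile


def raw_contains(path, name):
    """Check whether the UTF-8 bytes of name occur anywhere in the file,
    without unpickling it. (Hypothetical helper.)"""
    with open(path, "rb") as f:
        return name.encode("utf-8") in f.read()


names = ["John Cleese", "Graham Chapman", "Michael Palin",
         "Eric Idle", "Terry Gilliam", "Terry Jones", "Guido van Rossum"]

# Write the sample list to a temporary pickle file.
path = os.path.join(tempfile.mkdtemp(), "names.pik")
with open(path, "wb") as f:
    pickle.dump(names, f, pickle.HIGHEST_PROTOCOL)

print(raw_contains(path, "Guido van Rossum"))  # True
print(raw_contains(path, "Terry Pratchett"))   # False
print(raw_contains(path, "Terry"))             # True: substrings match too
```

As the last line shows, this is a substring test on raw bytes, so it can report a hit for a partial name. It still reads the whole file; it only skips rebuilding the Python objects.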

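If you are looking for cases where two strings are similar to each other but not identical... that is not a pickle-related problem, and difflib (as @Kasra suggested) is a good choice.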
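
For the flow in the question (exact check first, then a suggestion), one possible sketch uses `difflib.get_close_matches` from the standard library. The helper name `check_user`, the sample names, and the temporary `save.p` file are assumptions for illustration; 0.6 is that function's default cutoff.

```python
import difflib
import os
import pickle
import tempfile


def check_user(name, store):
    """Return (exists, suggestions) for name against the list store.
    (Hypothetical helper.)"""
    if name in store:
        return True, []
    # Ask for at most one suggestion scoring at least 0.6.
    return False, difflib.get_close_matches(name, store, n=1, cutoff=0.6)


# Build a small sample pickle file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "save.p")
with open(path, "wb") as f:
    pickle.dump(["John Cleese", "Eric Idle", "Terry Jones"], f)

with open(path, "rb") as f:
    store = pickle.load(f)

exists, suggestions = check_user("Eric Idel", store)
if not exists:
    print("This user doesn't exist in our database")
    if suggestions:
        print("How about", suggestions[0], "?")
```

This variant does unpickle the whole list, so it fits the case where the file is small enough to load.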