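I am trying to calculate the likelihood that an English word appears in a string of variable length; say, 10 characters. I have code that prints a random string of variable length, but I don't know how to check whether it contains an English word.
I don't need to check for a specific word. I need to check whether any English word at all appears in the string.
I have two questions: how do I do this for a 10-character string, and, if possible, how do I do it for a string of any length?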
My code for generating the random characters uses switcher, a dictionary that pairs the numbers 1-26 with the letters A-Z.
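If my input is 10, the string might be something like "BFGEHDUEND", and the output should be the string "BFGEHDUEND" together with True, because the string contains an English word ("END").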
Answer 0 (score: 0)
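I think I can offer a solution that works not only for English but also for other languages, provided NLTK supports them.
We will use NLTK to get a set of all English words (see section 4.1 of the NLTK documentation) and assign it to english.
Then we walk through the variable out and slice it at every possible position (minimum length of 2 letters), appending each slice to a new list called all_variants.
Finally, we loop over the "words" in all_variants, check whether each one is in english, and print an appropriate response.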
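The slicing step described above can be sketched on its own, without NLTK. This is a minimal illustration rather than the answer's exact code: the helper name all_slices and its min_len parameter are assumptions introduced here. It shows that a 10-character string produces only 45 candidate slices of length 2 or more, so checking each one against a set is cheap.

```python
def all_slices(text, min_len=2):
    """Return every contiguous slice of `text` with at least `min_len` characters.

    `all_slices` and `min_len` are hypothetical names used for illustration.
    """
    return [
        text[i:j]
        for i in range(len(text) - min_len + 1)      # every start position
        for j in range(i + min_len, len(text) + 1)   # every end position
    ]


slices = all_slices("BFGEHDUEND")
print(len(slices))       # 45 slices for a 10-character string
print("END" in slices)   # True
```

For a string of n characters there are n(n-1)/2 slices of length 2 or more, so the same approach works for any input length.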
# imports
import nltk
import string
import random

# getting the alphabet
alph = list(string.ascii_lowercase)

# creating your dictionary: 1 -> 'a', ..., 26 -> 'z'
switcher = {}
for i in range(1, 27):
    switcher[i] = alph[i-1]

# using nltk we are going to get a set of all English words
# (the corpus must be available locally; run nltk.download('words') once if it is not)
english = set(w.lower() for w in nltk.corpus.words.words())

def infmonktyp(english_dict=english, letter_dictionary=switcher):
    out = ""
    length = int(input("How many characters do you want to print? "))
    if length < 2:
        raise ValueError("Length must be greater than 1")
    for _ in range(length):
        num = random.randint(1, 26)
        out += letter_dictionary.get(num, "0")
    # the random string has been created
    print(out)
    all_variants = []
    # getting all slices of the string, minimum of 2 letters
    for i in range(len(out)-1):
        for j in range(i+2, len(out)+1):
            all_variants.append(out[i:j])
    # to know how many words we found; I'm guessing that's what you have in the second line?
    words_found = 0
    # looping through all the slices: if one exists in English, print it, if not keep going
    for word in all_variants:
        if word in english_dict:
            print(word, 'found in', out)
            words_found += 1
    # if we didn't find any words, say so
    if words_found == 0:
        print("Couldn't find a word")

# calling the function
infmonktyp(english, switcher)
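The question also asks for a True/False result and for the likelihood that a random string contains a word. One way to get both, sketched here under assumptions, is to separate the check from the input and printing, then estimate the likelihood by simulation. The names contains_word and estimate_probability and the tiny sample_words set are hypothetical and stand in for the NLTK word set built above; the simulation is a plain Monte Carlo estimate, not something taken from the answer.

```python
import random
import string


def contains_word(text, words, min_len=2):
    """Return True if any slice of `text` with min_len+ characters is in `words`.

    `words` is expected to hold lowercase entries, like the NLTK-based set.
    """
    lowered = text.lower()
    n = len(lowered)
    return any(
        lowered[i:j] in words
        for i in range(n - min_len + 1)
        for j in range(i + min_len, n + 1)
    )


def estimate_probability(length, words, trials=10_000, seed=None):
    """Estimate the chance that a random A-Z string of `length` contains a word."""
    rng = random.Random(seed)  # local generator so runs can be reproduced
    hits = 0
    for _ in range(trials):
        text = "".join(rng.choice(string.ascii_uppercase) for _ in range(length))
        if contains_word(text, words):
            hits += 1
    return hits / trials


# Hypothetical tiny word list; pass the NLTK set instead for real use.
sample_words = {"end", "due", "he"}
print("BFGEHDUEND", contains_word("BFGEHDUEND", sample_words))  # BFGEHDUEND True
print(estimate_probability(10, sample_words, trials=2000, seed=0))
```

With the full NLTK word list the estimate will be much higher than with this sample set, because that list includes many short entries; raising min_len is one way to make the check stricter.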