I want to generate new sentences that have the same meaning as a given sentence.
from nltk.corpus import wordnet
from nltk.tokenize import word_tokenize
from random import randint
from nltk.corpus import stopwords
import nltk.data

# Load a text file if required
text = "How many holidays do I have?"
output = ""
Lisst=[]
a=10
# Keep generating until we have `a` distinct sentences
while (len(Lisst)<a):
    output = ""
    # Load the pretrained Punkt sentence tokenizer
    tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
    # Tokenize the text
    tokenized = tokenizer.tokenize(text)
    #print('tokenized',tokenized,'\n')
    # Get the list of words from the entire text
    words = word_tokenize(text)
    #print('words',words,'\n')
    stop_words = set(stopwords.words('english'))
    #print('stop_words',stop_words,'\n')
    # Identify the parts of speech
    tagged = nltk.pos_tag(words)
    for i in range(0,len(words)):
        replacements = []
        # Only replace nouns with nouns, verbs with verbs etc.
        for syn in wordnet.synsets(words[i]):
            #print(syn)
            # Do not attempt to replace proper nouns or determiners
            if tagged[i][1] == 'NNP' or tagged[i][1] == 'DT':
                #print('tag',tagged[i][1])
                break
            # The tagger returns strings like NNP, VBP etc
            # but the wordnet synonyms have tags like .n.
            # So we extract the first character from NNP ie n
            # then we check if the dictionary word has a .n. or not
            word_type = tagged[i][1][0].lower()
            #print('type',word_type)
            if syn.name().find("."+word_type+"."):
                # extract the word only
                #print('synname',syn.name().find("."+word_type+"."))
                r = syn.name()[0:syn.name().find(".")]
                #print(r)
                replacements.append(r)
        #print(replacements,'\n')
        if len(replacements) > 0:
            # Choose a random replacement
            replacement = replacements[randint(0,len(replacements)-1)]
            output = output + " " + replacement
            #print(output)
        else:
            # If no replacement could be found, then just use the
            # original word
            output = output + " " + words[i]
            #print('\n',output)
    #print('Input:',text)
    Lisst.append(output)
    #print('Output:',output)
    # Drop duplicate sentences
    Lisst1=set(Lisst)
    Lisst=list(Lisst1)
    #print('Outputs',Lisst)
print('Input:',text)
print('Outputs',Lisst)
After browsing for a while, I found this code and made a few small changes to it. The output I get is as follows:
Input: How many holidays do I have?
Outputs [' How many vacation bash iodine have ?', ' How many vacation dress one have ?', ' How many vacation dress i give_birth ?', ' How many vacation suffice i have ?', ' How many holiday do one have ?', ' How many holiday doctor_of_osteopathy i have ?', ' How many vacation do i have ?', ' How many vacation suffice one own ?', ' How many vacation do iodine rich_person ?', ' How many vacation cause i induce ?']
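It replaces every word except determiners (DT) and singular proper nouns (NNP) with a synonym. Most of the generated outputs make little sense. I want the outputs to be meaningful. It would be great if I could generate an entirely new sentence with the same meaning. For example: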
Input: How many holidays do I have?
Outputs['Number of leaves do I have','Leaves that I can avail'.......,'']
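Part of the noise in the output above (for example the noun doctor_of_osteopathy replacing the verb "do") may come from the check `syn.name().find("."+word_type+".")`: `str.find` returns -1 when the substring is missing, and -1 is truthy, so the condition holds for nearly every synset. Below is a minimal sketch of a stricter filter that works on synset name strings only, so it runs without NLTK. The helper names (`wordnet_pos`, `buggy_check`, `filter_by_pos`) and the sample synset names are assumptions made for illustration; the lemmas come from the output above, but the sense numbers are guessed.

```python
# Map the first letter of a Penn Treebank tag to a WordNet POS letter.
# WordNet uses n (noun), v (verb), a/s (adjective), r (adverb).
PENN_TO_WORDNET = {"N": "n", "V": "v", "J": "a", "R": "r"}


def wordnet_pos(penn_tag):
    """Return the WordNet POS letter for a Penn Treebank tag, or None."""
    return PENN_TO_WORDNET.get(penn_tag[:1])


def buggy_check(synset_name, word_type):
    """Same test as in the question: find() returns -1 (truthy) on a miss."""
    return bool(synset_name.find("." + word_type + "."))


def pos_of(synset_name):
    """Extract the POS field from a name shaped like 'lemma.pos.nn'."""
    return synset_name.rsplit(".", 2)[1]


def filter_by_pos(synset_names, penn_tag):
    """Keep only the synset names whose POS matches the tagged word."""
    pos = wordnet_pos(penn_tag)
    if pos is None:
        # Determiners, pronouns, punctuation etc. have no WordNet POS here.
        return []
    # Treat satellite adjectives ('s') as adjectives too.
    allowed = {"a", "s"} if pos == "a" else {pos}
    return [name for name in synset_names if pos_of(name) in allowed]


# Illustrative synset names for the word "do" (sense numbers are assumed).
candidates = [
    "make.v.01",
    "bash.n.02",
    "doctor_of_osteopathy.n.01",
    "suffice.v.01",
]

print([buggy_check(name, "v") for name in candidates])
print(filter_by_pos(candidates, "VBP"))
```

With real synsets, comparing `syn.pos()` against the mapped letter should have the same effect. This only keeps replacements in the right part of speech; it does not by itself make the generated sentences preserve the original meaning.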
Links to useful resources are also welcome.
Thanks in advance :-).