Generate new sentences from a given sentence without changing its meaning

Time: 2019-11-22 09:55:25

Tags: python python-3.x tensorflow nlp spacy

I want to generate new sentences from a given sentence that have the same meaning as the original.

from nltk.corpus import wordnet
from nltk.tokenize import word_tokenize
from random import randint
from nltk.corpus import stopwords 
import nltk.data

# Load a text file if required
text = "How many holidays do I have?" 
output = ""
Lisst=[]
a=10
while (len(Lisst)<a):
    output = ""
    # Load the pretrained Punkt sentence tokenizer
    tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')

    # Tokenize the text
    tokenized = tokenizer.tokenize(text)
    #print('tokenized',tokenized,'\n')

    # Get the list of words from the entire text
    words = word_tokenize(text)
    #print('words',words,'\n')
    stop_words = set(stopwords.words('english'))
    #print('stop_words',stop_words,'\n')

    # Identify the parts of speech
    tagged = nltk.pos_tag(words)

    for i in range(0,len(words)):
        replacements = []


        # Only replace nouns with nouns, verbs with verbs etc.
        for syn in wordnet.synsets(words[i]):
            #print(syn)
            # Do not attempt to replace proper nouns or determiners
            if tagged[i][1] == 'NNP' or tagged[i][1] == 'DT':
                #print('tag',tagged[i][1])
                break

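            # The POS tagger returns tags like NNP, VBP etc.,
            # but WordNet synset names contain tags like .n.
            # So we take the first character of the tag (N from NNP),
            # lowercase it, and look for .n. in the synset name
            word_type = tagged[i][1][0].lower()
            #print('type',word_type)
            # NOTE: str.find() returns -1 (truthy) when the tag is absent and
            # 0 (falsy) only for a match at index 0, so this test passes for
            # nearly every synset rather than filtering by part of speech.
            if syn.name().find("."+word_type+"."):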
                # extract the word only
                #print('synname',syn.name().find("."+word_type+"."))
                r = syn.name()[0:syn.name().find(".")]
                #print(r)
                replacements.append(r)
                #print(replacements,'\n')

        if len(replacements) > 0:
            # Choose a random replacement
            replacement = replacements[randint(0,len(replacements)-1)]
            output = output + " " + replacement
            #print(output)
        else:
            # If no replacement could be found, then just use the
            # original word
            output = output + " " + words[i]
            #print('\n',output)

    #print('Input:',text)
    Lisst.append(output)
    #print('Output:',output)
    Lisst1=set(Lisst)
    Lisst=list(Lisst1)
    #print('Outputs',Lisst)
print('Input:',text)
print('Outputs',Lisst)

After browsing for a while, I found this code and made a few small changes to it. The output I get is as follows:

Input: How many holidays do I have?
Outputs [' How many vacation bash iodine have ?', ' How many vacation dress one have ?', ' How many vacation dress i give_birth ?', ' How many vacation suffice i have ?', ' How many holiday do one have ?', ' How many holiday doctor_of_osteopathy i have ?', ' How many vacation do i have ?', ' How many vacation suffice one own ?', ' How many vacation do iodine rich_person ?', ' How many vacation cause i induce ?']
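
Several of the substitutions above, such as `iodine` for `I`, may come from the part-of-speech check rather than from WordNet itself: `str.find()` returns -1 when the substring is missing, and -1 is truthy, so the posted condition accepts almost every synset. The sketch below compares that check with a stricter one. It works on plain synset-name strings in WordNet's `lemma.pos.nn` form so that it runs without NLTK; `posted_check`, `strict_check`, and the `PENN_TO_WORDNET` mapping are illustrative names, not part of any library.

```python
# Assumed mapping from the first letter of a Penn Treebank tag to the
# WordNet POS letters it may correspond to ('s' marks satellite adjectives).
PENN_TO_WORDNET = {"N": {"n"}, "V": {"v"}, "J": {"a", "s"}, "R": {"r"}}


def posted_check(synset_name, penn_tag):
    """Reproduce the condition from the posted code."""
    word_type = penn_tag[0].lower()
    return bool(synset_name.find("." + word_type + "."))


def strict_check(synset_name, penn_tag):
    """Accept a synset only when its POS letter fits the Penn tag."""
    allowed = PENN_TO_WORDNET.get(penn_tag[0].upper(), set())
    # Synset names look like 'vacation.n.01'; split from the right so a
    # lemma that itself contains dots is still handled.
    _lemma, pos, _sense = synset_name.rsplit(".", 2)
    return pos in allowed


# Print both results side by side for a pronoun, a noun and a verb
for name, tag in [("iodine.n.01", "PRP"), ("vacation.n.01", "NNS"), ("suffice.v.01", "VBP")]:
    print(name, tag, posted_check(name, tag), strict_check(name, tag))
```

With NLTK available, `syn.pos()` returns the same POS letter directly, which avoids parsing the synset name at all.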

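The code replaces every word except determiners (DT) and singular proper nouns (NNP) with a synonym. Most of the generated outputs make little sense. I want the outputs to be meaningful. It would be great if I could generate an entirely new sentence with the same meaning. For example: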

Input: How many holidays do I have?
Outputs['Number of leaves do I have','Leaves that I can avail'.......,'']
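
Separately from synonym quality, the `while` loop in the posted code only stops once ten distinct sentences exist, so it could run forever on a sentence with fewer than ten possible variants. A minimal sketch of a bounded alternative follows. The function name `generate_variants` and the hard-coded candidate lists are hypothetical stand-ins for the WordNet lookups, chosen so the example runs with the standard library alone.

```python
import random


def generate_variants(words, replacements, wanted=10, max_attempts=200, seed=None):
    """Collect up to `wanted` distinct variants, giving up after `max_attempts`."""
    rng = random.Random(seed)
    variants = set()  # a set removes duplicates as they are produced
    for _ in range(max_attempts):
        if len(variants) >= wanted:
            break
        # Pick a random candidate per word; keep the word when it has none
        chosen = [rng.choice(replacements.get(w) or [w]) for w in words]
        # join() avoids the leading space produced by repeated concatenation
        variants.add(" ".join(chosen))
    return sorted(variants)


words = ["How", "many", "holidays", "do", "I", "have", "?"]
# Hypothetical candidate lists, standing in for the WordNet lookups
candidates = {"holidays": ["holiday", "vacation"], "have": ["have", "own"]}
for sentence in generate_variants(words, candidates, wanted=10, seed=0):
    print(sentence)
```

Only four combinations exist for these candidate lists, so the function returns at most four sentences instead of looping until it reaches ten.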

Links to relevant resources are also welcome.

Thanks in advance :-).

0 answers:

No answers