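I'm using Python 3.2 and trying to build a randomly generated parse tree for a sentence. I'm sure it generates a sentence, but I'm not sure how random the parse tree is, and I don't know whether there is a better or more efficient way to write this code. (I'm new to programming and Python, and I've recently become interested in NLP. Any suggestions, solutions, or corrections are welcome.)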
N=['man','dog','cat','telescope','park'] #noun
P=['in','on','by','with'] #preposition
det=['a','an','the','my'] #determiner
V=['saw','ate','walked'] #verb
NP=['John','Mary','Bob'] #noun phrase
from random import choice
PP=choice(NP)+' '+choice(P) #preposition phrase
PP=''.join(PP)
VP=''.join(choice(V)+' '+choice(NP)) or''.join(choice(V)+' '.choice(NP)+(PP)) #verb phrase
VP=''.join(VP) #verb phrase
S=choice(NP)+' '+VP #sentence
print(S)
Answer 0 (score: 2)
Try NLTK: http://nltk.org/book/ch08.html
import nltk
from random import choice, shuffle, random
# Sometimes I find that reading the terminals into a dict keyed by POS helps.
vocab={
'Det':['a','an','the','my'],
'N':['man','dog','cat','telescope','park'],
'V':['saw','ate','walked'],
'P':['in','on','by','with'],
'NP':['John','Mary','Bob']
}
vocab2string = [pos + " -> '" + "' | '".join(vocab[pos])+"'" for pos in vocab]
# Rules are simpler to craft by hand, so I left them as a string
rules = '''
S -> NP VP
VP -> V NP
VP -> V NP PP
PP -> NP P
NP -> Det N
'''
mygrammar = rules + "\n".join(vocab2string)
grammar = nltk.CFG.fromstring(mygrammar)  # Load your grammar (NLTK 3; NLTK 2 used nltk.parse_cfg)
parser = nltk.ChartParser(grammar)  # Load the grammar into a parser
# Randomly select one terminal from each POS, in the spirit of the infinite monkey theorem,
# i.e. words are picked without grammatical order, see https://en.wikipedia.org/wiki/Infinite_monkey_theorem
if random() < 0.5:
    words = [choice(vocab[pos]) for pos in vocab if pos != 'P']  # without a PP
else:
    words = [choice(vocab[pos]) for pos in vocab] + [choice(vocab['NP'])]  # with a PP you need 3 NPs
# Keep shuffling until the word order parses, so the sentence is always grammatical
trees = []
while not trees:
    shuffle(words)
    trees = list(parser.parse(words))  # NLTK 2: parser.nbest_parse(words)
for t in trees:
    print(t)
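If you would rather not depend on NLTK, or want to avoid shuffling until something parses, a random tree can also be built directly by expanding the grammar top-down from `S`. The following is only a minimal sketch under some assumptions: the `GRAMMAR` dict is my own plain-Python encoding of the rules above, and `random_tree`, `leaves`, and `bracketed` are hypothetical helper names, not part of any library. It picks uniformly among the alternatives of each nonterminal, so for example `NP` yields a proper name three times out of four; weight the choices if you want a different distribution.

```python
import random

# Hypothetical plain-Python encoding of the grammar above:
# each nonterminal maps to a list of alternative right-hand sides.
GRAMMAR = {
    'S': [['NP', 'VP']],
    'VP': [['V', 'NP'], ['V', 'NP', 'PP']],
    'PP': [['NP', 'P']],
    'NP': [['Det', 'N'], ['John'], ['Mary'], ['Bob']],
    'Det': [['a'], ['an'], ['the'], ['my']],
    'N': [['man'], ['dog'], ['cat'], ['telescope'], ['park']],
    'V': [['saw'], ['ate'], ['walked']],
    'P': [['in'], ['on'], ['by'], ['with']],
}


def random_tree(symbol, rng=random):
    """Expand `symbol` top-down and return a nested tuple (label, child, ...)."""
    if symbol not in GRAMMAR:  # anything without a rule is a terminal word
        return symbol
    production = rng.choice(GRAMMAR[symbol])  # uniform choice among alternatives
    return (symbol,) + tuple(random_tree(s, rng) for s in production)


def leaves(tree):
    """Collect the words of the tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


def bracketed(tree):
    """Render the tree in bracketed form, e.g. (S (NP John) (VP ...))."""
    if isinstance(tree, str):
        return tree
    return '(' + tree[0] + ' ' + ' '.join(bracketed(c) for c in tree[1:]) + ')'


if __name__ == '__main__':
    tree = random_tree('S')
    print(' '.join(leaves(tree)))  # the generated sentence
    print(bracketed(tree))         # its parse tree
```

Because this grammar has no recursive rules, the expansion always terminates. With a recursive grammar you would probably want a depth limit.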