下载并训练SyntaxNet后,我正在尝试编写一个可以打开新/现有文件的程序,例如AutoCAD文件,并通过分析文本将文件保存到特定目录中:打开LibreOffice文件X 。将SyntaxNet的输出视为:
echo "save AUTOCAD file X in directory Y" | ./test.sh > output.txt
Input: save AUTOCAD file X in directory Y
Parse:
save VB ROOT
+-- X NNP dobj
| +-- file NN compound
| +-- AUTOCAD CD nummod
+-- directory NN nmod
+-- in IN case
+-- Y CD nummod
首先,我考虑将解析后的文本更改为XML格式,然后使用语义分析(如SPARQL
)解析XML文件,以查找ROOT = save,dobj = X和nummode = Y并编写一个python程序,可以做同样的事情,在文中说
我不知道如果我将解析后的文本更改为XML,然后使用使用查询的语义分析,以便将ROOT
与保存dobj
的对应函数或脚本进行匹配,在nummode
我有一些想法将python连接到带有subprocess
包的终端,但我找不到任何可以帮助我保存例如来自终端的AUTOCAD文件或任何其他文件的内容或在python的帮助下,我需要编写一个脚本.sh
吗?
我对文本的句法和语义分析进行了大量研究,例如Christian Chiarcos, 2011,Hunter and Cohen, 2006和Verspoor et al., 2015,还研究了Microsoft Cortana,Sirius ,google now,但他们都没有详细说明他们如何将已解析的文字更改为执行命令,这让我得出结论,这项工作也是如此容易被谈论,但由于我不是计算机科学专业,我无法弄清楚我能做些什么。
答案 0 :(得分:10)
我是计算机科学世界和SyntaxNet的初学者。我写了一个简单的SyntaxNet-Python算法,它使用SyntaxNet来分析用户插入的文本命令,“打开我用LibreOffice writer用实验室编写器编写的文件簿”,然后用python算法分析SyntaxNet输出以便转向它是一个执行命令,在这种情况下打开一个文件,使用任何支持的格式,在Linux,Ubuntu 14.04环境中使用LibreOffice。您可以看到here LibreOffice定义的不同命令行,以便在此包中使用不同的应用程序。
安装并运行SyntaxNet(解释here中的安装过程)后,shell脚本在~/models/syntaxnet/suntaxnet/
目录和conl2tree
函数中打开demo.sh(为了从SyntaxNet获取line 54 to 56
输出而不是树格式输出,将删除tab delimited
)。
此命令在终端窗口中输入:
echo'用libreOffice writer'|打开我与实验室作家写的文件簿syntaxnet / demo.sh> output.txt的
output.txt
文档保存在demo.sh
存在的目录中,它将如下图所示:
output.txt
作为输入文件,使用下面的python算法分析SyntaxNet输出,并从LibreOffice包中识别您想要目标应用程序的文件名以及用户想要使用的命令。 #!/bin/sh
import csv
import subprocess
import sys
import os
#get SyntaxNet output as the Python algorithm input file
filename='/home/username/models/syntaxnet/work/output.txt'
#all possible executive commands for opening any file with any format with Libreoffice file
commands={
('open', 'libreoffice', 'writer'): ('libreoffice', '--writer'),
('open', 'libreoffice', 'calculator'): ('libreoffice' ,'--calc'),
('open', 'libreoffice', 'draw'): ('libreoffice' ,'--draw'),
('open', 'libreoffice', 'impress'): ('libreoffice' ,'--impress'),
('open', 'libreoffice', 'math'): ('libreoffice' ,'--math'),
('open', 'libreoffice', 'global'): ('libreoffice' ,'--global'),
('open', 'libreoffice', 'web'): ('libreoffice' ,'--web'),
('open', 'libreoffice', 'show'): ('libreoffice', '--show'),
}
#all of the possible synonyms of the application from Libreoffice
comments={
'writer': ['word','text','writer'],
'calculator': ['excel','calc','calculator'],
'draw': ['paint','draw','drawing'],
'impress': ['powerpoint','impress'],
'math': ['mathematic','calculator','math'],
'global': ['global'],
'web': ['html','web'],
'show':['presentation','show']
}
root ='ROOT' #ROOT of the senctence
noun='NOUN' #noun tagger
verb='VERB' #verb tagger
adjmod='amod' #adjective modifier
dirobj='dobj' #direct objective
apposmod='appos' # appositional modifier
prepos_obj='pobj' # prepositional objective
app='libreoffice' # name of the package
preposition='prep' # preposition
noun_modi='nn' # noun modifier
#read from Syntaxnet output tab delimited textfile
def readata(filename):
file=open(filename,'r')
lines=file.readlines()
lines=lines[:-1]
data=csv.reader(lines,delimiter='\t')
lol=list(data)
return lol
# identifies the action, the name of the file and whether the user mentioned the name of the application implicitely
def exe(root,noun,verb,adjmod,dirobj,apposmod,commands,noun_modi):
interprete='null'
lists=readata(filename)
for sublist in lists:
if sublist[7]==root and sublist[3]==verb: # when the ROOT is verb the dobj is probably the name of the file you want to have
action=sublist[1]
dep_num=sublist[0]
for sublist in lists:
if sublist[6]==dep_num and sublist[7]==dirobj:
direct_object=sublist[1]
dep_num=sublist[0]
dep_num_obj=sublist[0]
for sublist in lists:
if direct_object=='file' and sublist[6]==dep_num_obj and sublist[7]==apposmod:
direct_object=sublist[1]
elif direct_object=='file' and sublist[6]==dep_num_obj and sublist[7]==adjmod:
direct_object=sublist[1]
for sublist in lists:
if sublist[6]==dep_num_obj and sublist[7]==adjmod:
for key, v in comments.iteritems():
if sublist[1] in v:
interprete=key
for sublist in lists:
if sublist[6]==dep_num_obj and sublist[7]==noun_modi:
dep_num_nn=sublist[0]
for key, v in comments.iteritems():
if sublist[1] in v:
interprete=key
print interprete
if interprete=='null':
for sublist in lists:
if sublist[6]==dep_num_nn and sublist[7]==noun_modi:
for key, v in comments.iteritems():
if sublist[1] in v:
interprete=key
elif sublist[7]==root and sublist[3]==noun: # you have to find the word which is in a adjective form and depends on the root
dep_num=sublist[0]
dep_num_obj=sublist[0]
direct_object=sublist[1]
for sublist in lists:
if sublist[6]==dep_num and sublist[7]==adjmod:
actionis=any(t1==sublist[1] for (t1, t2, t3) in commands)
if actionis==True:
action=sublist[1]
elif sublist[6]==dep_num and sublist[7]==noun_modi:
dep_num=sublist[0]
for sublist in lists:
if sublist[6]==dep_num and sublist[7]==adjmod:
if any(t1==sublist[1] for (t1, t2, t3) in commands):
action=sublist[1]
for sublist in lists:
if direct_object=='file' and sublist[6]==dep_num_obj and sublist[7]==apposmod and sublist[1]!=action:
direct_object=sublist[1]
if direct_object=='file' and sublist[6]==dep_num_obj and sublist[7]==adjmod and sublist[1]!=action:
direct_object=sublist[1]
for sublist in lists:
if sublist[6]==dep_num_obj and sublist[7]==noun_modi:
dep_num_obj=sublist[0]
for key, v in comments.iteritems():
if sublist[1] in v:
interprete=key
else:
for sublist in lists:
if sublist[6]==dep_num_obj and sublist[7]==noun_modi:
for key, v in comments.iteritems():
if sublist[1] in v:
interprete=key
return action, direct_object, interprete
action, direct_object, interprete = exe(root,noun,verb,adjmod,dirobj,apposmod,commands,noun_modi)
# find the application (we assume we know user want to use libreoffice but we donot know what subapplication should be used)
def application(app,prepos_obj,preposition,noun_modi):
lists=readata(filename)
subapp='not mentioned'
for sublist in lists:
if sublist[1]==app:
dep_num=sublist[6]
for sublist in lists:
if sublist[0]==dep_num and sublist[7]==prepos_obj:
actioni=any(t3==sublist[1] for (t1, t2, t3) in commands)
if actioni==True:
subapp=sublist[1]
else:
for sublist in lists:
if sublist[6]==dep_num and sublist[7]==noun_modi:
actioni=any(t3==sublist[1] for (t1, t2, t3) in commands)
if actioni==True:
subapp=sublist[1]
elif sublist[0]==dep_num and sublist[7]==preposition:
sublist[6]=dep_num
for subline in lists:
if subline[0]==dep_num and subline[7]==prepos_obj:
if any(t3==sublist[1] for (t1, t2, t3) in commands):
subapp=sublist[1]
else:
for subline in lists:
if subline[0]==dep_num and subline[7]==noun_modi:
if any(t3==sublist[1] for (t1, t2, t3) in commands):
subapp=sublist[1]
return subapp
sub_application=application(app,prepos_obj,preposition,noun_modi)
if sub_application=='not mentioned' and interprete!='null':
sub_application=interprete
elif sub_application=='not mentioned' and interprete=='null':
sub_application=interprete
# the format of file
def format_function(sub_application):
subapp=sub_application
Dobj=exe(root,noun,verb,adjmod,dirobj,apposmod,commands,noun_modi)[1]
if subapp!='null':
if subapp=='writer':
a='.odt'
Dobj=Dobj+a
elif subapp=='calculator':
a='.ods'
Dobj=Dobj+a
elif subapp=='impress':
a='.odp'
Dobj=Dobj+a
elif subapp=='draw':
a='.odg'
Dobj=Dobj+a
elif subapp=='math':
a='.odf'
Dobj=Dobj+a
elif subapp=='math':
a='.odf'
Dobj=Dobj+a
elif subapp=='web':
a='.html'
Dobj=Dobj+a
else:
Dobj='null'
return Dobj
def get_filepaths(directory):
myfile=format_function(sub_application)
file_paths = [] # List which will store all of the full filepaths.
# Walk the tree.
for root, directories, files in os.walk(directory):
for filename in files:
# Join the two strings in order to form the full filepath.
if filename==myfile:
filepath = os.path.join(root, filename)
file_paths.append(filepath) # Add it to the list.
return file_paths # Self-explanatory.
# Run the above function and store its results in a variable.
full_file_paths = get_filepaths("/home/ubuntu/")
if full_file_paths==[]:
print 'No file with name %s is found' % format_function(sub_application)
if full_file_paths!=[]:
path=full_file_paths
prompt='> '
if len(full_file_paths) >1:
print full_file_paths
print 'which %s do you mean?'% subapp
inputname=raw_input(prompt)
if inputname in full_file_paths:
path=inputname
#the main code structure
if sub_application!='null':
command= commands[action,app,sub_application]
subprocess.call([command[0],command[1],path[0]])
else:
print "The sub application is not mentioned clearly"
我再次说我是初学者,代码可能看起来不那么整洁或专业但我只是试着用我所有关于这个迷人的知识
SyntaxNet
一个实用的算法。
这个简单的算法可以打开文件:
使用LibreOffice
支持的任何格式,例如.odt,.odf,.ods,.html,.odp
。
它可以理解LibreOffice
中不同应用程序的隐式引用,例如:“用libreoffice打开文本文件”而不是“用libreoffice writer打开文件簿”
可以解决SyntaxNet解释被称为形容词的文件名称的问题。