Creating GATE documents from a CSV file

Asked: 2015-01-06 16:48:59

Tags: perl csv gate

I need to convert a CSV document structured like this:

i love iphone \t positive
i hate iphone \t negative

into a GATE document that contains the corresponding class:

[image not preserved]

What is the best way to do this? JAPE, Groovy?

2 answers:

Answer 0 (score: 2)

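Basically, you have to handle two things: the CSV file and the GATE document. If you search CPAN, you will find modules that handle both kinds of document easily.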

So you can use Text::CSV to read the text from the CSV file, and the setText and setAnnotationSet methods of the NLP::GATE::Document module to create a GATE document and set its text and annotations.
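
As a rough illustration of the CSV half of this suggestion, the sketch below uses Text::CSV (a CPAN module, not part of core Perl) to split one tab-separated line into a sentence and a class. The helper name parse_row and the option choices (quoting disabled, surrounding whitespace trimmed) are assumptions made for this example, not something the answer specifies. The NLP::GATE::Document side is deliberately left out, since the answer only names its methods.

```perl
use strict;
use warnings;
use Text::CSV;

# Tab-separated input. Quoting is switched off (an assumption) because
# free text such as tweets may contain unbalanced quote characters.
my $csv = Text::CSV->new({
    sep_char    => "\t",
    quote_char  => undef,
    escape_char => undef,
    binary      => 1,
}) or die "Cannot use Text::CSV: " . Text::CSV->error_diag();

# parse_row is a hypothetical helper: it returns (sentence, class)
# for a well-formed line and an empty list otherwise.
sub parse_row {
    my ($line) = @_;
    chomp $line;
    return unless $csv->parse($line);
    my @fields = $csv->fields();
    return unless @fields == 2;
    s/^\s+|\s+$//g for @fields;    # trim padding around the tab
    return @fields;
}

my ($sentence, $class) = parse_row("i love iphone \t positive\n");
print "$sentence => $class\n";
```

Each (sentence, class) pair returned this way can then be passed to whatever code builds the GATE document.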

Give it a try, and if you run into any problems, ask again and include the code you have tried so far.

Answer 1 (score: -1)

Probably not the simplest answer, but it works with this Perl script:

use strict;
use warnings;
use HTML::Entities;

# Usage: perl script.pl input_file
# Writes one GATE XML document (tweet_0.xml, tweet_1.xml, ...) per matching line.
my $input = $ARGV[0]
    or die "Usage: $0 input_file\n";
open(my $in, '<:encoding(UTF-8)', $input)
    or die "Cannot open $input: $!\n";

my $i = 0;

while (my $line = <$in>) {

    # Expect "sentence<TAB>class"; skip lines that do not match.
    next unless $line =~ /^(.+)\t(.+)$/;

    # Trim the padding around the tab so the offsets match the stored text.
    my ($sentence, $class) = ($1, $2);
    s/^\s+|\s+$//g for $sentence, $class;

    # Escape only the characters that are unsafe in XML; the default
    # encode_entities() would also emit HTML-only entities such as &eacute;.
    my $encoded_sent  = encode_entities($sentence, '<>&');
    my $encoded_class = encode_entities($class, '<>&');

    # Node offsets count the characters of the unescaped text.
    my $length_sent = length($sentence);

    my $file = "tweet_" . $i . ".xml";
    open(my $out, '>:encoding(UTF-8)', $file)
        or die "Unable to create $file: $!\n";

    ## head xml
    print $out "<?xml version='1.0' encoding='UTF-8'?>" . "\n";
    print $out '<GateDocument version="3">' . "\n";
    print $out '<GateDocumentFeatures>' . "\n";
    print $out '<Feature>' . "\n";
    print $out '<Name className="java.lang.String">gate.SourceURL</Name>' . "\n";
    print $out '<Value className="java.lang.String">created from String</Value>' . "\n";
    print $out '</Feature>' . "\n";
    print $out '</GateDocumentFeatures>' . "\n";

    ## the document text, wrapped in its start and end nodes
    print $out '<TextWithNodes><Node id="0"/>' . $encoded_sent . '<Node id="' . $length_sent . '"/></TextWithNodes>' . "\n";

    ## one Tweet annotation over the whole text, carrying the class as a feature
    print $out '<AnnotationSet Name="Key">' . "\n";
    print $out '<Annotation Id="1" Type="Tweet" StartNode="0" EndNode="' . $length_sent . '">' . "\n";
    print $out '<Feature>' . "\n";
    print $out '<Name className="java.lang.String">class</Name>' . "\n";
    print $out '<Value className="java.lang.String">' . $encoded_class . '</Value>' . "\n";
    print $out '</Feature>' . "\n";
    print $out '</Annotation>' . "\n";
    print $out '</AnnotationSet>' . "\n";

    ## end of the document
    print $out '</GateDocument>' . "\n";

    close($out)
        or die "Unable to close $file: $!\n";
    $i++;
}
close($in);