stanford-nlp: ignore text within a sentence, or ignore a sentence identifier

Time: 2015-07-03 14:44:31

Tags: java vb.net stanford-nlp

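I am doing chat analysis. I preprocess every sentence in a conversation and store it in a database. Each conversation has a unique key, and each sentence has its own key. When I load a conversation into the pipeline to annotate it (see the VB.NET code below), how can I keep track of the sentences after annotation?

I tried including a key at the start of each sentence (for example, _125678 is the sentence key), but the parser treats it as a noun phrase. Can I tell the parser to ignore my database key, perhaps by surrounding it with some special characters? Any help or suggestions would be greatly appreciated.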

' Path to the folder with models extracted from `stanford-corenlp-3.5.2-models.jar`
        Dim jarRoot = "stanford-corenlp-3.5.2-models"

    'when the annotation is complete I want the output to include the database keys (like 125678) so I can link this line back to the conversation
    Dim txt As String = "_125678 This movie doesn't care about cleverness, wit or any other kind of intelligent humor. Those who find ugly meanings in beautiful things are corrupt without being charming. There are slow and repetitive parts, but it has just enough spice to keep it interesting."

    ' Annotation pipeline configuration
    Dim props = New Properties()
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref, sentiment")
    props.setProperty("sutime.binders", "0")
    props.setProperty("ner.useSUTime", "0")

    ' We should change current directory, so StanfordCoreNLP could find all the model files automatically 
    Dim curDir = HttpContext.Current.Server.MapPath("") ' Environment.CurrentDirectory

    Try
        Directory.SetCurrentDirectory(curDir & "\" & jarRoot)
    Catch ex As Exception
        'System.Diagnostics.Debug.WriteLine("The specified directory does not exist. {0}", ex)
    End Try

    ' Annotation
    Dim annotation = New Annotation(txt)

    Dim pipeline = New StanfordCoreNLP(props)

    Try
        pipeline.annotate(annotation)
    Catch ex As Exception
        'System.Diagnostics.Debug.WriteLine("Annotation failed. {0}", ex)
    Finally
        ' Restore the working directory even if annotation throws
        Directory.SetCurrentDirectory(curDir)
    End Try


    ' these are all the sentences in this document
    ' a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
    Dim sentences = annotation.[get](GetType(CoreAnnotations.SentencesAnnotation))
    'Dim tokens = annotation.[get](GetType(CoreAnnotations.TokensAnnotation))


    Dim mainSentiment As String = String.Empty
    Dim longest As Integer = 0

    For Each sentence As Annotation In TryCast(sentences, ArrayList)
        nlp = nlp & sentence.toString
        nlp = nlp & "<br/>"

        Dim tree As Tree = sentence.[get](GetType(edu.stanford.nlp.sentiment.SentimentCoreAnnotations.SentimentAnnotatedTree))
        Dim sentiment As Integer = edu.stanford.nlp.neural.rnn.RNNCoreAnnotations.getPredictedClass(tree)

        'get noun phrases from tree
        'getNounPhrases(tree)

        'dependency parser, shows you the top themes in a sentence
        'Dim depparse As Tree = sentence.[get](GetType(edu.stanford.nlp.parser.nndep.DependencyParser))
        'Dim depparse_result As Integer = edu.stanford.nlp.util.CoreMap(tree)

        Dim partText As String = sentence.toString()
        If partText.Length > longest Then

            Select Case sentiment
                Case 0
                    mainSentiment = "<span class='btn btn-danger'>0. Very Negative</span>"
                Case 1
                    mainSentiment = "<span class='btn btn-warning'>1. Negative</span>"
                Case 2
                    mainSentiment = "<span class='btn btn-info'>2. Neutral</span>"
                Case 3
                    mainSentiment = "<span class='btn btn-primary'>3. Positive</span>"
                Case 4
                    mainSentiment = "<span class='btn btn-success'>4. Very Positive</span>"
            End Select

            'find the suggested main subjects for each line of text
            nlp = nlp & "(SENTIMENT):<br/>" & mainSentiment & "<br/><br/>(NOUN PHRASE/SUBJECT):<br/>"
            Dim arr As Array = Split(getNounPhrases(tree), "|", -1)
            For Each s As String In arr
                nlp = nlp & s.ToString & "<br/>"
            Next

            'NER (Named Entity Recognition) 
            'Recognizes named (PERSON, LOCATION, ORGANIZATION, MISC), numerical (MONEY, NUMBER, ORDINAL, PERCENT), and temporal (DATE, TIME, DURATION, SET) entities.

            Dim myNer As String = ner(sentence.toString)
            If myNer.Length > 1 Then
                nlp = nlp & "(NAMED ENTITY RECOGNITION):<br/>" & myNer
            End If


            nlp = nlp & "<hr>"

        End If
    Next

1 answer:

Answer 0 (score: 0)

1.) Strip the conversation ID from the string before building the Annotation.

2.) Set the docID:

annotation = new Annotation(txt);
annotation.set(CoreAnnotations.DocIDAnnotation.class, "125678");

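3.) After annotation, each sentence carries the docID and is indexed in order within the "doc" (i.e., the conversation).

To access the conversation ID: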

sentence.get(CoreAnnotations.DocIDAnnotation.class);

To access the sentence ID within the conversation:

sentence.get(CoreAnnotations.SentenceIndexAnnotation.class);
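
Step 1 could be sketched roughly as follows in plain Java. This assumes the key always appears as an underscore followed by digits at the very start of the text, as in the question's example; the class name `ConversationKey` and the method `split` are hypothetical helpers, not part of CoreNLP.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: separates a leading "_<digits>" key from the text to annotate.
final class ConversationKey {
    // Assumed key format (from the question's example): underscore, digits, whitespace.
    private static final Pattern KEY_PREFIX = Pattern.compile("^_(\\d+)\\s+");

    // Returns {key, text}. The key is null when the text has no leading key.
    static String[] split(String raw) {
        Matcher m = KEY_PREFIX.matcher(raw);
        if (m.find()) {
            return new String[] { m.group(1), raw.substring(m.end()) };
        }
        return new String[] { null, raw };
    }

    public static void main(String[] args) {
        String[] parts = split("_125678 This movie doesn't care about cleverness.");
        System.out.println("key:  " + parts[0]);
        System.out.println("text: " + parts[1]);
    }
}
```

The cleaned text (`parts[1]`) would then be passed to `new Annotation(...)`, and the key (`parts[0]`) set as the `DocIDAnnotation`, as in step 2 above, so the parser never sees the key as a token.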