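How can I convert the output of kwic into a corpus for further analysis? More specifically, I want to build a corpus from the words before and after the keyword (contextPre, contextPost) so that I can run sentiment analysis on them.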
Answer 0 (score: 0)
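The simplest way: create one corpus from the pre-keyword context and one from the post-keyword context, tag each with a document variable (docvar) that identifies the context, and then combine the two corpora with the + operator.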
require(quanteda)
mykwic <- kwic(data_corpus_inaugural, "terror")
# make a corpus with the pre-word context
mycorpus <- corpus(mykwic$pre)
docvars(mycorpus, "context") <- "pre"
# make a corpus with the post-word context
mycorpus2 <- corpus(mykwic$post)
docvars(mycorpus2, "context") <- "post"
# combine the two corpora
mycorpus <- mycorpus + mycorpus2
summary(mycorpus)
# Corpus consisting of 16 documents.
#
# Text Types Tokens Sentences context
# text1 5 5 1 pre
# text2 4 5 1 pre
# text3 5 5 1 pre
# text4 5 5 1 pre
# text5 5 5 1 pre
# text6 5 5 1 pre
# text7 5 5 1 pre
# text8 5 5 1 pre
# text11 4 5 1 post
# text21 5 5 1 post
# text31 5 5 1 post
# text41 5 5 1 post
# text51 5 5 1 post
# text61 5 5 2 post
# text71 5 5 2 post
# text81 5 5 1 post
#
# Source: Combination of corpuses mycorpus and mycorpus2
# Created: Wed May 25 23:35:54 2016
# Notes:
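The output above comes from a 2016 release. As a hedged sketch for newer quanteda versions, which (as far as I know) expect kwic() to be given a tokens object and may refuse to combine corpora that share document names, the same idea can be written with explicit, unique docnames. The ".pre" / ".post" suffixes and the object names below are my own choices, not something quanteda requires.

```r
library(quanteda)

# Assumption: a recent quanteda release where kwic() works on tokens
toks <- tokens(data_corpus_inaugural)
kw <- as.data.frame(kwic(toks, pattern = "terror", window = 5))

# Illustrative naming scheme so the two corpora never share a docname
ids <- paste0("text", seq_len(nrow(kw)))

# Corpus of the words before the keyword
pre_corpus <- corpus(kw$pre, docnames = paste0(ids, ".pre"))
docvars(pre_corpus, "context") <- "pre"

# Corpus of the words after the keyword
post_corpus <- corpus(kw$post, docnames = paste0(ids, ".post"))
docvars(post_corpus, "context") <- "post"

# Combine the two corpora, as in the answer above
context_corpus <- pre_corpus + post_corpus
summary(context_corpus)
```

The number of matches, and therefore the number of documents, depends on which version of data_corpus_inaugural ships with your installation, so it will not necessarily match the 16 documents shown above.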
Added:
As of v0.9.7-6, quanteda has a method for constructing a corpus directly from a kwic object. So this now works:
mykwic <- kwic(data_corpus_inaugural, "southern")
summary(corpus(mykwic))
# Corpus consisting of 28 documents.
#
# Text Types Tokens Sentences docname position keyword context
# text1.pre 5 5 1 1797-Adams 1807 southern pre
# text2.pre 4 5 1 1825-Adams 2434 southern pre
# text3.pre 4 5 1 1861-Lincoln 98 Southern pre
# text4.pre 5 5 1 1865-Lincoln 283 southern pre
# text5.pre 5 5 1 1877-Hayes 378 Southern pre
# text6.pre 5 5 1 1877-Hayes 956 Southern pre
# text7.pre 5 5 1 1877-Hayes 1250 Southern pre
# text8.pre 5 5 1 1881-Garfield 1007 Southern pre
# text9.pre 4 5 1 1909-Taft 4029 Southern pre
# text10.pre 5 5 1 1909-Taft 4230 Southern pre
# text11.pre 5 5 1 1909-Taft 4350 Southern pre
# text12.pre 5 5 1 1909-Taft 4537 Southern pre
# text13.pre 5 5 1 1909-Taft 4597 Southern pre
# text14.pre 5 5 1 1953-Eisenhower 1226 southern pre
# text1.post 5 5 1 1797-Adams 1807 southern post
# text2.post 5 5 1 1825-Adams 2434 southern post
# text3.post 5 5 1 1861-Lincoln 98 Southern post
# text4.post 5 5 2 1865-Lincoln 283 southern post
# text5.post 5 5 2 1877-Hayes 378 Southern post
# text6.post 5 5 1 1877-Hayes 956 Southern post
# text7.post 5 5 1 1877-Hayes 1250 Southern post
# text8.post 5 5 2 1881-Garfield 1007 Southern post
# text9.post 5 5 2 1909-Taft 4029 Southern post
# text10.post 5 5 1 1909-Taft 4230 Southern post
# text11.post 5 5 1 1909-Taft 4350 Southern post
# text12.post 5 5 1 1909-Taft 4537 Southern post
# text13.post 5 5 1 1909-Taft 4597 Southern post
# text14.post 5 5 1 1953-Eisenhower 1226 southern post
#
# Source: Corpus created from kwic(x, keywords = "southern")
# Created: Thu May 26 09:47:19 2016
# Notes:
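Since the goal in the question is sentiment analysis, here is a minimal sketch of one possible next step: counting dictionary matches in the pre and post contexts. The tiny dictionary (toy_dict) and all object names are made up for illustration; in practice you would swap in a real sentiment lexicon. It builds the context corpus straight from the kwic data frame so that it does not depend on any particular corpus(kwic) behaviour.

```r
library(quanteda)

# Keyword-in-context matches as a plain data frame
kw <- as.data.frame(kwic(tokens(data_corpus_inaugural),
                         pattern = "terror", window = 5))

# One document per context window, tagged as "pre" or "post"
context_corpus <- corpus(
  c(kw$pre, kw$post),
  docvars = data.frame(context = rep(c("pre", "post"), each = nrow(kw)))
)

# Hypothetical toy dictionary, for illustration only
toy_dict <- dictionary(list(
  positive = c("peace", "hope", "free*"),
  negative = c("fear", "war", "violen*")
))

# Count dictionary matches per context window
sentiment_dfm <- dfm_lookup(dfm(tokens(context_corpus)), dictionary = toy_dict)

# Aggregate the counts by context (pre vs. post)
by_context <- dfm_group(sentiment_dfm,
                        groups = docvars(sentiment_dfm, "context"))
by_context
```

With only five words on either side of the keyword, most windows will contain no dictionary matches at all, so a larger window argument in kwic() is probably worth trying before reading much into the counts.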