Question

I am using quanteda and want to assign docvars() conditionally.

Consider the following MWE:

library(dplyr)
library(quanteda)
library(quanteda.corpora)

testcorp <- corpus(data_corpus_movies)

I now want to assign a dummy docvar, neg_sent_lg_id2, which should be 1 for all documents where Sentiment is neg and id2 is > 10000.

Importantly, I do not want to subset the corpus. I want to assign the docvar to a subset of the documents and keep the whole corpus.

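I have already assigned 0 to the docvar with docvars(testcorp, field = "neg_sent_lg_id2") <- 0 and now want to do something like the following. These lines are pseudo-R code that does not work, but they convey the idea.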

corpus_subset(testcorp, Sentiment == "neg") %>% # filter on "Sentiment"
    corpus_subset(testcorp, id2 > 10000) %>% # filter on "id2"
    docvars(testcorp, field = "neg_sent_lg_id2") <- 1 # selectively assign docvar

Answer 1

You can use ifelse():

library(dplyr)
library(quanteda)
library(quanteda.corpora)

testcorp <- corpus(data_corpus_movies)

docvars(testcorp, field = "neg_sent_lg_id2") <- 
  ifelse(docvars(testcorp, field = "Sentiment") == "neg" & docvars(testcorp, field = "id2") > 10000,
         1, 0)

The syntax is not very pretty, but it works:

head(docvars(testcorp))
#>                 Sentiment   id1   id2 neg_sent_lg_id2
#> neg_cv000_29416       neg cv000 29416               1
#> neg_cv001_19502       neg cv001 19502               1
#> neg_cv002_17424       neg cv002 17424               1
#> neg_cv003_12683       neg cv003 12683               1
#> neg_cv004_12641       neg cv004 12641               1
#> neg_cv005_29357       neg cv005 29357               1
table(docvars(testcorp, field = "neg_sent_lg_id2"))
#> 
#>    0    1 
#> 1005  995

Created on 2019-10-15 by the reprex package (v0.3.0)
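
Since the target is a 0/1 dummy, the same condition can also be written without ifelse(), by converting the logical vector directly. The following is a minimal sketch, not part of the original answer. It uses a small hypothetical toy corpus (the object toy and its four documents are made up for illustration) so that it runs without quanteda.corpora, and it never subsets the corpus, so all documents are kept.

```r
library(quanteda)

# Hypothetical toy corpus standing in for data_corpus_movies (assumption:
# only the Sentiment and id2 docvars matter for this illustration)
toy <- corpus(
  c(d1 = "bad film", d2 = "dull plot", d3 = "great cast", d4 = "fine score"),
  docvars = data.frame(
    Sentiment = c("neg", "neg", "pos", "pos"),
    id2       = c(29416, 5000, 12000, 800)
  )
)

# One logical value per document: TRUE only where both conditions hold
is_target <- docvars(toy, "Sentiment") == "neg" & docvars(toy, "id2") > 10000

# as.integer() maps TRUE/FALSE to 1/0, so no ifelse() is needed
docvars(toy, "neg_sent_lg_id2") <- as.integer(is_target)

docvars(toy, "neg_sent_lg_id2")
#> [1] 1 0 0 0
ndoc(toy)  # the corpus still contains every document
#> [1] 4
```

Whether this reads better than ifelse() is a matter of taste. ifelse() remains the more natural choice when the two values are something other than 1 and 0.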
