quanteda: isolated nodes in textplot_network() output and how to remove them

Asked: 2019-01-12 08:27:03

Tags: r quanteda

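I am using the textplot_network() function in quanteda (v1.3.4) to plot author handles (@mentions) while analyzing Twitter data. However, the plot also shows some isolated (unconnected) nodes.

I would like to know whether this happens because no other author handle (@mention) co-occurs with such a node at the frequency threshold set in the function call. Also, is there a way to remove these nodes from the feature co-occurrence matrix (fcm) so that they do not appear in the plot?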

require(data.table)
require(quanteda)
require(stringr)

not_clause_1 <- fread("final_query_output.csv", header = TRUE, skip = 0L,
                      stringsAsFactors = FALSE,
                      select = c("Date", "Full Text", "Page Type", "Country",
                                 "Gender", "Author", "Author City",
                                 "Author Country", "Author State", "City",
                                 "County", "Topics"))

setnames(not_clause_1,
         c("Page Type", "Full Text", "Author Country", "Author State", "Author City"),
         c("Page_Type", "Full_Text", "Author_Country", "Author_State", "Author_City"))

## using quanteda's corpus
corpus1 <- corpus(not_clause_1$Full_Text,
                  docnames = paste(not_clause_1$Author, not_clause_1$idx, sep = "("))

tokens1 <- quanteda::tokens(tolower(corpus1), what = "word",
                            remove_numbers = TRUE, remove_punct = TRUE,
                            remove_symbols = TRUE, remove_separators = TRUE,
                            remove_url = TRUE, remove_hyphens = TRUE)

tokens1 <- tokens_remove(tokens1, stopwords("english"))

set.seed(12345)
mydfm <- dfm(tokens1, tolower = TRUE)

## select only the author handles beginning with @
mentions_dfm <- dfm_select(mydfm, pattern = "@*")
mentions_fcm <- fcm(mentions_dfm)
top_mention_names <- names(topfeatures(mentions_dfm,
                                       min(150, length(featnames(mentions_dfm)))))
topmention_fcm <- fcm_select(mentions_fcm, pattern = top_mention_names,
                             selection = "keep")

size <- log(colSums(fcm_select(topmention_fcm, top_mention_names)))

textplot_network(topmention_fcm, vertex_size = size/max(size)*3,
                 edge_color = "blue", edge_alpha = 0.8, edge_size = 3,
                 omit_isolated = TRUE)

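As mentioned in the summary above, I would like to know why certain authors, e.g. @moonbow_living, appear in the fcm without any connected nodes, and whether there is a way to remove them before or when calling textplot_network().

I have added a link to download the underlying data and the network plot showing the isolated nodes: reference files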
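To make the question concrete, here is a minimal sketch of one way to find such nodes before plotting. It is not a confirmed explanation of the behaviour in v1.3.4. It assumes that the fcm can be coerced with as.matrix(), that fcm() stores only the upper triangle by default, and that the diagonal holds a feature's co-occurrence with itself. The helper isolated_features() and the toy matrix are hypothetical and are not part of quanteda.

```r
# Hypothetical helper (not part of quanteda): returns the names of features
# that never co-occur with any *other* feature in a square co-occurrence matrix.
isolated_features <- function(m) {
  m <- as.matrix(m)            # a base matrix passes through unchanged
  stopifnot(nrow(m) == ncol(m))
  m <- m + t(m)                # symmetrise, assuming only one triangle is stored
  diag(m) <- 0                 # ignore a feature co-occurring with itself
  rownames(m)[rowSums(m) == 0]
}

# Toy upper-triangular matrix standing in for as.matrix(topmention_fcm)
handles <- c("@a", "@b", "@c")
toy <- matrix(0, 3, 3, dimnames = list(handles, handles))
toy["@a", "@b"] <- 2   # @a and @b are mentioned together
toy["@c", "@c"] <- 1   # @c only ever co-occurs with itself
isolated <- isolated_features(toy)
print(isolated)

# With the objects from the question, one might then try (untested assumption):
# iso <- isolated_features(topmention_fcm)
# if (length(iso) > 0)
#   topmention_fcm <- fcm_remove(topmention_fcm, pattern = iso, valuetype = "fixed")
```

As far as I know, textplot_network() also applies its own min_freq threshold to the co-occurrence counts, so a node could lose all of its edges at plot time even when its off-diagonal counts in the fcm are nonzero. This sketch would not catch that case.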

0 answers:

No answers yet.