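I am analyzing Twitter data by plotting author handles (@mentions) with the textplot_network() function from quanteda (v1.3.4). However, the plot also shows some isolated (unconnected) nodes.
I'd like to know whether this happens because no other author handle (@mention) co-occurs with such a node, given the frequency threshold set in the function call. Also, is there a way to remove these nodes from the feature co-occurrence matrix (fcm) so that they do not appear in the plot?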
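One way to test the first hypothesis is to count, for each handle, how many other handles it co-occurs with. The base-R sketch below uses hypothetical toy tweets (already reduced to their @mentions) and counts a pair once per tweet in which both handles appear. This is a simplification and not necessarily how quanteda's fcm() weights its counts. A handle that is only ever mentioned on its own, like the toy @moonbow_living here, ends up with zero partners and therefore has no edge to draw.

```r
# Hypothetical toy tweets, already reduced to their @mentions
tweets <- list(
  c("@alice", "@bob"),
  c("@bob", "@carol"),
  c("@moonbow_living"),                     # mentioned alone: no partner handle
  c("@moonbow_living", "@moonbow_living")   # repeated, but still no partner
)

handles <- sort(unique(unlist(tweets)))
co <- matrix(0, length(handles), length(handles),
             dimnames = list(handles, handles))

# Count each distinct pair of handles once per tweet (symmetric, simplified)
for (tw in tweets) {
  u <- unique(tw)
  if (length(u) < 2) next            # a single distinct handle creates no pair
  pairs <- combn(u, 2)
  for (k in seq_len(ncol(pairs))) {
    a <- pairs[1, k]
    b <- pairs[2, k]
    co[a, b] <- co[a, b] + 1
    co[b, a] <- co[b, a] + 1
  }
}

# Number of distinct partner handles per handle; 0 means an isolated node
partners <- rowSums(co > 0)
print(partners)
```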
```r
require(data.table)
require(quanteda)
require(stringr)

not_clause_1 <- fread("final_query_output.csv", header = TRUE, skip = 0L,
                      stringsAsFactors = FALSE,
                      select = c("Date", "Full Text", "Page Type", "Country", "Gender",
                                 "Author", "Author City", "Author Country", "Author State",
                                 "City", "County", "Topics"))
setnames(not_clause_1,
         c("Page Type", "Full Text", "Author Country", "Author State", "Author City"),
         c("Page_Type", "Full_Text", "Author_Country", "Author_State", "Author_City"))

## using quanteda's corpus
corpus1 <- corpus(not_clause_1$Full_Text,
                  docnames = paste(not_clause_1$Author, not_clause_1$idx, sep = "("))
tokens1 <- quanteda::tokens(tolower(corpus1), what = "word", remove_numbers = TRUE,
                            remove_punct = TRUE, remove_symbols = TRUE,
                            remove_separators = TRUE, remove_url = TRUE,
                            remove_hyphens = TRUE)
tokens1 <- tokens_remove(tokens1, stopwords("english"))
set.seed(12345)
mydfm <- dfm(tokens1, tolower = TRUE)

## select only the author handles beginning with @
mentions_dfm <- dfm_select(mydfm, pattern = "@*")
mentions_fcm <- fcm(mentions_dfm)
top_mention_names <- names(topfeatures(mentions_dfm,
                                       min(150, length(featnames(mentions_dfm)))))
topmention_fcm <- fcm_select(mentions_fcm, pattern = top_mention_names,
                             selection = "keep")
size <- log(colSums(fcm_select(topmention_fcm, top_mention_names)))
textplot_network(topmention_fcm, vertex_size = size / max(size) * 3,
                 edge_color = "blue", edge_alpha = 0.8, edge_size = 3,
                 omit_isolated = TRUE)
```
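As outlined in the question above, I'd like to understand why some authors, e.g. @moonbow_living, appear in the fcm without any connected nodes even though omit_isolated = TRUE is set, and whether there is a way to remove them before calling textplot_network().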
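A minimal sketch for the second question, assuming the isolated nodes are handles with no off-diagonal co-occurrence (a handle repeated within one tweet may only contribute to its own diagonal cell): zero the diagonal, then keep only features with a nonzero count in either their row or their column. Both have to be checked because quanteda's fcm() returns an upper-triangular matrix by default (tri = TRUE), so each pair is stored only once. The helper name connected_features() and the toy matrix are hypothetical.

```r
# Hypothetical helper: names of nodes that share at least one off-diagonal
# co-occurrence with another node. Works on a plain (possibly upper-triangular) matrix.
connected_features <- function(m) {
  off_diag <- m
  diag(off_diag) <- 0  # a self co-occurrence does not link a node to any other node
  # sum rows and columns, since an upper-triangular matrix stores each pair once
  names(which(rowSums(off_diag) + colSums(off_diag) > 0))
}

# Toy upper-triangular co-occurrence matrix: "@d" only co-occurs with itself
feats <- c("@a", "@b", "@c", "@d")
m <- matrix(c(0, 2, 0, 0,
              0, 0, 1, 0,
              0, 0, 0, 0,
              0, 0, 0, 3),
            nrow = 4, byrow = TRUE, dimnames = list(feats, feats))

keep <- connected_features(m)
m_connected <- m[keep, keep, drop = FALSE]  # "@d" is dropped
print(keep)
```

With the objects from the question, one could try keep <- connected_features(as.matrix(topmention_fcm)), then fcm_select(topmention_fcm, pattern = keep, selection = "keep", valuetype = "fixed"), and then recompute size so that its length still matches the number of remaining nodes. This has not been checked against quanteda 1.3.4.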
Added a link for downloading the underlying data and the network plot with the isolated nodes: reference files