Question

My data is in the following format:

-------------------------------------------------------------------------------
Author_ID   Year    CoAuthor_Count  High    Medium  Low     Deviant Paper_Count
-------------------------------------------------------------------------------
677         2005    1               1.00    0.00    0.00    0.00    3
677         2007    3               0.66    0.00    0.33    0.00    1
677         2009    1               0.00    1.00    0.00    0.00    1
677         2011    5               0.60    0.00    0.40    0.00    1
677         2012    2               1.00    0.00    0.00    0.00    1
677         2013    5               0.60    0.40    0.00    0.00    2
1359        2005    11              0.00    0.00    0.81    0.18    11
1359        2006    27              0.00    0.14    0.70    0.14    20
1359        2007    29              0.00    0.06    0.62    0.31    12
1359        2008    29              0.00    0.10    0.55    0.34    13
1359        2009    28              0.00    0.32    0.53    0.14    18
1359        2010    22              0.04    0.18    0.59    0.18    14  
...  
...  
...

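The High, Medium, Low and Deviant columns hold the similarity values between the Author and the CoAuthors. In the same table I also have similarity and count data for Author and Venue.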

I have clustered this data with Microsoft Clustering, and it successfully assigned a cluster label to each row.

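The problem is that I want to compute the clustering coefficient of this data, and the data would have to be in graph form (nodes, edges) to compute a clustering coefficient.

How can I compute the clustering coefficient for this data?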

Answer 1

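MS Clustering will not give you any formula for computing the (local) clustering coefficient of an author.

Instead, Microsoft Clustering offers (according to the documentation) two algorithms, k-means clustering and EM clustering (which is related to k-means but more general). Broadly speaking, these are methods for grouping the rows of a data set.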

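The "clustering coefficient" you are probably looking for is rather a property of the network of relationships between authors.

This is an unfortunate case of naming. Two different concepts share a similar name:

"clustering algorithm", an unsupervised machine learning method
"clustering coefficient", a measure from graph theory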

The local clustering coefficient can be computed as follows:

for each author
  create a list of the coauthor-ids of this author (this column is missing in your table)
  for all pairs of coauthor-ids from that list,
     count the unique coauthorship pairs between them (links that do not involve the author himself)

  divide this count by the number of possible pairs, k * (k - 1) / 2,
  where k is the number of distinct coauthors of the author
  (CoAuthor_Count in your table, if it counts distinct coauthors)

See the illustration on the right-hand side of the Wikipedia page linked above.
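
As a rough illustration of the per-author procedure, here is a minimal Python sketch. It assumes you can export the coauthorship relations as a list of (author_id, coauthor_id) pairs, which the table in the question does not contain. The function name local_clustering and the sample pairs are made up for the example (the IDs 2001 and 2002 are hypothetical, and the link between 677 and 1359 is invented). The graph is treated as undirected, and authors with fewer than two coauthors get a coefficient of 0 by convention.

```python
from collections import defaultdict
from itertools import combinations


def local_clustering(edges):
    """Return {author_id: local clustering coefficient} for an undirected graph.

    `edges` is an iterable of (author_id, coauthor_id) pairs (assumed input format).
    """
    # Build an adjacency map: author -> set of distinct coauthors.
    adjacency = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            adjacency[a].add(b)
            adjacency[b].add(a)
    neighbors = dict(adjacency)

    coefficients = {}
    for author, coauthors in neighbors.items():
        k = len(coauthors)
        if k < 2:
            # No pair of coauthors exists; use 0 by convention.
            coefficients[author] = 0.0
            continue
        # Count coauthor pairs that have also published together.
        links = sum(1 for u, v in combinations(coauthors, 2) if v in neighbors[u])
        # Divide by the number of possible pairs, k * (k - 1) / 2.
        coefficients[author] = 2 * links / (k * (k - 1))
    return coefficients


# Hypothetical coauthorship pairs, for illustration only.
edges = [(677, 1359), (677, 2001), (1359, 2001), (677, 2002)]
coeffs = local_clustering(edges)
```

In this made-up sample, author 677 has three coauthors, of which only one pair (1359 and 2001) is linked, so one of three possible pairs is present. If a graph library is an option, networkx provides a clustering function that computes the same quantity for unweighted undirected graphs.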

I have not found any Excel plugins, VBA modules or add-ins that do this.
