I have a data frame in this format:
doc_id doc_uni
1: S0100-879X1998000800006 University of Gorakhpur
2: S0100-879X1998000800011 Universidade Estadual de Londrina
3: S0100-879X1998000800005 Ivano-Frankivsk State Medical Academy
4: S0100-879X1998000800005 Southern Sea Biology Institute
5: S0100-879X1998000800005 Carleton University
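Here doc_id identifies a published paper and doc_uni is the university an author of that paper is affiliated with. My goal is to analyze this data as a social network, with universities linked whenever they appear on the same paper.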
From this small sample, I would want to create the following edge list for the graph:
uni1 | uni2
University of Gorakhpur | NA
Universidade Estadual de Londrina | NA
Ivano-Frankivsk State Medical Academy | Southern Sea Biology Institute
Ivano-Frankivsk State Medical Academy | Carleton University
Southern Sea Biology Institute | Carleton University
I tried writing a for loop to do this, but my dataset has 602k rows, so that approach is far too slow.
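
To make the desired transformation concrete, here is a minimal sketch in Python using only the standard library. It is not the loop I tried, and the language is swapped in purely for illustration, since the data above looks like R output. The function name `build_edges` and the use of `None` in place of NA are assumptions of this sketch. The idea is to group universities by doc_id in a single pass, then emit every unordered pair within each group, with a lone university paired with `None`.

```python
from collections import defaultdict
from itertools import combinations


def build_edges(rows):
    """Turn (doc_id, doc_uni) pairs into a list of (uni1, uni2) edges.

    Hypothetical helper for illustration. Papers with one university
    yield (uni, None), standing in for the NA shown in the question.
    """
    # Group universities by paper in one pass, keeping first-seen order
    # and skipping repeats of the same university within a paper.
    groups = defaultdict(list)
    for doc_id, uni in rows:
        if uni not in groups[doc_id]:
            groups[doc_id].append(uni)

    edges = []
    for unis in groups.values():
        if len(unis) == 1:
            edges.append((unis[0], None))
        else:
            # All unordered pairs of universities on the same paper.
            edges.extend(combinations(unis, 2))
    return edges


rows = [
    ("S0100-879X1998000800006", "University of Gorakhpur"),
    ("S0100-879X1998000800011", "Universidade Estadual de Londrina"),
    ("S0100-879X1998000800005", "Ivano-Frankivsk State Medical Academy"),
    ("S0100-879X1998000800005", "Southern Sea Biology Institute"),
    ("S0100-879X1998000800005", "Carleton University"),
]

edges = build_edges(rows)
for uni1, uni2 in edges:
    print(uni1, "|", uni2)
```

The grouping step touches each row once, so the cost is dominated by the number of pairs per paper rather than by repeated scans of the whole table.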