Question

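I need help finding a way to structure data for use with R network packages.

I have a list, author_list, in which each character vector contains several authors, for example: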

document_authors1 = c("King, Stephen", "Martin, George", "Clancy, Tom")

document_authors2 = c("Clancy, Tom", "Patterson, James", "Stine, R.L.", "King, Stephen")

document_authors3 = c("Clancy, Tom", "Patterson, James", "Stine, R.L.", "King, Stephen")

author_list = list(document_authors1, document_authors2, document_authors3)

author_list

[[1]]
[1] "King, Stephen"  "Martin, George" "Clancy, Tom"

[[2]]
[1] "Clancy, Tom"      "Patterson, James" "Stine, R.L."      "King, Stephen"

[[3]]
[1] "Clancy, Tom"      "Patterson, James" "Stine, R.L."      "King, Stephen"

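I need to create a data frame from author_list with three columns. The first two columns hold author names: col1 has one author as its row value and col2 has another. The third column (co-occurrence) gives the number of times the author pair in that row (col1 and col2) appears together. For example,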

      col1                     col2                            co-occurrence
1 King, Stephen           Patterson, James                           2
2 Martin, George             Clancy, Tom                             1

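and so on...

I have been looking for a package function that does this, but with no luck. I have also tried working through the problem step by step, but the solution keeps eluding me. Hopefully it is easier than I think. Any suggestions would be greatly appreciated.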

Answer 1

I'm not entirely sure this is what you are after, but hopefully it helps.

library(dplyr)

# Only include elements in list with more than one author
author_list <- author_list[lengths(author_list)>1]

# Identify every combination of pairs of authors for each element in list
mat <- do.call(rbind, lapply(author_list, function(x) t(combn(x, 2))))

# Within each row sort alphabetically 
mat <- t(apply(mat, 1, sort))

# Count up pairs of authors
as.data.frame(mat) %>%
  group_by_all() %>%
  summarise(count = n())

# A tibble: 8 x 3
# Groups:   V1 [3]
  V1               V2               count
  <fct>            <fct>            <int>
1 Clancy, Tom      King, Stephen        3
2 Clancy, Tom      Martin, George       1
3 Clancy, Tom      Patterson, James     2
4 Clancy, Tom      Stine, R.L.          2
5 King, Stephen    Martin, George       1
6 King, Stephen    Patterson, James     2
7 King, Stephen    Stine, R.L.          2
8 Patterson, James Stine, R.L.          2
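If you would rather avoid the dplyr dependency, the same counting can be sketched in base R. This is only a sketch under a few assumptions: `count_author_pairs` is a hypothetical helper name, the third column is called `co_occurrence` (with an underscore, because a hyphen is not valid in a syntactic R name), and at least one document has two or more authors. It also drops repeated names within a single document, which the code above does not do.

```r
# Example data from the question
author_list <- list(
  c("King, Stephen", "Martin, George", "Clancy, Tom"),
  c("Clancy, Tom", "Patterson, James", "Stine, R.L.", "King, Stephen"),
  c("Clancy, Tom", "Patterson, James", "Stine, R.L.", "King, Stephen")
)

# Hypothetical helper: count how often each unordered author pair
# appears together in a document
count_author_pairs <- function(authors) {
  # Drop repeated names within a document, then keep documents
  # that still have at least two authors
  authors <- lapply(authors, unique)
  authors <- authors[lengths(authors) > 1]

  # Sorting each vector before combn() puts every pair in a consistent
  # (alphabetical) order, so "A, B" and "B, A" are counted together
  pairs <- do.call(rbind, lapply(authors, function(x) t(combn(sort(x), 2))))

  # One row per pair occurrence, with an indicator column to sum over
  edges <- data.frame(col1 = pairs[, 1],
                      col2 = pairs[, 2],
                      co_occurrence = 1L,
                      stringsAsFactors = FALSE)

  # Sum the indicator within each (col1, col2) group
  aggregate(co_occurrence ~ col1 + col2, data = edges, FUN = sum)
}

edges <- count_author_pairs(author_list)
edges
```

The result has one row per distinct pair with the columns col1, col2 and co_occurrence, which is the edge-list shape that network packages generally expect. The row order may differ from the tibble above.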

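Create a tidy data frame showing co-occurrence: three columns for a co-occurrence network, using data from a list of uneven character vectors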
