So I have been able to build entity graphs from text. Below is a sample:
X1 X2
PERSON Sherlock Holmes 1
PERSON Sir Arthur Conan Doyle 1
PERSON Sherlock Holmes 2
PERSON Watson 2
PERSON Moriarty 2
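I have successfully created an undirected graph linking the entities in column X1 to the values in column X2. The numbers in column X2 are group numbers; for example, Sherlock Holmes and Sir Arthur Conan Doyle are in the same group. Ideally, instead of linking each entity in X1 to the group number in X2, I would like an undirected graph that links each entity to the other members of its group, as shown below.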
X1 X2
PERSON Sherlock Holmes PERSON Sherlock Holmes
PERSON Sherlock Holmes PERSON Sir Arthur Conan Doyle
PERSON Sir Arthur Conan Doyle PERSON Sir Arthur Conan Doyle
PERSON Sir Arthur Conan Doyle PERSON Sherlock Holmes
PERSON Sherlock Holmes PERSON Sherlock Holmes
PERSON Sherlock Holmes PERSON Watson
PERSON Sherlock Holmes PERSON Moriarty
PERSON Watson PERSON Watson
PERSON Watson PERSON Sherlock Holmes
PERSON Watson PERSON Moriarty
PERSON Moriarty PERSON Moriarty
PERSON Moriarty PERSON Sherlock Holmes
PERSON Moriarty PERSON Watson
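It would also be great to remove the duplicates from the graph (the rows that pair an entity with itself) so that I get the result below.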
X1 X2
PERSON Sherlock Holmes PERSON Sir Arthur Conan Doyle
PERSON Sir Arthur Conan Doyle PERSON Sherlock Holmes
PERSON Sherlock Holmes PERSON Watson
PERSON Sherlock Holmes PERSON Moriarty
PERSON Watson PERSON Sherlock Holmes
PERSON Watson PERSON Moriarty
PERSON Moriarty PERSON Sherlock Holmes
PERSON Moriarty PERSON Watson
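A minimal sketch of the transformation described above, using a self-join on the group number in place of combn() (a swapped-in technique, not the approach used in the answer below). dd is the sample data from the top of the question; pairs and edges are illustrative names. The self-join first produces the "all pairs within a group" table, including self-pairs, and a filter then removes the self-pairs. Row order will differ from the table above.

```r
# Sample data from the question: entity (X1) and group number (X2)
dd <- data.frame(
  X1 = c("PERSON Sherlock Holmes", "PERSON Sir Arthur Conan Doyle",
         "PERSON Sherlock Holmes", "PERSON Watson", "PERSON Moriarty"),
  X2 = c(1L, 1L, 2L, 2L, 2L),
  stringsAsFactors = FALSE
)

# Joining dd to itself on X2 returns every (entity, entity) combination
# within a group, including each entity paired with itself
pairs <- merge(dd, dd, by = "X2", suffixes = c(".from", ".to"))

# Drop the self-pairs such as "Sherlock Holmes" -> "Sherlock Holmes"
edges <- pairs[pairs$X1.from != pairs$X1.to, c("X1.from", "X1.to")]
rownames(edges) <- NULL
edges
```

Each unordered pair appears in both directions, matching the desired output.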
I used the code below to put the text into a data frame with group numbers.
num.el <- sapply(entities.list, length)
association.matrix <- cbind(unlist(entities.list), rep(1:length(entities.list), num.el))
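To make the group-number step concrete, here is a toy run of the two lines above on a hypothetical two-element entities.list that mirrors the sample table at the top. rep() repeats each list index once per entity in that element, and cbind() coerces the group ids to character.

```r
# Hypothetical input: each list element holds the entities found in one block of text
entities.list <- list(
  c("PERSON Sherlock Holmes", "PERSON Sir Arthur Conan Doyle"),
  c("PERSON Sherlock Holmes", "PERSON Watson", "PERSON Moriarty")
)

num.el <- sapply(entities.list, length)  # entities per element: 2 3
# Column 1: all entities; column 2: the index of the element each came from
association.matrix <- cbind(unlist(entities.list),
                            rep(1:length(entities.list), num.el))
association.matrix
```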
As Mr. Flick requested, here is the actual code that produces the error. The data comes from an Enron email.
entities.list <- list(
  all4 = c(" ", "PERSON kaye"),
  all9 = c("MISC Content-Type : text plain; charset=us-ascii", "ORGANIZATION X-From",
           "PERSON Kaye Ellis", "PERSON Lisa Mackey", "MISC X-bcc")
)
# Coerce the list to a data frame
association.matrix <- data.frame(matrix(unlist(entities.list), byrow=T))
association.matrix
# Coerce the list to a matrix, where entities in the same list item are grouped by a number
num.el <- sapply(entities.list, length)
association.matrix <- cbind(unlist(entities.list), rep(1:length(entities.list), num.el))
# Remove empty-string entries
association.matrix <- association.matrix[!apply(association.matrix, 1, function(x)
any(x==" ")),]
# Coerce the matrix to a data frame without converting strings to factors
association.matrix <- data.frame(association.matrix, stringsAsFactors = FALSE)
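The filter above only drops rows whose entity is exactly one space, and without drop = FALSE a result with a single remaining row would collapse from a matrix to a plain vector. A sketch of a slightly more defensive filter, assuming empty or whitespace-only entities can occur (only " " appears in the sample); the toy matrix m is hypothetical.

```r
# Hypothetical toy matrix in the same two-column layout as association.matrix
m <- cbind(c(" ", "PERSON kaye", "", "PERSON Kaye Ellis"), c("1", "1", "2", "2"))

# x == " " only matches a single space; trimws() also catches "" and runs of spaces
keep <- trimws(m[, 1]) != ""

# drop = FALSE keeps a two-column matrix even if only one row survives
m <- m[keep, , drop = FALSE]
m
```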
So the data now looks like this:
X1 X2
1 PERSON kaye 1
2 MISC Content-Type : text plain; charset=us-ascii 2
3 ORGANIZATION X-From 2
4 PERSON Kaye Ellis 2
5 PERSON Lisa Mackey 2
6 MISC X-bcc 2
Here is Mr. Flick's code, which I am trying to get working:
association.matrix <- do.call(rbind, lapply(tapply(association.matrix$X1,
association.matrix$X2, combn, 2), function(x)
rbind(t(x), t(x)[,2:1])))
This is the error I get:
Error in FUN(X[[1L]], ...) : n < m
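The error comes from the first group. After the " " entry is removed, group 1 contains only "PERSON kaye", and combn(x, 2) stops with "n < m" whenever x has fewer than two elements; tapply() simply passes that error on. A minimal sketch of one workaround, assuming singleton groups can be dropped because they cannot form any pair: the data frame is rebuilt from the printout above (X2 is character because cbind() coerced it), and groups and edges are illustrative names.

```r
# Rebuilt from the printed association.matrix in the question
association.matrix <- data.frame(
  X1 = c("PERSON kaye", "MISC Content-Type : text plain; charset=us-ascii",
         "ORGANIZATION X-From", "PERSON Kaye Ellis", "PERSON Lisa Mackey", "MISC X-bcc"),
  X2 = c("1", "2", "2", "2", "2", "2"),
  stringsAsFactors = FALSE
)

print(table(association.matrix$X2))                             # group 1 has one member
print(tryCatch(combn("PERSON kaye", 2), error = conditionMessage)) # "n < m"

# Split entities by group, then keep only groups that can form a pair
groups <- split(association.matrix$X1, association.matrix$X2)
groups <- groups[lengths(groups) >= 2]

edges <- do.call(rbind, lapply(groups, function(x) {
  pairs <- t(combn(x, 2))                   # one row per unordered pair
  rbind(pairs, pairs[, 2:1, drop = FALSE])  # add the reversed direction
}))
dim(edges)
```

drop = FALSE also keeps the reversed pairs as a matrix when a group has exactly two members.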
Answer
So if your input data is
dd<- data.frame(X1 = c("PERSON Sherlock Holmes", "PERSON Sir Arthur Conan Doyle",
"PERSON Sherlock Holmes", "PERSON Watson", "PERSON Moriarty"),
X2 = c(1L, 1L, 2L, 2L, 2L), stringsAsFactors=FALSE
)
it looks like you can generate the desired result with
mm <- do.call(rbind, lapply(tapply(dd$X1, dd$X2, combn, 2), function(x)
rbind(t(x), t(x)[,2:1]))
)
which gives
[,1] [,2]
[1,] "PERSON Sherlock Holmes" "PERSON Sir Arthur Conan Doyle"
[2,] "PERSON Sir Arthur Conan Doyle" "PERSON Sherlock Holmes"
[3,] "PERSON Sherlock Holmes" "PERSON Watson"
[4,] "PERSON Sherlock Holmes" "PERSON Moriarty"
[5,] "PERSON Watson" "PERSON Moriarty"
[6,] "PERSON Watson" "PERSON Sherlock Holmes"
[7,] "PERSON Moriarty" "PERSON Sherlock Holmes"
[8,] "PERSON Moriarty" "PERSON Watson"
And you can make a directed graph with
library(igraph)
gg <- graph.edgelist(mm)
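The answer's mm lists every pair in both directions, which suits a directed graph. If an undirected graph without repeated edges is wanted instead, one option (a sketch, not part of the original answer) is to sort each row and keep the unique rows before building the graph. The igraph step only runs if the package is installed; graph_from_edgelist() is the current igraph name for graph.edgelist().

```r
dd <- data.frame(X1 = c("PERSON Sherlock Holmes", "PERSON Sir Arthur Conan Doyle",
                        "PERSON Sherlock Holmes", "PERSON Watson", "PERSON Moriarty"),
                 X2 = c(1L, 1L, 2L, 2L, 2L), stringsAsFactors = FALSE)

# Edge list built as in the answer: each pair appears in both directions
mm <- do.call(rbind, lapply(tapply(dd$X1, dd$X2, combn, 2), function(x)
  rbind(t(x), t(x)[, 2:1])))

# A-B and B-A are the same undirected edge: sort within each row, keep unique rows
und <- unique(t(apply(mm, 1, sort)))
nrow(und)

if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_edgelist(und, directed = FALSE)
  print(igraph::ecount(g))
}
```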