I have the following data:
Name Event
John EventA
Anna EventA
Dave EventA
Stew EventB
John EventB
Anna EventB
John EventC
Stew EventC
Dave EventC
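I want to find out which people most often attend the same events. For the example above, I would like it to return the top 3 most similar pairs: John & Anna, John & Dave, John & Stew.
I think I need to build a frequency matrix like the following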
Name John Anna Dave Stew
John 0 2 2 2
Anna 2 0 1 1
Dave 2 1 0 1
Stew 2 1 1 0
and then convert it to something like this:
Pair Frequency
John Anna 2
John Dave 2
John Stew 2
Anna Dave 1
Anna Stew 1
Dave Stew 1
But I don't know how to do this.
I'm working in R, so if anyone knows a way to do this, it would be a huge help!
Answer 0: (score: 2)
You can use table together with melt from the reshape2 package.
#DATA
df = structure(list(Name = c("John", "Anna", "Dave", "Stew", "John",
"Anna", "John", "Stew", "Dave"), Event = c("EventA", "EventA",
"EventA", "EventB", "EventB", "EventB", "EventC", "EventC", "EventC"
)), .Names = c("Name", "Event"), row.names = c(NA, -9L), class = "data.frame")
#Get Pairwise Frequency
a = table(df) %*% t(table(df))
a
# Name
#Name Anna Dave John Stew
# Anna 2 1 2 1
# Dave 1 2 2 1
# John 2 2 3 2
# Stew 1 1 2 2
#If you want, set diagonal elements to zero (From Karthik's comment)
#diag(a) <- 0
library(reshape2)
output = data.frame(melt(a))
colnames(output) = c("Name1", "Name2", "Value")
#Remove the pair with oneself (logical indexing is also safe when no self-pairs remain)
output = output[output$Name1 != output$Name2,]
output
# Name1 Name2 Value
#2 Dave Anna 1
#3 John Anna 2
#4 Stew Anna 1
#5 Anna Dave 1
#7 John Dave 2
#8 Stew Dave 1
#9 Anna John 2
#10 Dave John 2
#12 Stew John 2
#13 Anna Stew 1
#14 Dave Stew 1
#15 John Stew 2
#YOU CAN PASTE 'NAME1' and 'NAME2' to a 'PAIR' if necessary
#output$PAIR = apply(output, 1, function(x) paste(sort(x[1:2]), collapse = " "))
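The output above lists every pair twice, once in each order. As a minimal sketch of the commented-out PAIR idea, the following collapses the two orderings into one row per pair using base R only. The names `unique_pairs` and `PAIR` are illustrative choices, and `as.data.frame(as.table(...))` is swapped in for `melt` here so the example runs without reshape2.

```r
# Rebuild the example data from the question
df <- data.frame(
  Name  = c("John", "Anna", "Dave", "Stew", "John", "Anna", "John", "Stew", "Dave"),
  Event = c("EventA", "EventA", "EventA", "EventB", "EventB", "EventB",
            "EventC", "EventC", "EventC"),
  stringsAsFactors = FALSE
)

# Co-occurrence counts: entry [i, j] is the number of events shared by i and j
a <- table(df) %*% t(table(df))

# Long format, one row per cell of the matrix (base R stand-in for melt)
output <- as.data.frame(as.table(a), stringsAsFactors = FALSE)
colnames(output) <- c("Name1", "Name2", "Value")

# Drop each person paired with themselves
output <- output[output$Name1 != output$Name2, ]

# Order-independent label: (Anna, John) and (John, Anna) both become "Anna John"
output$PAIR <- apply(output[, c("Name1", "Name2")], 1,
                     function(x) paste(sort(x), collapse = " "))

# Keep only the first row seen for each label
unique_pairs <- output[!duplicated(output$PAIR), c("PAIR", "Value")]
unique_pairs
```

Because the matrix is symmetric, both rows of a pair carry the same Value, so it does not matter which of the two is kept.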
Answer 1: (score: 1)
This seems closer to what you asked for, and it uses only functions from base R. Using "df" from @d.b's answer:
x <- as.table(tcrossprod(table(df)))
x[lower.tri(x, diag = TRUE)] <- NA
na.omit(data.frame(x))
# Name Name.1 Freq
# 5 Anna Dave 1
# 9 Anna John 2
# 10 Dave John 2
# 13 Anna Stew 1
# 14 Dave Stew 1
# 15 John Stew 2
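Setting the diagonal and the lower triangle to NA, via lower.tri with diag = TRUE, makes it easy to drop the values we are not interested in.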
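The question asks for the top 3 most similar pairs, which needs one more step: sorting by frequency. A minimal sketch in base R, assuming the same example data; `pair_freq`, `ranked` and `top3` are illustrative names, not part of either answer.

```r
# Rebuild the example data from the question
df <- data.frame(
  Name  = c("John", "Anna", "Dave", "Stew", "John", "Anna", "John", "Stew", "Dave"),
  Event = c("EventA", "EventA", "EventA", "EventB", "EventB", "EventB",
            "EventC", "EventC", "EventC"),
  stringsAsFactors = FALSE
)

# Shared-event counts, keeping only the upper triangle (one row per pair)
x <- as.table(tcrossprod(table(df)))
x[lower.tri(x, diag = TRUE)] <- NA
pair_freq <- na.omit(data.frame(x))
colnames(pair_freq) <- c("Name1", "Name2", "Freq")

# Sort by descending frequency; ties stay in their original order
ranked <- pair_freq[order(-pair_freq$Freq), ]

# Take the three most frequent pairs
top3 <- head(ranked, 3)
top3
```

For this data the three rows with Freq 2 are exactly the pairs that include John. If more than three pairs tie at the cutoff, `head` keeps whichever come first, so you may prefer to filter on `Freq >= ranked$Freq[3]` instead.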