I have a data frame like this:
message.id sender recipient
1 1 A B
2 1 A C
3 2 A B
4 3 B C
5 3 B D
6 3 B Q
I want to summarise it by counting how often each value appears in the sender and recipient columns:
address messages.sent messages.received
1 A 3 0
2 B 3 2
3 C 0 2
4 D 0 1
5 Q 0 1
I have working code, but it is messy, and I am hoping there is a way to do everything in a single magrittr chain rather than what I have done below:
library(dplyr)
df <- data.frame(message.id = c(1,1,2,3,3,3),
sender = c("A","A","A","B","B","B"),
recipient = c("B","C","B","C","D","Q"))
sent <- df %>%
group_by(sender) %>%
summarise(messages.sent = n()) %>%
mutate(address = sender) %>%
select(address, messages.sent)
received <- df %>%
group_by(recipient) %>%
summarise(messages.received = n()) %>%
mutate(address = recipient) %>%
select(address, messages.received)
df_summary <- merge(sent, received, all = TRUE) %>%
replace(is.na(.), 0)
Answer 0 (score: 6)
We can use melt/dcast:
library(reshape2)
# df1 is the data frame from the question (called df there)
dcast(melt(df1, id.vars='message.id'), value~variable,
value.var='message.id', length)
Or use the wrapper recast:
recast(df1, id.var='message.id', value~variable, length)
# value sender recipient
#1 A 3 0
#2 B 3 2
#3 C 0 2
#4 D 0 1
#5 Q 0 1
If we need to use dplyr/tidyr:
library(dplyr)
library(tidyr)
gather(df1, messages, address, 2:3) %>%
group_by(messages, address) %>%
summarise(n=n()) %>%
spread(messages, n, fill=0)
# address sender recipient
# (chr) (dbl) (dbl)
#1 A 3 0
#2 B 3 2
#3 C 0 2
#4 D 0 1
#5 Q 0 1
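In current tidyr, gather and spread are superseded by pivot_longer and pivot_wider. The following is a minimal sketch of the same idea with the newer verbs, assuming tidyr 1.0.0 or later. The names role and df_summary are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Same data as in the question
df1 <- data.frame(message.id = c(1, 1, 2, 3, 3, 3),
                  sender = c("A", "A", "A", "B", "B", "B"),
                  recipient = c("B", "C", "B", "C", "D", "Q"),
                  stringsAsFactors = FALSE)

df_summary <- df1 %>%
  # Stack sender and recipient into one address column, keeping the role
  pivot_longer(c(sender, recipient), names_to = "role", values_to = "address") %>%
  # Count occurrences of each address in each role
  count(address, role) %>%
  # One column per role; address/role pairs that never occur become 0
  pivot_wider(names_from = role, values_from = n, values_fill = list(n = 0L)) %>%
  # Rename to the column names requested in the question
  select(address, messages.sent = sender, messages.received = recipient)

df_summary
```

This stays in a single chain and produces the column names asked for in the question, so no separate renaming step is needed.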
Answer 1 (score: 3)
If you are doing some kind of network analysis, the igraph package may be useful:
library(igraph)
g <- graph_from_data_frame(dat[c(2:3)])
data.frame(address = V(g)$name,
sent = degree(g, mode="out"),
rec = degree(g, mode="in"))
# address sent rec
# A A 3 0
# B B 3 2
# C C 0 2
# D D 0 1
# Q Q 0 1
igraph also supports piping, if you like that sort of thing.
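As a rough sketch of what a piped version might look like, assuming dat holds the data from the question: magrittr is loaded explicitly here only as a precaution so that %>% is certainly available, and res_ig is an illustrative name.

```r
library(igraph)
library(magrittr)

# Same data as in the question, named dat to match this answer
dat <- data.frame(message.id = c(1, 1, 2, 3, 3, 3),
                  sender = c("A", "A", "A", "B", "B", "B"),
                  recipient = c("B", "C", "B", "C", "D", "Q"),
                  stringsAsFactors = FALSE)

# Build a directed graph from the sender -> recipient edge list
g <- dat[2:3] %>% graph_from_data_frame()

# Out-degree = messages sent, in-degree = messages received
sent <- g %>% degree(mode = "out")
rec  <- g %>% degree(mode = "in")

res_ig <- data.frame(address = names(sent),
                     sent = unname(sent),
                     rec = unname(rec))
res_ig
```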
There is also a base R attempt (I know it is not what you asked for):
lvs <- unique(unlist(dat[2:3]))
sapply(dat[2:3], function(x) table(factor(x, levels=lvs)))
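The sapply call returns a matrix with one row per address and one column per role. If a data frame in the layout from the question is preferred, one possible way to convert it is sketched below. The names counts and res are illustrative, and lvs is sorted here so that the rows come out in alphabetical order.

```r
# Same data as in the question, named dat to match this answer
dat <- data.frame(message.id = c(1, 1, 2, 3, 3, 3),
                  sender = c("A", "A", "A", "B", "B", "B"),
                  recipient = c("B", "C", "B", "C", "D", "Q"),
                  stringsAsFactors = FALSE)

# All addresses that appear in either column
lvs <- sort(unique(unlist(dat[2:3])))

# Tabulate each column over the full set of addresses, so that
# addresses missing from a column are counted as 0
counts <- sapply(dat[2:3], function(x) table(factor(x, levels = lvs)))

# Move the row names into an explicit address column
res <- data.frame(address = lvs,
                  messages.sent = unname(counts[, "sender"]),
                  messages.received = unname(counts[, "recipient"]),
                  stringsAsFactors = FALSE)
res
```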
Answer 2 (score: 2)
Using dplyr and tidyr, you can do the following:
library(dplyr)
library(tidyr)
df <- data.frame(message.id = c(1,1,2,3,3,3),
sender = c("A","A","A","B","B","B"),
recipient = c("B","C","B","C","D","Q"), stringsAsFactors = FALSE)
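# After gather(), the column named sender holds the original column name
# ("sender" or "recipient") and the column named recipient holds the address
df %>%
  gather(sender, recipient, -message.id) %>%
  group_by(recipient) %>%
  summarise(messages.sent = sum(sender == 'sender'),
            messages.received = sum(sender == 'recipient'))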
Source: local data frame [5 x 3]
recipient messages.sent messages.received
(chr) (int) (int)
1 A 3 0
2 B 3 2
3 C 0 2
4 D 0 1
5 Q 0 1
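You can rename the first column to whatever you like. Note that the chain above only prints its result, so assign the result to a variable first (here a hypothetical res) and then rename its first column:
names(res)[1] <- 'address'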
Answer 3 (score: 0)
An alternative using aggregate and merge from base R. At the end, we replace the NAs with 0 and rename the columns to the desired names.
summary <- merge(aggregate(message.id ~ sender, data = df, length),
aggregate(message.id ~ recipient, data = df, length),
by.x = "sender",
by.y = "recipient",
all = TRUE)
summary[is.na(summary)] <- 0
colnames(summary) <- c("address", "sent", "received")
summary
Output:
address sent received
1 A 3 0
2 B 3 2
3 C 0 2
4 D 0 1
5 Q 0 1