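I am trying to count the number of groups in a data frame that contain a particular pair of values. Here is an example of what I have done so far and the output I want.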
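df=data.frame(c("Sam","Sam","Sam","Jason", "Jason", "Kelly", "Kelly"),
c("e","f","g","h", "h", "e", "f"))
names(df)=c('name','value')
# drop duplicated (name, value) rows
df=df[!duplicated(df[1:2]),]
# keep only names that still have more than one row
df=df[ave(rep(1, nrow(df)), df$name, FUN=length)>1,]
# all unordered pairs of the remaining values
pairs=t(combn(unique(df$value), 2))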
After filtering, df is:
   name value
1   Sam     e
2   Sam     f
3   Sam     g
6 Kelly     e
7 Kelly     f
and pairs is:
     [,1] [,2]
[1,] e    f
[2,] e    g
[3,] f    g
Desired output:
  pair.1 pair.2 occurrences
1      e      f           2
2      e      g           1
3      f      g           1
Answer 0 (score: 4)
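We merge the dataset with itself by 'name', sort the two 'value' columns within each row, convert the dataset to a data.table, remove the rows where both 'value' elements are the same, group by the 'value' columns, get the number of rows (.N), and divide by 2.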
d1 <- merge(df, df, by.x='name', by.y='name')
d1[-1] <- t(apply(d1[-1], 1, sort))
library(data.table)
setDT(d1)[value.x!=value.y][,.N/2 ,.(value.x, value.y)]
# value.x value.y V1
#1: e f 2
#2: e g 1
#3: f g 1
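The division by 2 may be the least obvious step, so here is a minimal base R sketch of why it is needed. It assumes the already filtered data from the question, rebuilt here as a small data frame, and the variable name `ef_rows` is only illustrative. The self-merge produces every unordered pair in both orders, (e, f) and (f, e), so once the values are sorted within each row, every pair is counted twice per name.

```r
# Filtered data from the question, rebuilt so the sketch is self-contained
df <- data.frame(name  = c("Sam", "Sam", "Sam", "Kelly", "Kelly"),
                 value = c("e", "f", "g", "e", "f"),
                 stringsAsFactors = FALSE)

# Self-merge on name: every value of a name is paired with every value of that name
d1 <- merge(df, df, by = "name")

# Drop the rows that pair a value with itself
d1 <- d1[d1$value.x != d1$value.y, ]

# Rows that involve e and f, in either order
ef_rows <- d1[(d1$value.x == "e" & d1$value.y == "f") |
              (d1$value.x == "f" & d1$value.y == "e"), ]

nrow(ef_rows)      # both orders are present for Sam and for Kelly
nrow(ef_rows) / 2  # halving gives the number of names that contain the pair
```

Filtering on `value.x < value.y` instead, as in the other snippets in this thread, keeps only one order of each pair and so avoids the division.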
Or, using an approach similar to the one in @jeremycg's post:
setDT(df)[df, on='name', allow.cartesian=TRUE
][as.character(value)< as.character(i.value), .N, .(value, i.value)]
Answer 1 (score: 3)
Here is an answer using dplyr. See the inline comments for an explanation:
library(dplyr) #load dplyr
df %>% #your data
left_join(df, by = "name") %>% #merge against your own data
filter(as.character(value.x) < as.character(value.y)) %>% #filter out any where the two are equal, and make sure we only have one of each pair
group_by(value.x, value.y) %>% #group by the two vars
summarise(n()) #count them
Source: local data frame [3 x 3]
Groups: value.x [?]
value.x value.y n()
(fctr) (fctr) (int)
1 e f 2
2 e g 1
3 f g 1
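For comparison, the same counts can probably be obtained in base R without a self-join. This is not taken from either answer above; it is a sketch of a different technique, a co-occurrence matrix built with `table()` and `crossprod()`. The names `tab`, `m`, `idx` and `res` are illustrative only, and the output column names are taken from the desired output in the question.

```r
# Data from the question, rebuilt so the sketch is self-contained
df <- data.frame(name  = c("Sam", "Sam", "Sam", "Jason", "Jason", "Kelly", "Kelly"),
                 value = c("e", "f", "g", "h", "h", "e", "f"),
                 stringsAsFactors = FALSE)
df <- unique(df)  # one row per (name, value), so the table below is 0/1

# name x value incidence table
tab <- table(df$name, df$value)

# value x value matrix: entry [i, j] is the number of names that have both values
m <- crossprod(tab)

# Keep each unordered pair once (upper triangle) and drop pairs that never co-occur
idx <- which(upper.tri(m) & m > 0, arr.ind = TRUE)
res <- data.frame(pair.1      = rownames(m)[idx[, 1]],
                  pair.2      = colnames(m)[idx[, 2]],
                  occurrences = m[idx],
                  stringsAsFactors = FALSE)
res
```

Names with a single value, such as Jason, contribute nothing off the diagonal, so the explicit filtering step from the question is not needed here.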