Generating a frequency table for unordered data

Time: 2017-09-19 03:12:08

Tags: r dplyr frequency

Suppose the data frame looks like this:

A<-c("John","John","James","Brad")
B<-c("Deb","Deb","Henry","Suzie")
C<-c("Barry","Beth","Deb","Louise")
D<-c("Ben","Dory","John","Simon")
df<-data.frame(A,B,C,D)
df
      A     B      C     D
1  John   Deb  Barry   Ben
2  John   Deb   Beth  Dory
3 James Henry    Deb  John
4  Brad Suzie Louise Simon

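How can I generate a frequency table showing the total number of times that a combination of values from column A and column B appears on the same row? The two names may sit in any columns of that row, so John and Deb count 3 times (row 3 has Deb in column C and John in column D). The output would look as follows.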

       A      B     n
1   Brad  Suzie     1
2  James  Henry     1
3   John    Deb     3

I know how to build a simple frequency table with dplyr, but I can't make it work in this case.
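To illustrate the problem, here is a minimal sketch of the simple approach, assuming dplyr is installed (the variable name `simple` is only for illustration). A plain count on columns A and B gives John/Deb a count of 2 rather than 3, because it never looks at row 3, where the two names sit in columns C and D:

```r
library(dplyr)

df <- data.frame(A = c("John", "John", "James", "Brad"),
                 B = c("Deb", "Deb", "Henry", "Suzie"),
                 C = c("Barry", "Beth", "Deb", "Louise"),
                 D = c("Ben", "Dory", "John", "Simon"),
                 stringsAsFactors = FALSE)

# Simple frequency table: counts rows by the values in columns A and B only
simple <- count(df, A, B)

# John/Deb gets n = 2 here; the co-occurrence in columns C and D is missed
simple
```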

1 answer:

Answer 0 (score: 0)

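# Build the example data; keep the names as character vectors, not factors
df <- data.frame(A = c("John","John","James","Brad"),
                 B = c("Deb","Deb","Henry","Suzie"),
                 C = c("Barry","Beth","Deb","Louise"),
                 D = c("Ben","Dory","John","Simon"), stringsAsFactors = FALSE)

# Collapse each row into one comma-separated string so it can be searched
df$seq <- paste(df$A, df$B, df$C, df$D, sep = ",")

# Every unordered pair of the names found in columns A and B
people <- unique(c(df$A, df$B))
pairs <- combn(people, 2)
finaldf <- data.frame(name1 = NULL, name2 = NULL, count = NULL)

for (i in seq_len(ncol(pairs))) {
  name1 <- pairs[1, i]
  name2 <- pairs[2, i]
  # Count the rows whose string contains both names (literal substring match)
  count <- sum(grepl(name1, df$seq, fixed = TRUE) & grepl(name2, df$seq, fixed = TRUE))

  finaldf <- rbind(finaldf, data.frame(name1 = name1, name2 = name2, count = count))
}

finaldf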

> finaldf
   name1 name2 count
1   John James     1
2   John  Brad     0
3   John   Deb     3
4   John Henry     1
5   John Suzie     0
6  James  Brad     0
7  James   Deb     1
8  James Henry     1
9  James Suzie     0
10  Brad   Deb     0
11  Brad Henry     0
12  Brad Suzie     1
13   Deb Henry     1
14   Deb Suzie     0
15 Henry Suzie     0
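
The table above lists every pair of names drawn from columns A and B, including pairs that never share a row, and grepl matches substrings, so a name such as "Deb" would also match "Debbie". If only the pairs that actually occur together in columns A and B are wanted, as in the expected output in the question, one possible base R sketch follows. The helper `row_has` and the variable names `ab`, `cells` and `result` are illustrative and not part of the original answer:

```r
df <- data.frame(A = c("John", "John", "James", "Brad"),
                 B = c("Deb", "Deb", "Henry", "Suzie"),
                 C = c("Barry", "Beth", "Deb", "Louise"),
                 D = c("Ben", "Dory", "John", "Simon"),
                 stringsAsFactors = FALSE)

# Distinct (A, B) pairs that occur together in columns A and B
ab <- unique(df[, c("A", "B")])

# Character matrix of the four name columns, compared cell by cell (exact match)
cells <- as.matrix(df[, c("A", "B", "C", "D")])
row_has <- function(name) rowSums(cells == name) > 0

# For each pair, count the rows that contain both names in any column
ab$n <- mapply(function(a, b) sum(row_has(a) & row_has(b)),
               ab$A, ab$B, USE.NAMES = FALSE)

# Sort by A to match the layout requested in the question
result <- ab[order(ab$A), ]
rownames(result) <- NULL
result
```

On the example data this yields Brad/Suzie 1, James/Henry 1 and John/Deb 3. Comparing whole cells instead of searching a pasted string avoids accidental partial-name matches.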