How do I get frequency counts for two variables in R?

Time: 2017-07-14 05:24:31

Tags: r

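I am looking for a way to get frequency counts from an R data frame based on two values. I have tried a few different syntaxes without success, and I am fairly new to R.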

> table(frequency.data.frame$value,frequency.data.frame$value_x)[!is.na(frequency.data.frame$id),]
Error in `[.default`(table(frequency.data.frame$value, frequency.data.frame$value_x),  : 
  (subscript) logical subscript too long
> table(frequency.data.frame$value,frequency.data.frame$value_x[!is.na(frequency.data.frame$id),])
Error in frequency.data.frame$value_x[!is.na(frequency.data.frame$id),  : 
  incorrect number of dimensions

The following gives the first dimension:

as.data.frame(table(frequency.data.frame[!is.na(frequency.data.frame$id),]$value))
   Var1 Freq
1     2    2
2     3    2
3     4    5
4     5   21
5     6    8
6     7   19
7     8   52
8     9   33
9    10   56
10   11    1
11   12    1

And the second dimension:

as.data.frame(table(frequency.data.frame[!is.na(frequency.data.frame$id),]$value_x))
   Var1 Freq
1     1   50
2     2   17
3     3   12
4     4    7
5     6   18
6     8    6
7     9    1
8    10   19
9    14    1
10   15    1
11   16   11
12   17    2
13   18    2
14   96    3
15   97    4
16   98   46

Sample extract of the data frame:

> frequency.data.frame
                                  id name                                                           factor value value_x
1                               <NA>                                        OSuppl=1 - Ardex | Imp_1=1 - 1     1       1
2                               <NA>                                        OSuppl=1 - Ardex | Imp_1=2 - 2     2       1
3   e7f0940c64001d4ab9d43ebd1e361292                                        OSuppl=1 - Ardex | Imp_1=3 - 3     3       1
4                               <NA>                                        OSuppl=1 - Ardex | Imp_1=4 - 4     4       1
5   2de771a03f49ce72eb721159933d4827                                        OSuppl=1 - Ardex | Imp_1=5 - 5     5       1
6   307ad612c3cc9fe5741c1fe75d1bc217                                        OSuppl=1 - Ardex | Imp_1=5 - 5     5       1
7   522f594612678f13f9dd5ee8f4f24df7                                        OSuppl=1 - Ardex | Imp_1=5 - 5     5       1
8   c1c32ac37f572fb259fe4e454bbdf743                                        OSuppl=1 - Ardex | Imp_1=5 - 5     5       1
9   d5b784d8f9508da7ac9573b535fe7147                                        OSuppl=1 - Ardex | Imp_1=5 - 5     5       1
10  e07439cdc15377d209413b31d9f80056                                        OSuppl=1 - Ardex | Imp_1=6 - 6     6       1
11  878a67dbbb428c65c83602fc112a24a0                                        OSuppl=1 - Ardex | Imp_1=6 - 6     6       1
12  5f7c27fb104685c26e53fc3267024539                                        OSuppl=1 - Ardex | Imp_1=7 - 7     7       1
13  6b12a3591d89f7b70587406a0c4f92bb                                        OSuppl=1 - Ardex | Imp_1=7 - 7     7       1
14  7fb2f98867e0e100187f0b4f13baac46                                        OSuppl=1 - Ardex | Imp_1=7 - 7     7       1
15  99a0ffaa2066e5c4806f2e30a446a31f                                        OSuppl=1 - Ardex | Imp_1=7 - 7     7       1
16  9d214544e8eaf3ea9c416a3dfbddb9f6                                        OSuppl=1 - Ardex | Imp_1=7 - 7     7       1
17  b36f990b1e0d8c5f04a47d23b70c1022                                        OSuppl=1 - Ardex | Imp_1=7 - 7     7       1
18  f2f9395bd9ddc16acd2253bd114aca64                                        OSuppl=1 - Ardex | Imp_1=7 - 7     7       1
19  4420e8499ab32631b389111935314468                                        OSuppl=1 - Ardex | Imp_1=8 - 8     8       1
...

Sample extract of the desired result:

   Var2 Var1 Freq
...
6     5    1    5
7     6    1    2
8     7    1    7
9     8    1    1
...

What syntax do I need to get the desired output?

2 answers:

Answer 0: (score: 1)

library(plyr)
# Count the rows for each (value_x, value) pair.
# Note: this counts every row, including those whose id is NA.
counts <- ddply(frequency.data.frame, .(value_x, value), nrow)
names(counts) <- c("value_x", "value", "Freq")

      value_x value Freq
  1         1     1    1
  2         1     2    1
  3         1     3    1
  4         1     4    1
  5         1     5    5
  6         1     6    2
  7         1     7    7
  8         1     8   10
  9         1     9    9
  10        1    10   15
  11        1    11    1
  12        1    12    1
  13        2     1    1
  ...
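
The output above counts every row, including those whose id is NA: the first row shown (value_x = 1, value = 1) can only come from a row with a missing id, since value = 1 does not appear in the question's non-NA counts. If only rows with a non-NA id should be counted, as in the question's own attempts, one option is to filter before counting. The following is a minimal sketch on made-up data (`df`, `kept` and the values are hypothetical, not the asker's data), assuming the plyr package is installed:

```r
library(plyr)

# Hypothetical toy data standing in for frequency.data.frame
df <- data.frame(
  id      = c(NA, "a", "b", "c", NA, "d"),
  value   = c(1, 5, 5, 6, 6, 7),
  value_x = c(1, 1, 1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# Drop rows with a missing id, then count rows per (value_x, value) pair
kept   <- df[!is.na(df$id), ]
counts <- ddply(kept, .(value_x, value), nrow)
names(counts) <- c("value_x", "value", "Freq")
print(counts)
```

Unlike `table()`, `ddply()` only returns the combinations that actually occur in the data, so no zero-count rows appear in the result.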

Answer 1: (score: 1)

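Since we only want the counts of 'value' and 'value_x' for rows with a non-NA 'id', use subset to keep those rows and select the columns of interest, build the table, and convert it to a data.frame with as.data.frame.

The syntax for the above solution is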

as.data.frame(table(subset(frequency.data.frame, 
             select = c('value', 'value_x'), !is.na(id))))
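
As a self-contained illustration of this approach, here is a minimal sketch on a small made-up data frame (`df`, `kept` and the values are hypothetical, not the asker's data). `table()` reports every combination of levels, including those with a count of zero, so the sketch drops those rows at the end. Whether that last step is wanted depends on the desired output.

```r
# Hypothetical toy data standing in for frequency.data.frame
df <- data.frame(
  id      = c(NA, "a", "b", "c", NA, "d"),
  value   = c(1, 5, 5, 6, 6, 7),
  value_x = c(1, 1, 1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# Keep rows with a non-NA id and only the two columns to cross-tabulate
kept <- subset(df, !is.na(id), select = c(value, value_x))

# Cross-tabulate, then flatten the table into one row per combination
counts <- as.data.frame(table(kept))

# table() also lists combinations that never occur; drop the zero counts
counts <- counts[counts$Freq > 0, ]
print(counts)
```

Because `table()` is called on a data frame here, the resulting columns carry the original names (`value`, `value_x`) rather than `Var1` and `Var2`.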