Building a contingency table

Time: 2017-08-29 11:58:10

Tags: r contingency

I have a table like this:

df <- data.frame(P1 = c(1,0,0,0,0,0,"A"),
                  P2 = c(0,-2,1,2,1,0,"A"),
                  P3 = c(-1,2,0,2,1,0,"B"),
                  P4 = c(2,0,-1,0,-1,0,"B"),
                  Names = c("G1","G2","G3","G1","G2","G3","Group"),
                  stringsAsFactors = FALSE)

which gives:

Names    P1   P2    P3   P4
G1       1    0     -1   2
G2       0    -2    2    0
G3       0    1     0    -1
G1       0    2     2    0
G2       0    1     1    -1
G3       0    0     0    0
Group    A    A     B    B

Here, A and B are the grouping variables for P1, P2, P3, P4.

I want to build a contingency table for Ids (G1, G2, ...), Group (A, B), and Var (-2, -1, 0, 1, 2), for example:

Id    Group Var    Count
G1    A     -2     0
G1    A     -1     0
G1    A     0      1
G1    A     1      1
G1    A     2      0
G1    B     -2     0
G1    B     -1     1
G1    B     0      0
G1    B     1      0
G1    B     2      1
G2    A     -2     1
G2    A     -1     0
G2    A     0      1
...

Is there a way to do this in R without using a lot of loops?

2 answers:

Answer 0: (score: 1)

Assuming that columns P1 & P2 belong to group A and columns P3 & P4 belong to group B, you could approach it with the data.table package:

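library(data.table)

# First melt: stack P1/P2 into column "A" and P3/P4 into column "B".
# Second melt: stack "A" and "B" into one 'value' column keyed by 'group'.
DT <- melt(melt(setDT(df),
                measure.vars = list(c(2,3),c(4,5)),
                value.name = c("A","B")),
           id = 1, measure.vars = 3:4, variable.name = 'group'
           )[order(Id,group)][, val2 := value]

# Join onto every Id/group/value combination (CJ = cross join); combinations
# that do not occur in the data get val2 = NA and are counted as 0.
DT[CJ(Id = Id, group = group, value = value, unique = TRUE)
   , on = .(Id, group, value)
   ][, .(counts = sum(!is.na(val2))), by = .(Id, group, value)]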

which gives:

    Id group value counts
 1: G1     A    -2      0
 2: G1     A    -1      0
 3: G1     A     0      2
 4: G1     A     1      1
 5: G1     A     2      1
 6: G1     B    -2      0
 7: G1     B    -1      1
 8: G1     B     0      1
 9: G1     B     1      0
10: G1     B     2      2
11: G2     A    -2      1
12: G2     A    -1      0
13: G2     A     0      2
14: G2     A     1      1
15: G2     A     2      0
16: G2     B    -2      0
17: G2     B    -1      1
18: G2     B     0      1
19: G2     B     1      1
20: G2     B     2      1
21: G3     A    -2      0
22: G3     A    -1      0
23: G3     A     0      3
24: G3     A     1      1
25: G3     A     2      0
26: G3     B    -2      0
27: G3     B    -1      1
28: G3     B     0      3
29: G3     B     1      0
30: G3     B     2      0

Data used:

df <- read.table(text="Id       P1   P2   P3    P4   
G1     1    0    -1    2 
G2     0    -2   2     0 
G3     0    1    0     -1
G1     0    2    2     0 
G2     0    1    1     -1 
G3     0    0    0     0", header=TRUE, stringsAsFactors = FALSE)

Note that I omitted the 'Group' row, because you stated in the comments that it only indicates which group each of the columns P1 - P4 belongs to.
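If the 'Group' row is still present, as in the question's original data frame, the column-to-group mapping does not have to be hard-coded. The following is a minimal base R sketch, assuming the label row is the one where Names equals "Group"; the names `p_cols`, `is_label`, `groups`, and `vals` are introduced here for illustration only.

```r
# The question's original data: every column is character because of the label row
df <- data.frame(P1 = c(1,0,0,0,0,0,"A"),
                 P2 = c(0,-2,1,2,1,0,"A"),
                 P3 = c(-1,2,0,2,1,0,"B"),
                 P4 = c(2,0,-1,0,-1,0,"B"),
                 Names = c("G1","G2","G3","G1","G2","G3","Group"),
                 stringsAsFactors = FALSE)

p_cols   <- c("P1", "P2", "P3", "P4")
is_label <- df$Names == "Group"      # assumption: the label row is marked "Group"

# Named character vector mapping each P column to its group label
groups <- unlist(df[is_label, p_cols])

# Remaining rows, with the P columns converted back to numbers
vals <- df[!is_label, ]
vals[p_cols] <- lapply(vals[p_cols], as.numeric)
```

`groups` can then replace a hard-coded column-to-group assignment, and `vals` plays the role of the data frame without the 'Group' row.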

Answer 1: (score: 1)

Using

library(tidyverse)

df <- read.table(text="Id       P1   P2   P3    P4   
G1     1    0    -1    2 
G2     0    -2   2     0 
G3     0    1    0     -1
G1     0    2    2     0 
G2     0    1    1     -1 
G3     0    0    0     0", header=TRUE, stringsAsFactors = FALSE)

we gather the table into long format and recode the P* columns into a group variable. Then we count and complete the missing cases:

df %>%
  gather(P1, P2, P3, P4, key = "p", value = "v") %>% 
  mutate(group = ifelse(p %in% c("P1", "P2"), "A", "B")) %>% 
  group_by(Id, group, v) %>% 
  summarise(Count = n()) %>% 
  ungroup() %>% 
  complete(Id, group, v, fill = list("Count" = 0)) 
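In current tidyr, `gather()` is superseded by `pivot_longer()`. As a hedged sketch only, the same pipeline could be written as follows, assuming tidyr >= 1.0.0 and dplyr >= 0.8.0 are installed; the result name `res` is introduced here for illustration.

```r
library(dplyr)
library(tidyr)

df <- read.table(text = "Id P1 P2 P3 P4
G1 1 0 -1 2
G2 0 -2 2 0
G3 0 1 0 -1
G1 0 2 2 0
G2 0 1 1 -1
G3 0 0 0 0", header = TRUE, stringsAsFactors = FALSE)

res <- df %>%
  # Long format: one row per Id / column name / value
  pivot_longer(P1:P4, names_to = "p", values_to = "v") %>%
  # Same recoding as above: P1 and P2 are group A, P3 and P4 are group B
  mutate(group = ifelse(p %in% c("P1", "P2"), "A", "B")) %>%
  # count() groups, tallies, and ungroups in one step
  count(Id, group, v, name = "Count") %>%
  # Add the combinations that never occur, with a count of 0
  complete(Id, group, v, fill = list(Count = 0L))
```

Note that `complete()` only expands over values of `v` that occur somewhere in the data; here all of -2 to 2 do occur, so `res` has one row per Id, group, and value.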

If you don't need all combinations in the output, just use:

df %>%
  gather(P1, P2, P3, P4, key = "p", value = "v") %>% 
  mutate(group = ifelse(p %in% c("P1", "P2"), "A", "B")) %>% 
  group_by(Id, group, v) %>% 
  summarise(Count = n())

# A tibble: 17 x 4
# Groups:   Id, group [?]
      Id    group  v     Count
      <chr> <chr>  <int> <int>
 1    G1     A     0     2
 2    G1     A     1     1
 3    G1     A     2     1
 4    G1     B    -1     1
 5    G1     B     0     1
 6    G1     B     2     2
 7    G2     A    -2     1
 8    G2     A     0     2
 9    G2     A     1     1
10    G2     B    -1     1
11    G2     B     0     1
12    G2     B     1     1
13    G2     B     2     1
14    G3     A     0     3
15    G3     A     1     1
16    G3     B    -1     1
17    G3     B     0     3
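For comparison, the full table including zero counts can also be produced without extra packages. This is a base R sketch, not the method used in either answer: it assumes the same P1/P2 = A, P3/P4 = B assignment and the value range -2 to 2 stated in the question, and the names `groups`, `long`, and `counts` are illustrative.

```r
df <- read.table(text = "Id P1 P2 P3 P4
G1 1 0 -1 2
G2 0 -2 2 0
G3 0 1 0 -1
G1 0 2 2 0
G2 0 1 1 -1
G3 0 0 0 0", header = TRUE, stringsAsFactors = FALSE)

p_cols <- c("P1", "P2", "P3", "P4")
groups <- c(P1 = "A", P2 = "A", P3 = "B", P4 = "B")  # assumed column-to-group mapping

# Long format: unlist() stacks the columns one after another, so the Ids are
# repeated once per column and each group label once per row
long <- data.frame(
  Id    = rep(df$Id, times = length(p_cols)),
  Group = rep(unname(groups[p_cols]), each = nrow(df)),
  Var   = unlist(df[p_cols], use.names = FALSE)
)

# Fixed levels make table() report values that never occur as 0
long$Var <- factor(long$Var, levels = -2:2)

# Three-way contingency table, flattened to one row per combination
counts <- as.data.frame(table(long), responseName = "Count")
counts <- counts[order(counts$Id, counts$Group, counts$Var), ]
```

Because `table()` cross-tabulates all factor levels, the zero rows appear without a separate completion step.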