How do I transform rows in R to make a contingency table?

Asked: 2019-05-09 15:11:31

Tags: r tidyr

I have the data shown below (the input table), but I want a table like the output table. So far I have not found a solution.


Input dataset:

set.seed(1)
Data <- data.frame(
  set = (1:10),
  Topic = sample(1:5),
  Label = sample(c("A", "B", "C"), 10, replace = TRUE),
  Score = sample(1:10)
)
Data
   set Topic Label Score
1    1     1     C     3
2    2     2     B     5
3    3     3     A    10
4    4     4     A     9
5    5     5     A     2
6    6     1     A     8
7    7     2     B     4
8    8     3     B     1
9    9     4     B     6
10  10     5     C     7

Output data:
#In the columns I want the Topic (T).

             T1    T2    T3    T4   T5
Label A       1     0     1     1    1
Label B       0     2     1     1    0
Label C       1     0     0     0    1 
Score (avg)  5.5   4.5   5.5   7.5  4.5  
Set (count)   2     2     2     2    2

I tried the spread function from tidyr, but I got a lot of NA values and no counts.

Data_1 <- spread(Data, key = Topic, value = Label)

1 answer:

Answer 0: (score: 0)

Your question implies that you want to transpose a data frame, i.e. t(df), in which case you would turn this...

# A tibble: 7 x 4
  set   topic label score
  <chr> <dbl> <chr> <dbl>
1 X1        1 A         5
2 X2        1 A         5
3 X3        2 B        10
4 X4        2 A        10
5 X5        2 C         7
6 X6        3 A        10
7 X7        3 C        10

...into this:

      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
set   "X1" "X2" "X3" "X4" "X5" "X6" "X7"
topic "1"  "1"  "2"  "2"  "2"  "3"  "3" 
label "A"  "A"  "B"  "A"  "C"  "A"  "C" 
score " 5" " 5" "10" "10" " 7" "10" "10"
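
The answer never defines `df`. As a minimal sketch, it can be rebuilt from the printed tibble above; here it is typed in as a plain `data.frame` (an assumption, the original appears to be a tibble). Because the columns mix text and numbers, `t()` coerces everything to a character matrix, which is why the scores come out quoted.

```r
# Rebuild the example data from the printed tibble (values copied by hand).
df <- data.frame(
  set   = paste0("X", 1:7),
  topic = c(1, 1, 2, 2, 2, 3, 3),
  label = c("A", "A", "B", "A", "C", "A", "C"),
  score = c(5, 5, 10, 10, 7, 10, 10),
  stringsAsFactors = FALSE
)

# Transposing a mixed-type data frame yields a character matrix.
transposed <- t(df)
transposed
```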

But your example table suggests that what you actually need is a contingency table:

# Generate a contingency table.
cont_table <- unclass(table(df$label, df$topic))

# Give the columns appropriate names.
colnames(cont_table) <- paste("Topic", colnames(cont_table))

cont_table

#### OUTPUT ####
    Topic 1 Topic 2 Topic 3
  A       2       1       1
  B       0       1       0
  C       0       1       1
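
Since the question started from tidyr, the same counts can probably also be reached with `spread`, provided the rows are counted first. The attempt in the question passes `Label` as the value, so each row keeps its own label in one topic column and gets NA in the others. The sketch below assumes dplyr and tidyr are installed, types the question's data in by hand from its printed table, and uses the illustrative names `counts` and `wide`.

```r
library(dplyr)
library(tidyr)

# The question's data, copied from the printed table.
Data <- data.frame(
  set   = 1:10,
  Topic = rep(1:5, times = 2),
  Label = c("C", "B", "A", "A", "A", "A", "B", "B", "B", "C"),
  Score = c(3, 5, 10, 9, 2, 8, 4, 1, 6, 7)
)

# Count rows per Label/Topic pair, then spread the counts into one column per topic.
# fill = 0 replaces the NAs for pairs that never occur.
counts <- count(Data, Label, Topic)
wide <- spread(counts, key = Topic, value = n, fill = 0)
wide
```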

To add the means and totals, do the following:

library(dplyr)

# Get the mean for each topic.
means <- df %>% group_by(topic) %>% summarise(mean(score))


# Bind topic means and column sums to contingency table.
out_mat <- rbind(cont_table,
                "Score (avg)" = means[[2]],
                Num = colSums(cont_table)
                )

out_mat

#### OUTPUT ####

            Topic 1 Topic 2 Topic 3
A                 2       1       1
B                 0       1       0
C                 0       1       1
Score (avg)       5       9      10
Num               2       3       2

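The final output looks similar to your output table, but there are some differences. I suspect your output table is incorrect. If that is not the case, please add some clarification to your original question.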
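
For reference, here is a sketch of the same steps applied to the `Data` frame printed in the question, using base R only. The values are typed in from the printed table rather than regenerated with `set.seed(1)`, because `sample()` can return different draws in different R versions. `tapply` stands in for the dplyr `summarise` step, and the row and column labels are chosen to match the question's layout.

```r
# The question's data, copied from the printed table.
Data <- data.frame(
  set   = 1:10,
  Topic = rep(1:5, times = 2),
  Label = c("C", "B", "A", "A", "A", "A", "B", "B", "B", "C"),
  Score = c(3, 5, 10, 9, 2, 8, 4, 1, 6, 7)
)

# Label-by-Topic counts as a plain matrix.
cont_table <- unclass(table(Data$Label, Data$Topic))
colnames(cont_table) <- paste0("T", colnames(cont_table))
rownames(cont_table) <- paste("Label", rownames(cont_table))

# Mean score per topic, in the same topic order as the table columns.
score_avg <- tapply(Data$Score, Data$Topic, mean)

# Stack counts, per-topic means and per-topic row counts.
out_mat <- rbind(cont_table,
                 "Score (avg)" = as.vector(score_avg),
                 "Set (count)" = colSums(cont_table))
out_mat
```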