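My problem is that I have the data below (input table), but I want a table like the output table. So far I haven't found a solution.
Input table: Input table
Output table: Output table
Input dataset: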
set.seed(1)
Data <- data.frame(
set = (1:10),
Topic = sample(1:5),
Label = sample(c("A", "B", "C"), 10, replace = TRUE),
Score = sample(1:10)
)
Data
set Topic Label Score
1 1 1 C 3
2 2 2 B 5
3 3 3 A 10
4 4 4 A 9
5 5 5 A 2
6 6 1 A 8
7 7 2 B 4
8 8 3 B 1
9 9 4 B 6
10 10 5 C 7
Output data:
#In the columns I want the Topic (T).
T1 T2 T3 T4 T5
Label A 1 0 1 1 1
Label B 0 2 1 1 0
Label C 1 0 0 0 1
Score (avg) 5.5 4.5 5.5 7.5 4.5
Set (count) 2 2 2 2 2
I tried the spread function from tidyr, but I got a lot of NA values and no numbers.
Data_1 <- spread(Data, key = Topic, value = Label)
Answer 0 (score: 0)
Your question implies that you want to transpose a data frame, i.e. t(df),
in which case you would turn this...
# A tibble: 7 x 4
set topic label score
<chr> <dbl> <chr> <dbl>
1 X1 1 A 5
2 X2 1 A 5
3 X3 2 B 10
4 X4 2 A 10
5 X5 2 C 7
6 X6 3 A 10
7 X7 3 C 10
into this...
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
set "X1" "X2" "X3" "X4" "X5" "X6" "X7"
topic "1" "1" "2" "2" "2" "3" "3"
label "A" "A" "B" "A" "C" "A" "C"
score " 5" " 5" "10" "10" " 7" "10" "10"
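Note that df is not defined anywhere in this answer. A minimal sketch that rebuilds it from the printout above and transposes it, assuming a base data.frame is an acceptable stand-in for the tibble shown:

```r
# Values typed in by hand from the printout above.
# A base data.frame is used here instead of a tibble (an assumption).
df <- data.frame(
  set   = paste0("X", 1:7),
  topic = c(1, 1, 2, 2, 2, 3, 3),
  label = c("A", "A", "B", "A", "C", "A", "C"),
  score = c(5, 5, 10, 10, 7, 10, 10),
  stringsAsFactors = FALSE
)

# t() converts the data frame to a matrix first, then transposes it.
transposed <- t(df)
transposed
```

Because a matrix can hold only one type, mixing text and numeric columns gives a character matrix, which is why the scores show up as strings.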
But your example table suggests that what you really need is a contingency table:
# Generate a contingency table.
cont_table <- unclass(table(df$label, df$topic))
# Give the columns appropriate names.
colnames(cont_table) <- paste("Topic", colnames(cont_table))
cont_table
#### OUTPUT ####
Topic 1 Topic 2 Topic 3
A 2 1 1
B 0 1 0
C 0 1 1
To add the means and totals, do the following:
library(dplyr)
# Get the mean for each topic.
means <- df %>% group_by(topic) %>% summarise(mean(score))
# Bind topic means and column sums to contingency table.
out_mat <- rbind(cont_table,
"Score (avg)" = means[[2]],
Num = colSums(cont_table)
)
out_mat
#### OUTPUT ####
Topic 1 Topic 2 Topic 3
A 2 1 1
B 0 1 0
C 0 1 1
Score (avg) 5 9 10
Num 2 3 2
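The final output looks similar to your output table, but there are some differences. I suspect your output table is incorrect. If that is not the case, please add some clarification to your original question.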
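The same steps can also be applied to the Data frame from the question. This is only a sketch: the values are typed in by hand from the printed table (sample() may not reproduce them across R versions), base tapply() replaces dplyr, and the row and column labels are chosen to mirror the desired output table.

```r
# Data as printed in the question, entered by hand (an assumption that the
# printout, not the sample() calls, is the intended input).
Data <- data.frame(
  set   = 1:10,
  Topic = rep(1:5, times = 2),
  Label = c("C", "B", "A", "A", "A", "A", "B", "B", "B", "C"),
  Score = c(3, 5, 10, 9, 2, 8, 4, 1, 6, 7),
  stringsAsFactors = FALSE
)

# Count how often each label occurs within each topic.
counts <- unclass(table(Data$Label, Data$Topic))
rownames(counts) <- paste("Label", rownames(counts))
colnames(counts) <- paste0("T", colnames(counts))

# Append the mean score and the number of rows per topic.
# tapply() returns the groups in the same sorted order as table().
out <- rbind(
  counts,
  "Score (avg)" = as.vector(tapply(Data$Score, Data$Topic, mean)),
  "Set (count)" = as.vector(tapply(Data$set, Data$Topic, length))
)
out
```

With these hand-entered values, the averages and counts can be compared directly against the table in the question.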