Cross-tabulation in an R data frame

Time: 2018-05-27 13:20:56

Tags: r statistics

I have a data frame in R:

Subject  T  O  E  P  Score
1        0  1  0  1   256
2        1  0  1  0   325 
2        0  1  0  1   125
3        0  1  0  1   27
4        0  0  0  1   87
5        0  1  0  1   125
6        0  1  1  1   100

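This is just a sample of the data frame. In reality there are many rows per subject, but the subjects only range from 1 to 6.

For each subject, the possible values are:

  • T: 0 or 1

  • O: 0 or 1

  • E: 0 or 1

  • P: 0 or 1

  • Score: a numeric value

I want to create a new data frame with 6 rows (one per subject) that holds the MEAN score for each combination: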

T,O,E,P,TO,TE,TP,OE,OP,PE,TOP,TOE,POE,PET

The above are the columns of the new data frame.

The final output should look like this:

Subject  T    O   E   P   TO  TE  TP   OE   OP  PE  TOP  TOE  POE  PET
1       
2
3
4
5
6

For each row x column cell, the value is the MEAN SCORE.

I tried aggregate and table, but I can't seem to get what I want.

Sorry, I'm new to R.

Thanks

2 answers:

Answer 0 (score: 2)

I had to rebuild the sample data to answer the question as I understand it. Tell me whether it works for you:

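set.seed(2)
# Note: sample() changed its default algorithm in R 3.6.0,
# so newer versions may print different values than those shown below
df <- data.frame(subject = sample(1:3, 9, TRUE),
                 T = sample(c(0, 1), 9, TRUE),
                 O = sample(c(0, 1), 9, TRUE),
                 E = sample(c(0, 1), 9, TRUE),
                 P = sample(c(0, 1), 9, TRUE),
                 score = round(rnorm(9, 10, 3)))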

#   subject T O E P score
# 1       1 1 0 0 1    12
# 2       3 1 0 1 0     9
# 3       2 0 1 0 1    13
# 4       1 1 0 0 0     3
# 5       3 0 1 0 1    14
# 6       3 0 0 1 0    13
# 7       1 1 0 1 0    17
# 8       3 1 0 1 0    12
# 9       2 0 0 1 1    14

cols1 <- c("T","O","E","P")
df$comb <- apply(df[cols1],1,function(x) paste(names(df[cols1])[as.logical(x)],collapse=""))

#   subject T O E P score comb
# 1       1 1 0 0 1    12   TP
# 2       3 1 0 1 0     9   TE
# 3       2 0 1 0 1    13   OP
# 4       1 1 0 0 0     3    T
# 5       3 0 1 0 1    14   OP
# 6       3 0 0 1 0    13    E
# 7       1 1 0 1 0    17   TE
# 8       3 1 0 1 0    12   TE
# 9       2 0 0 1 1    14   EP

library(tidyverse)

df %>%
  group_by(subject,comb) %>%
  summarize(score=mean(score)) %>%
  spread(comb,score) %>%
  ungroup

# # A tibble: 3 x 7
#   subject     E    EP    OP     T    TE    TP
# *   <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1       1    NA    NA    NA     3  17.0    12
# 2       2    NA    14    13    NA    NA    NA
# 3       3    13    NA    14    NA  10.5    NA
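
As a side note, spread() has been superseded in tidyr 1.0.0 and later. The sketch below shows roughly the same step with pivot_wider(), assuming dplyr 1.0 or later and tidyr 1.0 or later are installed. The data frame is copied from the printed table above rather than regenerated, so it does not depend on the random number generator.

```r
library(dplyr)
library(tidyr)

# Values copied from the printed example (subject, score, comb)
df <- data.frame(
  subject = c(1, 3, 2, 1, 3, 3, 1, 3, 2),
  score   = c(12, 9, 13, 3, 14, 13, 17, 12, 14),
  comb    = c("TP", "TE", "OP", "T", "OP", "E", "TE", "TE", "EP"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(subject, comb) %>%
  # one mean per subject/combination pair, then drop the grouping
  summarise(score = mean(score), .groups = "drop") %>%
  # one column per combination; missing pairs become NA
  pivot_wider(names_from = comb, values_from = score)

print(wide)
```

With the default settings, pivot_wider() orders the new columns by first appearance rather than alphabetically, so the column order may differ from the spread() output.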

The second step in base R:

means <- aggregate(score ~ subject + comb,df,mean)
means2 <- reshape(means,timevar="comb",idvar="subject",direction="wide")
setNames(means2,c("subject",sort(unique(df$comb))))
#   subject  E EP OP  T   TE TP
# 1       3 13 NA 14 NA 10.5 NA
# 2       2 NA 14 13 NA   NA NA
# 5       1 NA NA NA  3 17.0 12
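
A shorter base R variant may also be worth trying: tapply() with two grouping vectors returns a subject x combination matrix directly, with NA in the empty cells. This is only a sketch, not part of the original answer. The data is copied from the printed table above, and the names means_mat and means_df are chosen here for illustration.

```r
# Values copied from the printed example (subject, score, comb)
df <- data.frame(
  subject = c(1, 3, 2, 1, 3, 3, 1, 3, 2),
  score   = c(12, 9, 13, 3, 14, 13, 17, 12, 14),
  comb    = c("TP", "TE", "OP", "T", "OP", "E", "TE", "TE", "EP"),
  stringsAsFactors = FALSE
)

# Mean score per subject (rows) and combination (columns); empty cells are NA
means_mat <- tapply(df$score, list(subject = df$subject, comb = df$comb), mean)

# Turn the matrix into a data frame with subject as a regular column
means_df <- cbind(subject = as.integer(rownames(means_mat)),
                  as.data.frame(means_mat))
rownames(means_df) <- NULL

print(means_df)
```

Here the rows come out sorted by subject and the columns sorted by combination label, so no renaming step is needed.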

Answer 1 (score: 0)

I would do it like this:

# using your table data
df = read.table(text = 
"Subject  T  O  E  P  Score
1        0  1  0  1   256
2        1  0  1  0   325 
2        0  1  0  1   125
3        0  1  0  1   27
4        0  0  0  1   87
5        0  1  0  1   125
6        0  1  1  1   100", stringsAsFactors = FALSE, header=TRUE)

# your desired column names
new_names <- c("T", "O", "E", "P", "TO", "TE", "TP", "OE",
               "OP", "PE", "TOP", "TOE", "POE", "PET")

# assigning each of your scores to one of the desired column names
assign_comb <- function(dfrow) {
  # keep the letters whose flag (columns 2 to 5: T, O, E, P) is 1
  selection <- c("T", "O", "E", "P")[as.logical(dfrow[2:5])]
  # join them into a single label such as "OP"
  paste(selection, collapse = "")
}
df$comb <- apply(df, 1, assign_comb)

# aggregate all the means together
df_agg <- aggregate(df$Score ~ df$comb + df$Subject, FUN = mean)

# reshape the data to wide format
df_new <- reshape(df_agg, v.names = "df$Score", idvar = "df$Subject", 
                  timevar = "df$comb", direction = "wide")

# clean up the column names to match your desired output
# any column names not found will be added as NA
colnames(df_new) <- gsub("df\\$|Score\\.", "", colnames(df_new))
df_new[, new_names[!new_names %in% colnames(df_new)]] <- NA
df_new <- df_new[, c("Subject", new_names)]

Result:

> df_new
  Subject  T  O  E  P TO  TE TP OE  OP PE TOP TOE POE PET
1       1 NA NA NA NA NA  NA NA NA 256 NA  NA  NA  NA  NA
2       2 NA NA NA NA NA 325 NA NA 125 NA  NA  NA  NA  NA
4       3 NA NA NA NA NA  NA NA NA  27 NA  NA  NA  NA  NA
5       4 NA NA NA 87 NA  NA NA NA  NA NA  NA  NA  NA  NA
6       5 NA NA NA NA NA  NA NA NA 125 NA  NA  NA  NA  NA
7       6 NA NA NA NA NA  NA NA NA  NA NA  NA  NA  NA  NA
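
Note that subject 6 is all NA in the result above even though the input has a score of 100 for it. Its flags (O, E, P) produce the label "OEP", which is not among the requested names (the question spells it "POE"), so that column is dropped in the last step. One possible fix, sketched here under the assumption that labels differing only in letter order mean the same combination, is to compare labels after sorting their letters. The helper canon() is hypothetical and not part of the original answer.

```r
# Hypothetical helper: sort the letters of each label so that
# "POE" and "OEP" compare as equal
canon <- function(x) {
  vapply(strsplit(x, ""),
         function(ch) paste(sort(ch), collapse = ""),
         character(1))
}

# Column names requested in the question
new_names <- c("T", "O", "E", "P", "TO", "TE", "TP", "OE",
               "OP", "PE", "TOP", "TOE", "POE", "PET")

# Labels that assign_comb() generates for the seven sample rows
generated <- c("OP", "TE", "OP", "OP", "P", "OP", "OEP")

# Translate each generated label to the spelling used in the question
renamed <- new_names[match(canon(generated), canon(new_names))]
print(renamed)
```

Applying the same translation to df$comb before the aggregate() call should place subject 6's score in the POE column. Labels with no counterpart in new_names, such as a row with all four flags set, would map to NA and need separate handling.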