Combining and transforming data frames in R

Time: 2017-04-30 22:35:57

Tags: r csv dataframe data-processing

I have a bunch of data frames in R that look like this:

print(output[2])
   Button Intensity Acc Intensity RT     Time tdelta SubjectID CoupleID PrePost
1:      0        30   0       0.0  0 83325.87  0.000      1531      153    Post
2:      1        30   1      13.5  0 83362.65 36.782      1531      153    Post
3:      1        30   1      15.0  0 83376.68 14.027      1531      153    Post
4:      1        30   1       6.0  0 83392.27 15.585      1531      153    Post
5:      1        30   1      15.0  0 83398.77  6.507      1531      153    Post

print(output[1])
[[1]]
   Button Intensity Acc Intensity RT     Time tdelta SubjectID CoupleID PrePost
1:      0        30   0       0.0  0 77987.93  0.000      1531      153     Pre
2:      1        30   1      13.5  0 78084.57 96.639      1531      153     Pre
3:      1        30   1      15.0  0 78098.62 14.054      1531      153     Pre
4:      1        30   1       6.0  0 78114.13 15.508      1531      153     Pre
5:      1        30   1      15.0  0 78120.67  6.537      1531      153     Pre

I want to combine them into one large data frame with the following logic and format:

SubjectID CoupleID PrePost Miss1RT Miss2RT Miss3RT Hit1RT Hit2RT Hit3RT
     1531      153    Post   0.000      NA      NA     NA 36.782 14.027
     1531      153     Pre   0.000      NA      NA     NA 96.639 14.054

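If Button == 0, it's a Miss; if Button == 1, it's a Hit. So the logic should be something like this:

for j, row in enumerate(output[i].rows, start=1):
   if row.Button == 0:
      Miss{j}RT = row.tdelta
   elif row.Button == 1:
      Miss{j}RT = NA

And then the flipped version for hits: if Button is 1, Hit{j}RT is tdelta, otherwise NA.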
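In R the same rule does not need an explicit loop. A minimal sketch, assuming a single data frame `df` with the `Button` and `tdelta` columns shown above (the toy values are the first five rows of the Pre table): `ifelse()` is vectorized, so one call per column produces the whole miss and hit vectors.

```r
# Toy data: the first five rows of the "Pre" table from the question
df <- data.frame(Button = c(0, 1, 1, 1, 1),
                 tdelta = c(0, 96.639, 14.054, 15.508, 6.537))

# ifelse() works on whole columns at once, replacing the row-by-row loop:
# tdelta goes to miss where Button == 0 (NA elsewhere)
# and to hit where Button == 1 (NA elsewhere)
miss <- ifelse(df$Button == 0, df$tdelta, NA)
hit  <- ifelse(df$Button == 1, df$tdelta, NA)

print(miss)
print(hit)
```

Position j of `miss` and `hit` corresponds to the `Miss{j}RT` and `Hit{j}RT` columns of the target layout.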

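Each data frame has 26 rows, and each row is either a hit or a miss, so there will be 26 Miss columns and 26 Hit columns, and each SubjectID gets two rows: one for Pre and one for Post. So the column headers of the final output would be: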

SubjectID CoupleID PrePost Miss1RT Miss2RT ... Miss26RT Hit1RT Hit2RT ... Hit26RT

I'm new to R and am struggling with the correct syntax.
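One way to get that layout in base R, without any reshaping package, is to collapse each data frame into a single wide row and then stack the rows. This is a sketch, not the answer's method: `to_wide_row` is a hypothetical helper, the toy `pre`/`post` data frames keep only the columns the logic needs, and it assumes every data frame has the same number of rows (26 in the real data) so the columns line up when stacked.

```r
# Toy versions of two of the question's data frames (only the columns used here)
pre  <- data.frame(Button = c(0, 1, 1, 1, 1),
                   tdelta = c(0, 96.639, 14.054, 15.508, 6.537),
                   SubjectID = 1531, CoupleID = 153, PrePost = "Pre",
                   stringsAsFactors = FALSE)
post <- data.frame(Button = c(0, 1, 1, 1, 1),
                   tdelta = c(0, 36.782, 14.027, 15.585, 6.507),
                   SubjectID = 1531, CoupleID = 153, PrePost = "Post",
                   stringsAsFactors = FALSE)
output <- list(pre, post)

# Hypothetical helper: collapse one long data frame into a single wide row
to_wide_row <- function(df) {
  n <- nrow(df)
  miss <- ifelse(df$Button == 0, df$tdelta, NA)
  hit  <- ifelse(df$Button == 1, df$tdelta, NA)
  # Name the values Miss1RT..MissnRT and Hit1RT..HitnRT (n = 26 for the real data)
  vals <- setNames(c(miss, hit),
                   c(paste0("Miss", seq_len(n), "RT"), paste0("Hit", seq_len(n), "RT")))
  # The identifying columns are constant within a data frame, so take them from row 1
  cbind(df[1, c("SubjectID", "CoupleID", "PrePost")], as.data.frame(as.list(vals)))
}

# One wide row per data frame, stacked into the final table
result <- do.call(rbind, lapply(output, to_wide_row))
rownames(result) <- NULL
print(result)
```

Because the column names are built with `seq_len(n)`, they come out in numeric order (`Miss1RT`, `Miss2RT`, ..., `Miss26RT`) with no sorting step involved.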

1 Answer:

Answer 0 (score: 1)

Something like this should work:

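library(reshape2)  # provides melt() and dcast()

# Get the data into the structure the OP has: a list of data frames
output <- list(pre, post)
# Label each row's position; a factor whose levels are in numeric order keeps
# the cast columns as 1RT, 2RT, ..., 26RT instead of sorting "10RT" before "2RT"
output2 <- lapply(output, function(x) {
  labels <- paste0(seq_len(nrow(x)), "RT")
  cbind(x, num = factor(labels, levels = labels))
})
# Stack all data frames so each step runs only once
pre_post <- do.call("rbind", output2)

# Perform the actual calculations: tdelta goes to miss when Button == 0
# and to hit when Button == 1; the other column gets NA
pre_post$miss <- ifelse(pre_post$Button == 0, pre_post$tdelta, NA)
pre_post$hit  <- ifelse(pre_post$Button == 1, pre_post$tdelta, NA)

# Long format: one row per subject, session, position and hit/miss
pre_post_melted <- melt(pre_post, id.vars = c("SubjectID", "CoupleID", "num", "PrePost"),
                        measure.vars = c("hit", "miss"))
# Wide format: one row per SubjectID/CoupleID/PrePost, one column per variable + num
pre_post_res <- dcast(pre_post_melted, SubjectID + CoupleID + PrePost ~ variable + num)

pre_post_res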

#  SubjectID CoupleID PrePost hit_1RT hit_2RT hit_3RT hit_4RT hit_5RT miss_1RT miss_2RT miss_3RT miss_4RT miss_5RT
#1      1531      153    Post      NA  36.782  14.027  15.585   6.507        0       NA       NA       NA       NA
#2      1531      153     Pre      NA  96.639  14.054  15.508   6.537        0       NA       NA       NA       NA

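We reshape the data (melt it to long format, then cast it back to wide) so that all the variables we want are created dynamically. We also stack the data frames into one, so the steps don't have to be repeated for each of them.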
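The cast result uses names like `hit_1RT` and puts the hit columns first, while the question asked for `Miss1RT ... Miss26RT Hit1RT ... Hit26RT`. A minimal sketch of the renaming and reordering with base `sub()` and `grep()`, using a small stand-in data frame shaped like the `dcast()` output (two positions per type) instead of running reshape2:

```r
# Stand-in for the dcast() result (only two positions per type, for brevity)
res <- data.frame(SubjectID = c(1531, 1531), CoupleID = c(153, 153),
                  PrePost = c("Post", "Pre"),
                  hit_1RT = c(NA, NA), hit_2RT = c(36.782, 96.639),
                  miss_1RT = c(0, 0), miss_2RT = c(NA, NA),
                  stringsAsFactors = FALSE)

# hit_1RT -> Hit1RT and miss_1RT -> Miss1RT
names(res) <- sub("^hit_", "Hit", names(res))
names(res) <- sub("^miss_", "Miss", names(res))

# Put all Miss columns before all Hit columns, as in the requested layout;
# grep() keeps the existing left-to-right order within each group
id_cols <- c("SubjectID", "CoupleID", "PrePost")
res <- res[, c(id_cols,
               grep("^Miss", names(res), value = TRUE),
               grep("^Hit", names(res), value = TRUE))]
print(res)
```

`grep()` only preserves the order the columns already have. If `num` is stored as plain text, `dcast()` may sort `10RT` before `2RT`; storing it as a factor whose levels are in numeric order keeps the 26 columns in sequence.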

Data:

pre <- structure(list(Button = c(0L, 1L, 1L, 1L, 1L), Intensity = c(30L, 
30L, 30L, 30L, 30L), Acc = c(0L, 1L, 1L, 1L, 1L), Intensity = c(0, 
13.5, 15, 6, 15), RT = c(0L, 0L, 0L, 0L, 0L), Time = c(77987.93, 
78084.57, 78098.62, 78114.13, 78120.67), tdelta = c(0, 96.639, 
14.054, 15.508, 6.537), SubjectID = c(1531L, 1531L, 1531L, 1531L, 
1531L), CoupleID = c(153L, 153L, 153L, 153L, 153L), PrePost = c("Pre", 
"Pre", "Pre", "Pre", "Pre")), .Names = c("Button", "Intensity", 
"Acc", "Intensity", "RT", "Time", "tdelta", "SubjectID", "CoupleID", 
"PrePost"), row.names = c(NA, -5L), class = "data.frame")

post <- structure(list(Button = c(0L, 1L, 1L, 1L, 1L), Intensity = c(30L, 
30L, 30L, 30L, 30L), Acc = c(0L, 1L, 1L, 1L, 1L), Intensity = c(0, 
13.5, 15, 6, 15), RT = c(0L, 0L, 0L, 0L, 0L), Time = c(83325.87, 
83362.65, 83376.68, 83392.27, 83398.77), tdelta = c(0, 36.782, 
14.027, 15.585, 6.507), SubjectID = c(1531L, 1531L, 1531L, 1531L, 
1531L), CoupleID = c(153L, 153L, 153L, 153L, 153L), PrePost = c("Post", 
"Post", "Post", "Post", "Post")), .Names = c("Button", "Intensity", 
"Acc", "Intensity", "RT", "Time", "tdelta", "SubjectID", "CoupleID", 
"PrePost"), row.names = c(NA, -5L), class = "data.frame")