Question

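Can anyone help me reshape long data to wide data where the results are linked? The wide format should have one row per study number (SN), with each repeated result listed after SN. I have shown an abbreviated table; there are more results per patient, and the columns LabTest, LabDate, Result, Lower, Upper repeat for each one. I have tried melt and cast, and binding columns, but can't get it to work. There are over 1000 results to reformat, so I can't enter them manually. I need to reshape a long-format Excel document in R. Thanks.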

Original data looks like this

SN     LabTest     LabDate    Result Lower Upper
TD62   Creat       05/12/2004  22     30    90
TD62   AST         06/12/2004  652    6     45
TD58   Creat       26/05/2007  72     30    90
TD58   Albumin     26/05/2005  22     25    35  
TD14   AST         28/02/2007  234    6     45
TD14   Albumin     26/02/2007  15     25    35

Formatted data should look like this

SN LabTCode LabDate Result Lower Upper LabCode LabDate Result Lower Upper
TD62 Creat   05/12/04  22    30   90   AST     06/12/04  652   6    45
TD58 Creat   26/05/05  72    30   90   Alb     26/05/05  22    25   35
TD14 AST     28/02/07  92    30   90   Alb     26/02/07  15    25   35


So far I have tried:

data_wide2 <- dcast(tdl, SN + LabDate ~ LabCode, value.var="Result")

and

melt(tdl, id = c("SN", "LabDate"), measured= c("Result", "Upper", "Lower"))

Answer 1

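Your problem is that R won't like the final table, because it has duplicate column names. Maybe you need the data in that format, but it is a bad way to store data, because it is hard to turn the columns back into rows without a lot of manual work.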
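A small base R sketch of the duplicate-name issue (the values are only illustrative): with the default check.names = TRUE, data.frame() does not keep two columns called Result and renames the second one.

```r
# Two columns deliberately given the same name (illustrative values only)
wide <- data.frame(SN = "TD62", Result = 22, Result = 652)

# With the default check.names = TRUE, R makes the names unique
print(names(wide))
# "SN" "Result" "Result.1"
```

This is why the code further down builds distinct names such as t1Result and t2Result rather than repeating Result.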

That said, if you want to do it, you need a new column to help you transpose the data.
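To show what such a helper column looks like, here is a minimal base R sketch. It assumes rows are already in the order you want within each SN; the names toy and test_no are hypothetical and chosen just for this example.

```r
# Cut-down copy of the question's data: two tests per patient
toy <- data.frame(
  SN = c("TD62", "TD62", "TD58", "TD58", "TD14", "TD14"),
  LabTest = c("Creat", "AST", "Creat", "Albumin", "AST", "Albumin"),
  stringsAsFactors = FALSE
)

# Number the rows 1, 2, ... within each SN; this counter is what later
# distinguishes the first block of wide columns from the second
toy$test_no <- ave(seq_along(toy$SN), toy$SN, FUN = seq_along)

print(toy)
```

Each patient's first test gets test_no 1 and the second gets 2, so every (SN, test_no) pair identifies exactly one row.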

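I've used dplyr and tidyr below, which are worth looking at instead of reshape. They are by the same author, but are more modern and designed to work together as part of the 'tidyverse'.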

library(dplyr)
library(tidyr)

#Recreate your data (not doing this bit in your question is what got you downvoted)
df <- data.frame(
  SN = c("TD62","TD62","TD58","TD58","TD14","TD14"),
  LabTest = c("Creat","AST","Creat","Albumin","AST","Albumin"),
  LabDate = c("05/12/2004","06/12/2004","26/05/2007","26/05/2005","28/02/2007","26/02/2007"),
  Result = c(22,652,72,22,234,15),
  Lower = c(30,6,30,25,6,25),
  Upper = c(90,45,90,35,45,35),
  stringsAsFactors = FALSE
)

output <- df %>% 
  group_by(SN) %>% 
  mutate(id_number = row_number()) %>% #create an id number to help with tracking the data as it's transposed
  gather("key", "value", -SN, -id_number) %>% #flatten the data so that we can rename all the column headers
  mutate(key = paste0("t",id_number, key)) %>% #add id_number to the column names. 't' for 'test' to start name with a letter.
  select(-id_number) %>% #don't need id_number anymore
  spread(key, value)

  SN    t1LabDate  t1LabTest t1Lower t1Result t1Upper t2LabDate  t2LabTest t2Lower t2Result t2Upper
  <chr> <chr>      <chr>     <chr>   <chr>    <chr>   <chr>      <chr>     <chr>   <chr>    <chr>  
1 TD14  28/02/2007 AST       6       234      45      26/02/2007 Albumin   25      15       35     
2 TD58  26/05/2007 Creat     30      72       90      26/05/2005 Albumin   25      22       35     
3 TD62  05/12/2004 Creat     30      22       90      06/12/2004 AST       6       652      45

There you go. If you need the columns in a specific order, there may still be some ordering issues to sort out.
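One possible way to handle both the ordering and the type coercion is pivot_wider(), which supersedes spread() in newer tidyr releases. The following is a sketch rather than a drop-in replacement: it assumes tidyr 1.2.0 or later (for the names_vary argument), and the names test_no and wide are my own.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the answer above
df <- data.frame(
  SN = c("TD62", "TD62", "TD58", "TD58", "TD14", "TD14"),
  LabTest = c("Creat", "AST", "Creat", "Albumin", "AST", "Albumin"),
  LabDate = c("05/12/2004", "06/12/2004", "26/05/2007", "26/05/2005",
              "28/02/2007", "26/02/2007"),
  Result = c(22, 652, 72, 22, 234, 15),
  Lower = c(30, 6, 30, 25, 6, 25),
  Upper = c(90, 45, 90, 35, 45, 35),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(SN) %>%
  mutate(test_no = row_number()) %>%  # 1, 2, ... within each patient
  ungroup() %>%
  pivot_wider(
    id_cols = SN,
    names_from = test_no,
    values_from = c(LabTest, LabDate, Result, Lower, Upper),
    names_glue = "t{test_no}{.value}",  # e.g. t1LabTest, t2Result
    names_vary = "slowest"              # keep each test's columns together
  )

print(names(wide))
```

Because the values are never stacked into a single column, Result, Lower and Upper stay numeric instead of being converted to character, and the columns come out grouped per test in the order given to values_from.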

R: format long data to wide data... but with linked results
