Convert long-format data to wide format

Time: 2017-07-31 17:16:25

Tags: r reshape tidyr reshape2

I need to convert long-format data (long) to wide format (wide) under the following conditions, if possible:

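1) All data files are in long format (long) with the same structure (id, name, value), but each file will have different variables, different values, and a different number of variables: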

id = case
name = variable
value = variable value(s)

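2) Each data file will contain a different mix of variable types (factor, integer, numeric). Some factors may have several levels per case (fruit and meat in long), and I would like to create a separate logical dummy variable for each level of those factors. The number of factor and numeric variables varies from file to file.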

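3) Because the variables differ from file to file, I would like to automate this so that the same code can be applied to every data file without changing any variable names.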

I have tried reshape2 and tidyr, but could not find a way to do this.

Here is the long format:

    long
   id   name     value
1   1  fruit     apple
2   1  fruit    banana
3   1  fruit    orange
4   1  fruit pineapple
5   1   meat     steak
6   1   meat   chicken
7   1  fname      dave
8   1     wt       185
9   1 status    active
10  2  fruit     apple
11  2  fruit pineapple
12  2   meat   chicken
13  2  fname      jeff
14  2     wt       205
15  2 status    active
16  3  fruit     apple
17  3  fruit    banana
18  3   meat     steak
19  3  fname      jane
20  3     wt       125
21  3 status    lapsed

Here is the wide format I would like:

wide
  id fruit.apple fruit.banana fruit.orange fruit.pineapple meat.steak meat.chicken fname  wt status
1  1        TRUE         TRUE         TRUE            TRUE       TRUE         TRUE  dave 185 active
2  2        TRUE        FALSE        FALSE            TRUE      FALSE         TRUE  jeff 205 active
3  3        TRUE         TRUE        FALSE           FALSE       TRUE        FALSE  jane 125 lapsed

The long-format data:

long <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), name = c("fruit", 
"fruit", "fruit", "fruit", "meat", "meat", "fname", "wt", "status", 
"fruit", "fruit", "meat", "fname", "wt", "status", "fruit", "fruit", 
"meat", "fname", "wt", "status"), value = c("apple", "banana", 
"orange", "pineapple", "steak", "chicken", "dave", "185", "active", 
"apple", "pineapple", "chicken", "jeff", "205", "active", "apple", 
"banana", "steak", "jane", "125", "lapsed")), .Names = c("id", 
"name", "value"), class = "data.frame", row.names = c(NA, -21L
))

1 answer:

Answer 0: (score: 0)

A solution using dplyr and tidyr

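library(dplyr)
library(tidyr)

# Note: the multi-valued variables ("fruit", "meat") are hardcoded here.
wide <- long %>%
  # For multi-valued variables, fold the level into the column name
  # (e.g. "fruit.apple") and use "1" as a presence marker.
  mutate(value2 = ifelse(name %in% c("fruit", "meat"), "1", value),
         name2 = ifelse(name %in% c("fruit", "meat"), 
                       paste(name, value, sep = "."), name)) %>%
  select(-name, -value) %>%
  # One column per name2; absent levels are filled with "0".
  spread(name2, value2, fill = "0") %>%
  # Turn the "1"/"0" markers into TRUE/FALSE; other columns stay character.
  mutate_at(vars(matches("fruit|meat")), as.numeric) %>%
  mutate_at(vars(matches("fruit|meat")), as.logical)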
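
The answer above names "fruit" and "meat" explicitly, so it would need editing for each data file. As a rough sketch of how that step might be automated (not part of the original answer), the function below treats any variable that appears more than once for at least one id as multi-valued and builds the dummy columns from that. The function name long_to_wide is hypothetical, and the sketch assumes dplyr 1.0 or later and tidyr 1.1 or later (for across() and pivot_wider()), and that the input has exactly the columns id, name and value.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: reshape an (id, name, value) table to wide format
# without naming any variable explicitly.
long_to_wide <- function(long) {
  # Assumption: a variable is multi-valued if any id has more than one row for it.
  multi <- long %>%
    count(id, name) %>%
    filter(n > 1) %>%
    distinct(name) %>%
    pull(name)

  # One logical column per level of each multi-valued variable.
  dummies <- long %>%
    filter(name %in% multi) %>%
    transmute(id, key = paste(name, value, sep = "."), flag = TRUE) %>%
    distinct() %>%
    pivot_wider(names_from = key, values_from = flag, values_fill = FALSE)
  dummy_cols <- setdiff(names(dummies), "id")

  # One column per single-valued variable, with types guessed from the text.
  singles <- long %>%
    filter(!name %in% multi) %>%
    pivot_wider(names_from = name, values_from = value) %>%
    mutate(across(-id, ~ type.convert(.x, as.is = TRUE)))

  long %>%
    distinct(id) %>%
    left_join(dummies, by = "id") %>%
    left_join(singles, by = "id") %>%
    # An id with no multi-valued rows at all gets FALSE rather than NA.
    mutate(across(any_of(dummy_cols), ~ coalesce(.x, FALSE)))
}
```

Calling long_to_wide(long) on the sample data from the question should give the same columns as the requested wide table, with wt converted to integer. One limitation of the detection rule: a variable that happens to have a single value for every id in a given file (for example, a file where each case lists only one meat) would be kept as an ordinary column rather than expanded into dummies.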