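I need to convert data from long format to wide format, if possible under the following conditions:
1) All data files are in long format with the same structure (id, name, value), but each file has different variables, different values, and a different number of variables: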
id = case
name = variable
value = variable value(s)
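2) Each data file contains a different mix of variable types (factor, integer, numeric). Some factors can have several levels per case (fruit and meat in long), and I would like a separate logical dummy variable for each level of those factors. The number of factor and numeric variables varies from file to file.
3) Because the variables differ between data files, I want to automate this so that the same code can be applied to every data file without changing any variable names.
I have tried reshape2 and tidyr, but could not find a way to do this.
Here is the long format: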
long
id name value
1 1 fruit apple
2 1 fruit banana
3 1 fruit orange
4 1 fruit pineapple
5 1 meat steak
6 1 meat chicken
7 1 fname dave
8 1 wt 185
9 1 status active
10 2 fruit apple
11 2 fruit pineapple
12 2 meat chicken
13 2 fname jeff
14 2 wt 205
15 2 status active
16 3 fruit apple
17 3 fruit banana
18 3 meat steak
19 3 fname jane
20 3 wt 125
21 3 status lapsed
Here is the wide format I would prefer:
wide
id fruit.apple fruit.banana fruit.orange fruit.pineapple meat.steak meat.chicken fname wt status
1 1 TRUE TRUE TRUE TRUE TRUE TRUE dave 185 active
2 2 TRUE FALSE FALSE TRUE FALSE TRUE jeff 205 active
3 3 TRUE TRUE FALSE FALSE TRUE FALSE jane 125 lapsed
The long-format data:
long <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), name = c("fruit",
"fruit", "fruit", "fruit", "meat", "meat", "fname", "wt", "status",
"fruit", "fruit", "meat", "fname", "wt", "status", "fruit", "fruit",
"meat", "fname", "wt", "status"), value = c("apple", "banana",
"orange", "pineapple", "steak", "chicken", "dave", "185", "active",
"apple", "pineapple", "chicken", "jeff", "205", "active", "apple",
"banana", "steak", "jane", "125", "lapsed")), .Names = c("id",
"name", "value"), class = "data.frame", row.names = c(NA, -21L
))
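Answer 0 (score: 0)
A solution using dplyr and tidyr. Note that it names the multi-level variables (fruit and meat) explicitly, so that list has to be adjusted for each data file.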
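library(dplyr)
library(tidyr)
wide <- long %>%
  # For fruit/meat rows, use "1" as the cell value and name.value as the column name
  mutate(value2 = ifelse(name %in% c("fruit", "meat"), "1", value),
         name2 = ifelse(name %in% c("fruit", "meat"),
                        paste(name, value, sep = "."), name)) %>%
  select(-name, -value) %>%
  # One column per name2; levels missing for an id are filled with "0"
  spread(name2, value2, fill = "0") %>%
  # Turn the "1"/"0" strings in the dummy columns into TRUE/FALSE
  mutate_at(vars(matches("fruit|meat")), as.numeric) %>%
  mutate_at(vars(matches("fruit|meat")), as.logical)
# Columns come out in alphabetical order, and wt is still character here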
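To avoid hard-coding the variable names (condition 3 in the question), one possible approach is to treat any name that occurs more than once for at least one id as multi-valued, and expand only those into dummy columns. The following is a minimal sketch under that assumption, not a definitive implementation; long_to_wide and demo are hypothetical names introduced here, and it assumes a tidyr version that provides pivot_wider (1.0.0 or later) and a dplyr version that provides across (1.0.0 or later).

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: expects a data frame with columns id, name, value.
long_to_wide <- function(long) {
  long <- long %>%
    mutate(name = as.character(name), value = as.character(value))

  # Assumption: a name seen more than once for any id is multi-valued.
  multi <- long %>%
    count(id, name) %>%
    filter(n > 1) %>%
    pull(name) %>%
    unique()

  # One logical column per name.value level of the multi-valued variables.
  dummies <- long %>%
    filter(name %in% multi) %>%
    distinct(id, name, value) %>%
    mutate(col = paste(name, value, sep = "."), present = TRUE) %>%
    select(id, col, present) %>%
    pivot_wider(names_from = col, values_from = present,
                values_fill = list(present = FALSE))
  dummy_cols <- setdiff(names(dummies), "id")

  # Single-valued variables: one column each, types guessed from the text.
  singles <- long %>%
    filter(!name %in% multi) %>%
    pivot_wider(names_from = name, values_from = value) %>%
    mutate(across(-id, ~ type.convert(.x, as.is = TRUE)))

  # Start from all ids so that cases without any multi-valued rows are kept.
  long %>%
    distinct(id) %>%
    left_join(dummies, by = "id") %>%
    left_join(singles, by = "id") %>%
    mutate(across(all_of(dummy_cols), ~ coalesce(.x, FALSE)))
}

# Small made-up example in the same id/name/value layout as the question.
demo <- data.frame(
  id    = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  name  = c("fruit", "fruit", "fname", "wt", "fruit", "fname", "wt"),
  value = c("apple", "banana", "dave", "185", "apple", "jeff", "205"),
  stringsAsFactors = FALSE
)
wide_demo <- long_to_wide(demo)
```

Calling long_to_wide(long) on the data from the question should work the same way. One limitation of the detection rule: a factor that never has more than one level per case in a given file is treated as an ordinary single-valued column rather than being expanded into dummies.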