My data looks like this:
posture   code  HR   EE   a  b
cycling   A03   102  100  3  6
standing  A03   99   99   4  6
sitting   A03   98   67   5  5
walking   A03   97   78   3  6
cycling   B01   111  76   5  5
standing  B01   100  88   4  4
sitting   B01   78   34   4  3
walking   B01   99   99   2  2
I need to transpose it so that it looks like this:
code cycling_HR cycling_EE cycling_a cycling_b standing_HR standing_EE standing_a standing_b sitting_HR sitting_EE sitting_a sitting_b walking_HR walking_EE walking_a walking_b
A03 102 100 3 6 99 99 4 6 98 67 5 5 97 78 3 6
B01 111 76 5 5 100 88 4 4 78 34 4 3 99 99 2 2
And so on (sorry about the formatting). I could not find an existing answer that settles this. Any help would be welcome.
Answer 0 (score: 5)
This is a very basic "long to wide" reshaping problem.
You can do this in base R with the reshape function:
reshape(mydf, direction = "wide", idvar = "code", timevar = "posture")
# code HR.cycling EE.cycling a.cycling b.cycling HR.standing EE.standing
# 1 A03 102 100 3 6 99 99
# 5 B01 111 76 5 5 100 88
# a.standing b.standing HR.sitting EE.sitting a.sitting b.sitting HR.walking
# 1 4 6 98 67 5 5 97
# 5 4 4 78 34 4 3 99
# EE.walking a.walking b.walking
# 1 78 3 6
# 5 99 2 2
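Note that reshape names the columns variable.posture (for example HR.cycling), while the question asks for posture_variable (for example cycling_HR). A minimal sketch of one way to rename them afterwards, assuming the sample data from the question is stored in mydf (rebuilt here so the snippet runs on its own; the regular expression is my own addition, not part of the original answer):

```r
# Sample data from the question, rebuilt so the snippet is self-contained
mydf <- data.frame(
  posture = rep(c("cycling", "standing", "sitting", "walking"), times = 2),
  code    = rep(c("A03", "B01"), each = 4),
  HR      = c(102, 99, 98, 97, 111, 100, 78, 99),
  EE      = c(100, 99, 67, 78, 76, 88, 34, 99),
  a       = c(3, 4, 5, 3, 5, 4, 4, 2),
  b       = c(6, 6, 5, 6, 5, 4, 3, 2),
  stringsAsFactors = FALSE
)

# Long to wide; columns come out as "HR.cycling", "EE.cycling", ...
wide <- reshape(mydf, direction = "wide", idvar = "code", timevar = "posture")

# Swap the two halves around the dot: "HR.cycling" becomes "cycling_HR".
# "code" contains no dot, so it is left unchanged.
names(wide) <- sub("^(.+)\\.(.+)$", "\\2_\\1", names(wide))

# Drop the leftover row names (1 and 5) from the long data
rownames(wide) <- NULL
wide
```

This only works as written because none of the variable or posture names contain a dot themselves.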
You could also look at a "dplyr" + "tidyr" approach, perhaps something like this:
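library(dplyr)
library(tidyr)
mydf %>%
  gather(var, val, HR:b) %>%    # stack HR, EE, a, b into key/value pairs
  unite(v1, posture, var) %>%   # build names such as "cycling_HR"
  spread(v1, val)               # one column per posture_variable name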
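For readers on a recent tidyr: gather and spread are marked as superseded in the tidyr documentation, and pivot_wider can do the same reshaping in one step. A sketch under the assumption that tidyr 1.0.0 or later is installed, again rebuilding the question's data as mydf so it runs standalone:

```r
library(tidyr)

# Sample data from the question, rebuilt so the snippet is self-contained
mydf <- data.frame(
  posture = rep(c("cycling", "standing", "sitting", "walking"), times = 2),
  code    = rep(c("A03", "B01"), each = 4),
  HR      = c(102, 99, 98, 97, 111, 100, 78, 99),
  EE      = c(100, 99, 67, 78, 76, 88, 34, 99),
  a       = c(3, 4, 5, 3, 5, 4, 4, 2),
  b       = c(6, 6, 5, 6, 5, 4, 3, 2),
  stringsAsFactors = FALSE
)

# One row per code; names_glue puts the posture first, e.g. "cycling_HR"
wide <- pivot_wider(
  mydf,
  id_cols     = code,
  names_from  = posture,
  values_from = c(HR, EE, a, b),
  names_glue  = "{posture}_{.value}"
)
wide
```

The column order may differ from the one in the question (all HR columns may come first), so reorder afterwards if the exact layout matters.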
Answer 1 (score: 4)
Alternatively, for large datasets (because reshape is very slow), you can try data.table v >= 1.9.5:
library(data.table)
dcast(setDT(df), code ~ posture, value.var = c("HR", "EE", "a", "b"))
# code cycling_HR sitting_HR standing_HR walking_HR cycling_EE sitting_EE standing_EE walking_EE cycling_a sitting_a standing_a walking_a cycling_b sitting_b
# 1: A03 102 98 99 97 100 67 99 78 3 5 4 3 6 5
# 2: B01 111 78 100 99 76 34 88 99 5 4 4 2 5 3
# standing_b walking_b
# 1: 6 6
# 2: 4 2
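The column names and their order can depend on the data.table version: as far as I know, released versions put the variable first (HR_cycling) rather than the posture (cycling_HR) as in the output shown. A hedged sketch that normalises either naming to the layout requested in the question, assuming data.table is installed and using a rebuilt copy of the sample data as mydf (the vars and postures helper vectors are mine, for illustration only):

```r
library(data.table)

# Sample data from the question, rebuilt so the snippet is self-contained
mydf <- data.frame(
  posture = rep(c("cycling", "standing", "sitting", "walking"), times = 2),
  code    = rep(c("A03", "B01"), each = 4),
  HR      = c(102, 99, 98, 97, 111, 100, 78, 99),
  EE      = c(100, 99, 67, 78, 76, 88, 34, 99),
  a       = c(3, 4, 5, 3, 5, 4, 4, 2),
  b       = c(6, 6, 5, 6, 5, 4, 3, 2),
  stringsAsFactors = FALSE
)

vars     <- c("HR", "EE", "a", "b")                        # value columns
postures <- c("cycling", "standing", "sitting", "walking") # desired order

# as.data.table() copies, so mydf itself is left untouched
wide <- dcast(as.data.table(mydf), code ~ posture, value.var = vars)

# If names look like "HR_cycling", swap them to "cycling_HR";
# names that already start with a posture do not match and stay as they are
old <- setdiff(names(wide), "code")
setnames(wide, old, sub("^(HR|EE|a|b)_(.*)$", "\\2_\\1", old))

# Reorder to posture-major order: cycling_HR, cycling_EE, cycling_a, ...
target <- as.vector(t(outer(postures, vars, paste, sep = "_")))
setcolorder(wide, c("code", target))
wide
```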
A benchmark on slightly bigger data (4 million rows):
library(dplyr)
library(tidyr)
library(data.table)
set.seed(1L)
df = data.frame(posture = c("cycling", "standing", "sitting", "walking"),
                code = rep(paste("A", 1:1e6, sep=""), each=4L),
                HR = sample(120, 4e6, TRUE),
                EE = sample(100, 4e6, TRUE),
                a = sample(5, 4e6, TRUE),
                b = sample(10, 4e6, TRUE),
                stringsAsFactors=FALSE)
# base R approach
system.time(reshape(df, direction = "wide", idvar = "code", timevar = "posture"))
# user system elapsed
# 23.183 0.470 23.838
# dplyr + tidyr
system.time({
  df %>%
    gather(var, val, HR:b) %>%
    unite(v1, posture, var) %>%
    spread(v1, val)
})
# user system elapsed
# 17.312 1.046 18.446
# data.table
system.time(dcast(setDT(df), code ~ posture,
                  value.var = c("HR", "EE", "a", "b")))
# user system elapsed
# 1.216 0.136 1.367
Answer 2 (score: 0)
How about using tidyr?
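library(dplyr)  # needed for select()
library(tidyr)
x <- data.frame(posture = c("cycling", "standing", "sitting", "walking"),
                code = c("A03", "A03", "B01", "B01"),
                HR = c(1, 3, 3, 4),
                EE = c(1, 3, 3, 5))
# stack HR and EE into key/value pairs, keeping code and posture as ids
x2 <- gather(x, key = type, value = vals, -c(code, posture))
# build the new column names, e.g. "cycling_HR"
x2$vars <- paste(x2$posture, x2$type, sep = "_")
x2 <- select(x2, -c(posture, type))
# one column per posture_type name
spread(x2, key = vars, value = vals)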