I have a CSV file in wide format that I need to convert to long format. Only the first 3 rows are shown here.
CODEA C45 ragek ra80 ra98 ... Obese14 Overweight14 Obese21 hibp14 hibp21 Overweight21
1 1 NA 3 4 1 NA NA NA NA NA NA NA NA
2 3 2 3 3 1 0 0 0 0 1 0 0 0
3 4 2 3 6 1 NA NA NA NA NA NA NA NA
The data continue in this way. Obese14 (Yes/No); Overweight (yes/no); etc.
> names(Copy.of.BP_2)
[1] "CODEA" "C45" "ragek" "ra80"
[5] "ra98" "CBCLAggressionAt1410" "CBCLInternalisingAt1410" "Obese14"
[9] "Overweight14" "Overweight21" "Obese21" "hibp14"
[13] "hibp21"
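It has 6898 observations and 13 variables.
I am trying to organise these data in a stacked format, and I think the layout below would be a good choice. I am not sure how to combine the obese and overweight categories, because in the raw long version obese14, overweight14, obese21 and overweight21 would be 4 different categories.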
CODEA ... time Obese Overweight HiBP
14
21
14
21 ... etc
My syntax is:
BP.stack1=reshape(Copy.of.BP_2,
timevar="time",direction="long",
varying=list(names(Copy.of.BP_2[8:13]),
v.names="Obese","Overweight","HiBP",idvar=c("CODEA")
It does not seem to work: R shows a + sign and waits for further input.
Should I use melt and cast? I read the reshape package manual but could not understand it.
Edit: question restructured
Answer 0 (score: 3)
Sticking with base R's reshape(), try the following.
I believe I have recreated your sample data with the following:
Copy.of.BP_2 <-
structure(list(CODEA = c(1, 3, 4), C45 = c(NA, 2, 2), ragek = c(3,
3, 3), ra80 = c(4, 3, 6), ra98 = c(1, 1, 1), CBCLAggressionAt1410 = c(NA,
0, NA), CBCLInternalisingAt1410 = c(NA, 0, NA), Obese14 = c(NA,
0, NA), Overweight14 = c(NA, 0, NA), Overweight21 = c(NA, 1,
NA), Obese21 = c(NA, 0, NA), hibp14 = c(NA, 0, NA), hibp21 = c(NA,
0, NA)), .Names = c("CODEA", "C45", "ragek", "ra80", "ra98",
"CBCLAggressionAt1410", "CBCLInternalisingAt1410", "Obese14",
"Overweight14", "Overweight21", "Obese21", "hibp14", "hibp21"
), row.names = c(NA, -3L), class = "data.frame")
Copy.of.BP_2
# CODEA C45 ragek ra80 ra98 CBCLAggressionAt1410 CBCLInternalisingAt1410
# 1 1 NA 3 4 1 NA NA
# 2 3 2 3 3 1 0 0
# 3 4 2 3 6 1 NA NA
# Obese14 Overweight14 Overweight21 Obese21 hibp14 hibp21
# 1 NA NA NA NA NA NA
# 2 0 0 1 0 0 0
# 3 NA NA NA NA NA NA
First, for convenience, let's create a vector of the measure variables, that is, the variables we want to "stack" from wide to long format.
measurevars <- names(Copy.of.BP_2)[grepl("Obese|Overweight|hibp",
names(Copy.of.BP_2))]
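The pattern above matches anywhere in a column name, so any other column containing one of these stems would also be picked up. A slightly stricter sketch, assuming every measure column is a stem followed only by a numeric age suffix, anchors the pattern and uses grep(value = TRUE) to return the names directly. The column names below are a stand-in for the real data, and ObeseNotes is a hypothetical column added only to show what gets excluded.

```r
# Stand-in column names; "ObeseNotes" is hypothetical and should NOT match
cols <- c("CODEA", "C45", "Obese14", "Overweight14", "Overweight21",
          "Obese21", "hibp14", "hibp21", "ObeseNotes")

# Anchored pattern: one of the stems, then digits only, nothing else
measurevars_strict <- grep("^(Obese|Overweight|hibp)[0-9]+$", cols,
                           value = TRUE)
measurevars_strict
```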
Next, use reshape(), specifying the direction, the identifier variable, and which variables "vary" over time (measurevars, from above).
BP_2_long <- reshape(Copy.of.BP_2, direction = "long", idvar="CODEA",
varying = measurevars, sep = "")
BP_2_long
# CODEA C45 ragek ra80 ra98 CBCLAggressionAt1410 CBCLInternalisingAt1410
# 1.14 1 NA 3 4 1 NA NA
# 3.14 3 2 3 3 1 0 0
# 4.14 4 2 3 6 1 NA NA
# 1.21 1 NA 3 4 1 NA NA
# 3.21 3 2 3 3 1 0 0
# 4.21 4 2 3 6 1 NA NA
# time Obese Overweight hibp
# 1.14 14 NA NA NA
# 3.14 14 0 0 0
# 4.14 14 NA NA NA
# 1.21 21 NA NA NA
# 3.21 21 0 1 0
# 4.21 21 NA NA NA
If you are only interested in the id and measure columns, you can also add the drop argument to the reshape() command:
BP_2_long_2 <- reshape(
Copy.of.BP_2, direction = "long", idvar="CODEA",
varying = measurevars, sep = "",
drop = !names(Copy.of.BP_2) %in% c(measurevars, "CODEA"))
BP_2_long_2
# CODEA time Obese Overweight hibp
# 1.14 1 14 NA NA NA
# 3.14 3 14 0 0 0
# 4.14 4 14 NA NA NA
# 1.21 1 21 NA NA NA
# 3.21 3 21 0 1 0
# 4.21 4 21 NA NA NA
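The layout sketched in the question alternates time 14 and 21 within each subject, whereas reshape() stacks all the age-14 rows before the age-21 rows. A minimal sketch of reordering the result, assuming a small stand-in data frame (here called wide) with the same measure columns as the question:

```r
# Stand-in with the same measure columns as the data in the question
wide <- data.frame(CODEA = c(1, 3, 4),
                   Obese14 = c(NA, 0, NA), Overweight14 = c(NA, 0, NA),
                   Overweight21 = c(NA, 1, NA), Obese21 = c(NA, 0, NA),
                   hibp14 = c(NA, 0, NA), hibp21 = c(NA, 0, NA))

long <- reshape(wide, direction = "long", idvar = "CODEA",
                varying = names(wide)[-1], sep = "")

# Sort by subject, then by age, so each subject's rows sit together
long_sorted <- long[order(long$CODEA, long$time), ]
rownames(long_sorted) <- NULL   # optional: drop the "id.time" row names
long_sorted
```

Resetting the row names is a matter of taste; the id.time labels that reshape() generates can be handy for checking that each row came from the expected subject and age.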
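Here is an argument-by-argument breakdown of your attempt, with comments on what goes wrong.
BP.stack1 =
reshape(Copy.of.BP_2, # Fine
timevar="time", # Fine
direction="long", # Fine
varying=list(names(Copy.of.BP_2)[8:13]), # Wrong. Use one list element per output variable, e.g. list(c("Obese14", "Obese21"), ...)
v.names="Obese","Overweight","HiBP", # Wrong. This needs to be in c()
idvar=c("CODEA") # Almost... missing your closing ")"
The missing ")" is why R prints the + continuation prompt: it is still waiting for the call to be completed.
So, a complete working command is:
BP.stack1 = reshape(
Copy.of.BP_2,
timevar="time",
direction="long",
varying=list(c("Obese14", "Obese21"),
c("Overweight14", "Overweight21"),
c("hibp14", "hibp21")),
v.names=c("Obese", "Overweight", "HiBP"),
times=c(14, 21),
idvar=c("CODEA"))
Passing the column positions 8:13 as a plain vector together with v.names would not be safe here. In that form reshape() expects the columns in the order Obese14, Overweight14, hibp14, Obese21, and so on, which is not how they are arranged in this data. Without times, the time variable would also be numbered 1 and 2 rather than 14 and 21.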
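I generally try not to rely too heavily on column numbers, because columns are more likely to be rearranged than renamed. That is why I used grepl() to match names against a pattern.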
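In the same spirit, a varying list for reshape() can be built from names rather than positions. The following is a minimal sketch, assuming the measure columns are always a stem followed by the age. The names wide, stems, ages and varying_list are illustrative and not part of the original data.

```r
# Stand-in with the same measure columns as the data in the question
wide <- data.frame(CODEA = c(1, 3, 4),
                   Obese14 = c(NA, 0, NA), Overweight14 = c(NA, 0, NA),
                   Overweight21 = c(NA, 1, NA), Obese21 = c(NA, 0, NA),
                   hibp14 = c(NA, 0, NA), hibp21 = c(NA, 0, NA))

stems <- c("Obese", "Overweight", "hibp")  # one entry per long-format column
ages  <- c(14, 21)                         # measurement ages, in output order

# For each stem, list its wide columns in the same order as `ages`
varying_list <- lapply(stems, function(s) paste0(s, ages))

# Fail early if an expected column is missing from the data
stopifnot(all(unlist(varying_list) %in% names(wide)))

long <- reshape(wide, direction = "long", idvar = "CODEA",
                varying = varying_list, v.names = stems,
                timevar = "time", times = ages)
long
```

Because each list element names its columns explicitly, the result does not depend on where the columns sit in the data frame or on the order in which Obese21 and Overweight21 happen to appear.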