I have a file that I need to convert from wide format to long format. An example of the data:
Subject,Cat1_Weight,Cat2_Weight,Cat3_Weight,Cat1_Sick,Cat2_Sick,Cat3_Sick
1,10,11,12,1,0,0
2,7,8,9,1,0,0
However, I need the long format to look like this:
Subject,CatNumber,Weight,Sickness
1,1,10,1
1,2,11,0
1,3,12,0
2,1,7,1
2,2,8,0
2,3,9,0
So far I have tried using the melt function in R:
datalong <- melt(exp2_simon_shortform, id ="Subject")
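But it treats each column name as a unique variable, each with its own value. Does anyone know how to go from wide to long in a specified way, making use of the column header names?
Cheers.
Edit: I realized I made a mistake. My final output needs to look as follows. So from the Cat1_ part, I actually need to split out "Cat" and "1":
Subject Animal CatNumber Weight Sickness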
1 Cat 1 10 1
1 Cat 2 11 0
1 Cat 3 12 0
2 Cat 1 7 1
2 Cat 2 8 0
2 Cat 3 9 0
Any updated solutions would be much appreciated.
Answer 0 (score: 4)
A "dplyr" + "tidyr" approach might be something like:
library(dplyr)
library(tidyr)
mydf %>%
gather(var, val, -Subject) %>%
separate(var, into = c("CatNumber", "variable")) %>%
spread(variable, val)
# Subject CatNumber Sick Weight
# 1 1 Cat1 1 10
# 2 1 Cat2 0 11
# 3 1 Cat3 0 12
# 4 2 Cat1 1 7
# 5 2 Cat2 0 8
# 6 2 Cat3 0 9
Add a mutate with gsub to that pipeline to remove the "Cat" part of the "CatNumber" column.
According to the discussions in chat, your data actually looks more like this:
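For example, a minimal sketch of that extra step, assuming the sample data from the question is stored in mydf. The Animal column and the name `result` are my own additions for illustration, to match the edited question:

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question
mydf <- data.frame(
  Subject = 1:2,
  Cat1_Weight = c(10, 7), Cat2_Weight = c(11, 8), Cat3_Weight = c(12, 9),
  Cat1_Sick = c(1, 1), Cat2_Sick = c(0, 0), Cat3_Sick = c(0, 0)
)

result <- mydf %>%
  gather(var, val, -Subject) %>%
  separate(var, into = c("CatNumber", "variable")) %>%
  spread(variable, val) %>%
  mutate(
    # Keep the leading letters as the animal label: "Cat1" -> "Cat"
    Animal = gsub("[0-9]+$", "", CatNumber),
    # Drop the "Cat" prefix and store the number: "Cat1" -> 1
    CatNumber = as.integer(gsub("^Cat", "", CatNumber))
  )
result
```

Animal is computed before CatNumber is overwritten, because mutate evaluates its arguments in order.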
A = c("ATCint", "Blank", "None"); B = 1:5; C = c("ResumptionTime", "ResumptionMisses")
colNames <- expand.grid(A, B, C)
colNames <- sprintf("%s%d_%s", colNames[[1]], colNames[[2]], colNames[[3]])
subject = 1:60
set.seed(1)
M <- matrix(sample(10, length(subject) * length(colNames), TRUE),
nrow = length(subject), dimnames = list(NULL, colNames))
mydf <- data.frame(Subject = subject, M)
So you need to do a few extra steps to get the desired output. Try:
library(dplyr)
library(tidyr)
mydf %>%
group_by(Subject) %>% ## Your ID variable
gather(var, val, -Subject) %>% ## Make long data. Everything except your IDs
separate(var, into = c("partA", "partB")) %>% ## Split new column into two parts
mutate(partA = gsub("(.*)([0-9]+)", "\\1_\\2", partA)) %>% ## Make new col easy to split
separate(partA, into = c("A1", "A2")) %>% ## Split this new column
spread(partB, val) ## Transform to wide form
Which yields:
Source: local data frame [900 x 5]
Subject A1 A2 ResumptionMisses ResumptionTime
(int) (chr) (chr) (int) (int)
1 1 ATCint 1 9 3
2 1 ATCint 2 4 3
3 1 ATCint 3 2 2
4 1 ATCint 4 7 4
5 1 ATCint 5 7 1
6 1 Blank 1 4 10
7 1 Blank 2 2 4
8 1 Blank 3 7 5
9 1 Blank 4 1 9
10 1 Blank 5 10 10
.. ... ... ... ... ...
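As an alternative sketch, assuming tidyr 1.0.0 or later is installed, pivot_longer() can split the names and reshape in one call. This is a different technique from the gather/separate/spread chain above, shown here on the small Cat example from the question. The name `wide` and the regular expression are assumptions for illustration; the special ".value" entry in names_to asks tidyr to use that part of each column name as an output column.

```r
library(tidyr)

# Sample data copied from the question (the name `wide` is illustrative)
wide <- data.frame(
  Subject = 1:2,
  Cat1_Weight = c(10, 7), Cat2_Weight = c(11, 8), Cat3_Weight = c(12, 9),
  Cat1_Sick = c(1, 1), Cat2_Sick = c(0, 0), Cat3_Sick = c(0, 0)
)

long <- pivot_longer(
  wide,
  cols = -Subject,
  # Three capture groups: letters, digits, and the measurement name
  names_to = c("Animal", "CatNumber", ".value"),
  names_pattern = "([A-Za-z]+)([0-9]+)_(.*)"
)

# The captured number arrives as text, so convert it afterwards
long$CatNumber <- as.integer(long$CatNumber)
long
```

The same pattern should also match names shaped like ATCint1_ResumptionTime, since it only expects letters, then digits, then an underscore.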
Answer 1 (score: 3)
We can use melt from library(data.table), which can take multiple patterns for the measure variables.
library(data.table)#v1.9.6+
DT <- melt(setDT(df1), measure=patterns('Weight$', 'Sick$'),
variable.name='CatNumber', value.name=c('Weight', 'Sick'))[order(Subject)]
DT
# Subject CatNumber Weight Sick
#1: 1 1 10 1
#2: 1 2 11 0
#3: 1 3 12 0
#4: 2 1 7 1
#5: 2 2 8 0
#6: 2 3 9 0
If we need the 'Animal' column, we can grep for the 'Cat' columns, use sub to remove the suffix substring, and assign (:=) the result to create the 'Animal' column.
DT[, Animal := sub('\\d+\\_.*', '', grep('Cat', colnames(df1), value=TRUE))]
DT
# Subject CatNumber Weight Sick Animal
#1: 1 1 10 1 Cat
#2: 1 2 11 0 Cat
#3: 1 3 12 0 Cat
#4: 2 1 7 1 Cat
#5: 2 2 8 0 Cat
#6: 2 3 9 0 Cat
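One caveat, as far as I can tell: the Animal assignment above lines up because the six "Cat" column names happen to match the six rows of DT. A hedged sketch that should not depend on the row count is below; it rebuilds df1 from the question's sample data and collapses the prefixes with unique(). The name `animal` is illustrative, and this assumes all measure columns share one prefix.

```r
library(data.table)

# Sample data copied from the question
df1 <- data.frame(
  Subject = 1:2,
  Cat1_Weight = c(10, 7), Cat2_Weight = c(11, 8), Cat3_Weight = c(12, 9),
  Cat1_Sick = c(1, 1), Cat2_Sick = c(0, 0), Cat3_Sick = c(0, 0)
)

DT <- melt(as.data.table(df1),
           measure.vars = patterns("Weight$", "Sick$"),
           variable.name = "CatNumber",
           value.name = c("Weight", "Sick"))[order(Subject)]

# Going through as.character() works whether CatNumber is a factor or not
DT[, CatNumber := as.integer(as.character(CatNumber))]

# Strip the digits and suffix from the measure column names, then
# collapse to a single label (assumes one common prefix, here "Cat")
animal <- unique(sub("\\d+_.*", "", grep("_", names(df1), value = TRUE)))
DT[, Animal := animal]
DT
```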
Answer 2 (score: 3)
You can do this with base R's reshape, for example:
reshape(dat, idvar="Subject", direction="long", varying=list(2:4,5:7),
v.names=c("Weight","Sick"), timevar="CatNumber")
# Subject CatNumber Weight Sick
#1.1 1 1 10 1
#2.1 2 1 7 1
#1.2 1 2 11 0
#2.2 2 2 8 0
#1.3 1 3 12 0
#2.3 2 3 9 0
Alternatively, since reshape expects names like variablename_groupname, you can rename the columns and then let reshape do the hard work:
names(dat) <- gsub("Cat(.+)_(.+)", "\\2_\\1", names(dat))
reshape(dat, idvar="Subject", direction="long", varying=-1,
sep="_", timevar="CatNumber")
# Subject CatNumber Weight Sick
#1.1 1 1 10 1
#2.1 2 1 7 1
#1.2 1 2 11 0
#2.2 2 2 8 0
#1.3 1 3 12 0
#2.3 2 3 9 0
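Both calls return rows grouped by CatNumber, with row names like 1.1. If you want the Subject-by-Subject order shown in the question, a small follow-up sketch (assuming the sample data from the question; `long` is just an illustrative name) is:

```r
# Sample data copied from the question
dat <- data.frame(
  Subject = 1:2,
  Cat1_Weight = c(10, 7), Cat2_Weight = c(11, 8), Cat3_Weight = c(12, 9),
  Cat1_Sick = c(1, 1), Cat2_Sick = c(0, 0), Cat3_Sick = c(0, 0)
)

long <- reshape(dat, idvar = "Subject", direction = "long",
                varying = list(2:4, 5:7),
                v.names = c("Weight", "Sick"), timevar = "CatNumber")

# Sort by Subject, then CatNumber, and drop the "1.1"-style row names
long <- long[order(long$Subject, long$CatNumber), ]
rownames(long) <- NULL
long
```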