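I am using the reshape function in base R to convert a long-format data frame from a repeated-measures design into wide format. See the toy dataset below. Q1, Q2, and Q3 are individual responses to three survey items. There are four participants, and each took the survey four times.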
Q1 <- c(2,6,5,4,3,8,9,2,1,5,4,7,3,7,2,1)
Q2 <- c(4,7,6,3,1,2,5,6,7,5,4,3,5,6,6,3)
Q3 <- c(7,9,3,1,5,3,7,5,3,3,5,7,8,9,9,3)
Participant <- rep(c("Bob","Sue","Jim","Tom"), times = 1, each = 4)
Time <- rep(c("FirstSurvey","SecondSurvey","ThirdSurvey","FourthSurvey"), times = 4)
# data.frame() keeps Q1-Q3 numeric; as.data.frame(cbind(...)) would coerce every column to character
m <- data.frame(Participant, Time, Q1, Q2, Q3)
This produces the following data frame:
m
Participant Time Q1 Q2 Q3
1 Bob FirstSurvey 2 4 7
2 Bob SecondSurvey 6 7 9
3 Bob ThirdSurvey 5 6 3
4 Bob FourthSurvey 4 3 1
5 Sue FirstSurvey 3 1 5
6 Sue SecondSurvey 8 2 3
7 Sue ThirdSurvey 9 5 7
8 Sue FourthSurvey 2 6 5
9 Jim FirstSurvey 1 7 3
10 Jim SecondSurvey 5 5 3
11 Jim ThirdSurvey 4 4 5
12 Jim FourthSurvey 7 3 7
13 Tom FirstSurvey 3 5 8
14 Tom SecondSurvey 7 6 9
15 Tom ThirdSurvey 2 6 9
16 Tom FourthSurvey 1 3 3
If you then reshape it:
mReshaped <- reshape(m, idvar = "Participant", timevar = "Time", direction = "wide", sep = "", new.row.names = c(1,2,3,4))
it produces the following wide-format data frame:
mReshaped
Participant Q1FirstSurvey Q2FirstSurvey Q3FirstSurvey Q1SecondSurvey Q2SecondSurvey
1 Bob 2 4 7 6 7
2 Sue 3 1 5 8 2
3 Jim 1 7 3 5 5
4 Tom 3 5 8 7 6
Q3SecondSurvey Q1ThirdSurvey Q2ThirdSurvey Q3ThirdSurvey Q1FourthSurvey Q2FourthSurvey
1 9 5 6 3 4 3
2 3 9 5 7 2 6
3 3 4 4 5 7 3
4 9 2 6 9 1 3
Q3FourthSurvey
1 1
2 5
3 7
4 3
with the following column names:
colnames(mReshaped)
[1] "Participant" "Q1FirstSurvey" "Q2FirstSurvey" "Q3FirstSurvey" "Q1SecondSurvey"
[6] "Q2SecondSurvey" "Q3SecondSurvey" "Q1ThirdSurvey" "Q2ThirdSurvey" "Q3ThirdSurvey"
[11] "Q1FourthSurvey" "Q2FourthSurvey" "Q3FourthSurvey"
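As you can see, when the data frame is reshaped, reshape appends the value of the Time variable as a suffix to the name of each repeated-measures column.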
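As a small illustration of this naming behavior, the sketch below reshapes a tiny made-up data frame (the names long, id, time, and score are hypothetical and not part of the dataset above). It suggests that the sep argument changes only the separator placed between the value name and the time value, not their order.

```r
# Hypothetical two-person, two-occasion data set, used only for illustration
long <- data.frame(
  id    = rep(c("a", "b"), each = 2),
  time  = rep(c("t1", "t2"), times = 2),
  score = c(1, 2, 3, 4)
)

# Wide names are formed as <value name><sep><time value>
wide <- reshape(long, idvar = "id", timevar = "time",
                direction = "wide", sep = "_")

names(wide)
```

With sep = "_" the time value still follows the value name, so the suffix ordering is unchanged.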
Does anyone know whether reshape has an argument that lets you place the Time value as a prefix in front of each value variable's name instead?
Answer 0 (score: 1)
I'm not sure whether you can change the order within reshape, but you can change it afterwards using gsub with a regular expression:
names(mReshaped) = gsub("(Q[0-9])(.*)", "\\2\\1", names(mReshaped))
[1] "Participant" "FirstSurveyQ1" "FirstSurveyQ2" "FirstSurveyQ3" "SecondSurveyQ1"
[6] "SecondSurveyQ2" "SecondSurveyQ3" "ThirdSurveyQ1" "ThirdSurveyQ2" "ThirdSurveyQ3"
[11] "FourthSurveyQ1" "FourthSurveyQ2" "FourthSurveyQ3"
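If the value columns do not follow a pattern as regular as Q plus a digit, one alternative is to build the old and new names explicitly and swap them by position. This is only a sketch: the vectors value_vars, times, and wide_names are written out by hand here as assumptions that mirror the example above, and it is not the method used in the answer.

```r
# Assumed inputs, copied by hand from the example above
value_vars <- c("Q1", "Q2", "Q3")
times <- c("FirstSurvey", "SecondSurvey", "ThirdSurvey", "FourthSurvey")
wide_names <- c("Participant",
                "Q1FirstSurvey",  "Q2FirstSurvey",  "Q3FirstSurvey",
                "Q1SecondSurvey", "Q2SecondSurvey", "Q3SecondSurvey",
                "Q1ThirdSurvey",  "Q2ThirdSurvey",  "Q3ThirdSurvey",
                "Q1FourthSurvey", "Q2FourthSurvey", "Q3FourthSurvey")

# Every value/time combination, in suffix form (old) and prefix form (new)
old_names <- as.vector(outer(value_vars, times, paste0))
new_names <- as.vector(outer(value_vars, times, function(v, t) paste0(t, v)))

# Replace each old name by its prefix-form counterpart; other names are untouched
renamed <- wide_names
renamed[match(old_names, renamed)] <- new_names
renamed
```

On a real data frame the same assignment could be applied to names(mReshaped) rather than to a plain character vector.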
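UPDATE: An explanation of how the code works. The code uses regular expressions ("regex" for short), a small language for text processing. It looks quite cryptic the first time you see it.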
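In this case, Q[0-9] means match "Q" followed by any single digit. Wrapping it in parentheses, (Q[0-9]), turns that match into a "capture group", meaning we can refer to it later. This is capture group #1.
.* means match all remaining characters (whatever comes after the characters matched by Q[0-9]). The . matches any single character; adding * matches a string of any characters of any length. Wrapping it as (.*) turns that match into capture group #2.
\\2\\1 takes the two strings we captured and puts them back in reverse order.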
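To see the two capture groups separately, here is a minimal sketch on a single column name. Note that the pattern is a slightly stricter variant than the one in the answer: the ^ and $ anchors and the + (which allows multi-digit names such as Q10) are additions made for illustration, and the variable names are hypothetical.

```r
one_name <- "Q1FirstSurvey"

# Stricter variant of the answer's pattern (anchors and "+" are assumptions)
pattern <- "^(Q[0-9]+)(.*)$"

group1  <- sub(pattern, "\\1", one_name)      # what capture group #1 holds
group2  <- sub(pattern, "\\2", one_name)      # what capture group #2 holds
swapped <- sub(pattern, "\\2\\1", one_name)   # group #2 followed by group #1

# A name that does not match the pattern is returned unchanged
unchanged <- sub(pattern, "\\2\\1", "Participant")

c(group1 = group1, group2 = group2, swapped = swapped, unchanged = unchanged)
```

Because Participant contains no Q followed by a digit, the substitution leaves it alone, which is why the id column keeps its name.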