Say I have:
#Continuous variable 1
x1<-rnorm(15, 1, .5)
#Continuous Variable 2
x2<-rnorm(15,1,.5)
#Sample Names
s.names<-c("S3","S5","S8","S14","S11","S13","S15","S12","S10","S2","S1","S6","S7","S4","S9")
df.temp<-data.frame(s.names,x1,x2)
df.temp
s.names x1 x2
1 S3 0.7025583 1.6616103
2 S5 0.4401055 1.5715047
3 S8 1.3691886 0.7754010
4 S14 1.1365712 1.2697196
5 S11 2.1193612 0.5968068
6 S13 0.6834145 1.4669863
7 S15 0.7050808 1.3287179
8 S12 2.0293910 0.7502497
9 S10 0.6807918 1.0793561
10 S2 0.6809873 0.7454851
11 S1 0.3775086 0.3150030
12 S6 2.1235465 1.4864190
13 S7 1.1657259 1.3279573
14 S4 1.4629794 0.6146412
15 S9 0.6916639 0.4507309
Now let's try ordering it.
df.temp[order(df.temp$s.names),]
s.names x1 x2
11 S1 0.3775086 0.3150030
9 S10 0.6807918 1.0793561
5 S11 2.1193612 0.5968068
8 S12 2.0293910 0.7502497
6 S13 0.6834145 1.4669863
4 S14 1.1365712 1.2697196
7 S15 0.7050808 1.3287179
10 S2 0.6809873 0.7454851
1 S3 0.7025583 1.6616103
14 S4 1.4629794 0.6146412
2 S5 0.4401055 1.5715047
12 S6 2.1235465 1.4864190
13 S7 1.1657259 1.3279573
3 S8 1.3691886 0.7754010
15 S9 0.6916639 0.4507309
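My problem is that I'm having trouble manipulating the data frame. Specifically, when I try to sort or order by s.names, it always returns something along the lines of S1, S10, S11, S12, ..., S2, S20, S21, S3, S4, S5, S6, S7, and so on. (I don't have 21 samples, but see the example above.) The reason, of course, is that I'm trying to rearrange the data frame by row. Both order() and sort() run into this problem.
In addition, I would like to know how to "bootstrap", or randomly move the rows around, to get examples for statistical purposes while keeping the values linked: S1 would still have its corresponding x1 and x2 values, only in a different, perhaps random, order, e.g. S5, S11, S6, etc.
My ultimate goal is to run analyses such as regression, ANOVA(), cov(), and cor().
Edit: added more code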
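Answer 0: (score: 1)
Your problem is that you are trying to sort a string column as if it were a numeric column. If all elements start with S, you can convert them to numbers: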
> x <- paste0("S", 1:20)
> x
[1] "S1" "S2" "S3" "S4" "S5" "S6" "S7" "S8" "S9" "S10" "S11" "S12" "S13" "S14" "S15" "S16" "S17" "S18"
[19] "S19" "S20"
> sort(x)
[1] "S1" "S10" "S11" "S12" "S13" "S14" "S15" "S16" "S17" "S18" "S19" "S2" "S20" "S3" "S4" "S5" "S6" "S7"
[19] "S8" "S9"
> # take the numeric part of x itself, so positions in x2 line up with x
> x2 <- as.numeric(gsub("[^0-9]", "", x))
> sort(x2)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
If you don't want to drop the leading S, you can use order() on the extracted numbers, like this:
> x[order(as.numeric(gsub("[^0-9]", "", x)))]
Or, in this example:
> x[order(x2)]
Both result in:
[1] "S1" "S2" "S3" "S4" "S5" "S6" "S7" "S8" "S9" "S10" "S11" "S12" "S13" "S14" "S15" "S16" "S17" "S18"
[19] "S19" "S20"
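Applied to the data frame from the question, the same idea could look like the sketch below. It assumes every s.names value is an "S" followed by digits. The seed and the names `num` and `df.sorted` are illustrative additions, not part of the original post.

```r
# Rebuild a data frame shaped like the one in the question (values are random)
set.seed(1)  # assumption: seed added only so the sketch is reproducible
s.names <- c("S3","S5","S8","S14","S11","S13","S15","S12","S10","S2","S1","S6","S7","S4","S9")
df.temp <- data.frame(s.names, x1 = rnorm(15, 1, .5), x2 = rnorm(15, 1, .5))

# Strip everything that is not a digit, then convert to numeric
num <- as.numeric(gsub("[^0-9]", "", df.temp$s.names))

# Reorder whole rows by the numeric part; x1 and x2 stay attached to their sample
df.sorted <- df.temp[order(num), ]
df.sorted$s.names
```

Because order() returns row positions, indexing the data frame with it moves complete rows, so each sample keeps its own x1 and x2.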
Your additional question isn't very clear, but if it is a separate problem, you should ask it as a new question.
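One possible reading of the additional question is shuffling the rows so that each sample keeps its own x1 and x2 values. Under that assumption only, a minimal sketch with sample() might look like this (the seed and the small `df.demo` data frame are made up for illustration):

```r
set.seed(42)  # assumption: seed added only for reproducibility
df.demo <- data.frame(s.names = paste0("S", 1:6),
                      x1 = rnorm(6, 1, .5),
                      x2 = rnorm(6, 1, .5))

# Permute row indices: every row appears exactly once, in random order
df.shuffled <- df.demo[sample(nrow(df.demo)), ]

# Bootstrap resample: same number of rows, drawn with replacement
df.boot <- df.demo[sample(nrow(df.demo), replace = TRUE), ]

# Shuffling whole rows leaves pairwise statistics such as cor() unchanged
all.equal(cor(df.demo$x1, df.demo$x2), cor(df.shuffled$x1, df.shuffled$x2))
```

A plain permutation only changes the row order, so cov() and cor() give the same result as before. A bootstrap in the usual sense draws rows with replacement, which is what `replace = TRUE` does.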
Answer 1: (score: 0)
The same approach as @Molx, this time using a different function:
df.temp[order(as.numeric(substr(df.temp$s.names,2,3))),]
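That should arrange your data the way you need. The problem is that you are trying to sort strings, which are ordered alphabetically rather than numerically.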
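Note that substr(x, 2, 3) reads only the second and third characters, so it covers names up to S99. For labels of any length, a sketch that takes everything after the first character could be used instead (the vector `ids` is a made-up example):

```r
ids <- c("S100", "S2", "S15", "S1")

# substr(ids, 2, 3) would read "10" from "S100";
# substring(ids, 2) keeps everything from the second character to the end
num <- as.numeric(substring(ids, 2))

ids[order(num)]
```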
Answer 2: (score: 0)
With the gtools package:
library(gtools)
df.temp[mixedorder(df.temp$s.names), ]
Another base R option:
n <- df.temp$s.names[order(as.numeric((gsub("S", "", df.temp$s.names))))]
df.temp[match(n, df.temp$s.names), ]
Output:
s.names x1 x2
11 S1 1.2285667 1.48669700
10 S2 0.9438498 0.01775496
1 S3 1.3671933 1.66880402
14 S4 0.7718479 1.53751408
2 S5 0.6023717 0.94600954
12 S6 -0.1341811 1.17744773
13 S7 1.1150349 -0.24347135
3 S8 0.3934848 0.90117148
15 S9 1.7059979 1.64684407
9 S10 0.7533375 1.05615732
5 S11 0.6980853 0.46164739
8 S12 0.3826094 1.26324581
6 S13 0.9616772 1.58527306
4 S14 -0.1876272 1.05792541
7 S15 1.4213483 0.96066296
With the sqldf package:
library(sqldf)
sqldf("SELECT *,
ltrim([s.names],'S') AS n
FROM [df.temp] ORDER BY n*1")
Output:
s.names x1 x2 n
1 S1 1.2285667 1.48669700 1
2 S2 0.9438498 0.01775496 2
3 S3 1.3671933 1.66880402 3
4 S4 0.7718479 1.53751408 4
5 S5 0.6023717 0.94600954 5
6 S6 -0.1341811 1.17744773 6
7 S7 1.1150349 -0.24347135 7
8 S8 0.3934848 0.90117148 8
9 S9 1.7059979 1.64684407 9
10 S10 0.7533375 1.05615732 10
11 S11 0.6980853 0.46164739 11
12 S12 0.3826094 1.26324581 12
13 S13 0.9616772 1.58527306 13
14 S14 -0.1876272 1.05792541 14
15 S15 1.4213483 0.96066296 15