Reordering rows in a data frame

Time: 2015-07-27 00:13:25

Tags: r statistics

Say I have:

#Continuous variable 1
x1<-rnorm(15, 1, .5)
#Continuous Variable 2
x2<-rnorm(15,1,.5)
#Sample Names
s.names<-c("S3","S5","S8","S14","S11","S13","S15","S12","S10","S2","S1","S6","S7","S4","S9")

df.temp<-data.frame(s.names,x1,x2)

df.temp
   s.names        x1        x2
1       S3 0.7025583 1.6616103
2       S5 0.4401055 1.5715047
3       S8 1.3691886 0.7754010
4      S14 1.1365712 1.2697196
5      S11 2.1193612 0.5968068
6      S13 0.6834145 1.4669863
7      S15 0.7050808 1.3287179
8      S12 2.0293910 0.7502497
9      S10 0.6807918 1.0793561
10      S2 0.6809873 0.7454851
11      S1 0.3775086 0.3150030
12      S6 2.1235465 1.4864190
13      S7 1.1657259 1.3279573
14      S4 1.4629794 0.6146412
15      S9 0.6916639 0.4507309

Now let's try ordering it:

df.temp[order(df.temp$s.names),]
   s.names        x1        x2
11      S1 0.3775086 0.3150030
9      S10 0.6807918 1.0793561
5      S11 2.1193612 0.5968068
8      S12 2.0293910 0.7502497
6      S13 0.6834145 1.4669863
4      S14 1.1365712 1.2697196
7      S15 0.7050808 1.3287179
10      S2 0.6809873 0.7454851
1       S3 0.7025583 1.6616103
14      S4 1.4629794 0.6146412
2       S5 0.4401055 1.5715047
12      S6 2.1235465 1.4864190
13      S7 1.1657259 1.3279573
3       S8 1.3691886 0.7754010
15      S9 0.6916639 0.4507309

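My problem is that I'm now having trouble manipulating the data frame. In particular, when I try to sort or order by s.names, it always returns something like S1, S10, S11, S12, ..., S2, S20, S21, S3, S4, S5, S6, S7, etc. (My example doesn't have 21 samples, but see the output above.) This matters because I'm trying to reorder the rows of the data frame, and both order() and sort() run into this problem.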

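I'd also like to know how to "bootstrap" or randomly shuffle the rows for statistical purposes while keeping each row's values together: for example, S1 should keep its corresponding x1 and x2 values, but the rows might come out in a different, possibly random, order such as S5, S11, S6, etc.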
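A minimal base R sketch of the shuffling idea, assuming the goal is to move whole rows rather than individual columns. The data is regenerated with an arbitrary set.seed(1), so the numbers will not match the question's output; stringsAsFactors = FALSE is added only so s.names behaves the same on every R version.

```r
# Rebuild data shaped like the question's (random values differ from the post).
set.seed(1)  # arbitrary seed, only for reproducibility
df.temp <- data.frame(
  s.names = c("S3", "S5", "S8", "S14", "S11", "S13", "S15", "S12", "S10", "S2",
              "S1", "S6", "S7", "S4", "S9"),
  x1 = rnorm(15, 1, .5),
  x2 = rnorm(15, 1, .5),
  stringsAsFactors = FALSE
)

# Shuffle: index with a random permutation of the row numbers.
# Whole rows move together, so each sample keeps its own x1 and x2.
shuffled <- df.temp[sample(nrow(df.temp)), ]

# Bootstrap resample: draw nrow(df.temp) row numbers with replacement,
# so a sample can appear more than once or not at all.
boot <- df.temp[sample(nrow(df.temp), replace = TRUE), ]
```

Because the row index is the only thing being permuted, the pairing between s.names, x1 and x2 is never broken.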

My end goal is to run regressions and related analyses, such as an ANOVA, cov() and cor().
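For these end goals the row order itself should not matter, as long as rows are moved as a whole. A small sketch, assuming freshly simulated data (set.seed(42) is arbitrary), checks that cor(), cov() and lm() coefficients are unchanged after a random row permutation:

```r
set.seed(42)  # arbitrary seed for the illustration
df <- data.frame(x1 = rnorm(15, 1, .5), x2 = rnorm(15, 1, .5))

# Reorder whole rows at random; each x1 stays paired with its x2
perm <- df[sample(nrow(df)), ]

# These statistics depend only on the (x1, x2) pairs, not on row order,
# so each comparison agrees up to floating-point rounding
all.equal(cor(df$x1, df$x2), cor(perm$x1, perm$x2))                  # TRUE
all.equal(cov(df$x1, df$x2), cov(perm$x1, perm$x2))                  # TRUE
all.equal(coef(lm(x2 ~ x1, data = df)), coef(lm(x2 ~ x1, data = perm))) # TRUE
```

So sorting by s.names is mainly useful for reading the output; it does not change these results.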

Edit: added more code

3 answers:

Answer 0 (score: 1)

Your problem is that you're trying to sort a character column as if it were numeric. If every element starts with S, you can turn them into numbers:

> x <- paste0("S", 1:20)
> x
 [1] "S1"  "S2"  "S3"  "S4"  "S5"  "S6"  "S7"  "S8"  "S9"  "S10" "S11" "S12" "S13" "S14" "S15" "S16" "S17" "S18"
[19] "S19" "S20"
> sort(x)
 [1] "S1"  "S10" "S11" "S12" "S13" "S14" "S15" "S16" "S17" "S18" "S19" "S2"  "S20" "S3"  "S4"  "S5"  "S6"  "S7" 
[19] "S8"  "S9" 
> x2 <- sort(x)
> x2 <- as.numeric(gsub("[^0-9]", "", x2))
> sort(x2)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

If you don't want to drop the leading S, you can call order() on the extracted numbers instead, like this:

> x[order(as.numeric(gsub("[^0-9]", "", x)))]

Or, in this example, since x2 was built from sort(x), index that same sorted vector:

> sort(x)[order(x2)]

Both result in

 [1] "S1"  "S2"  "S3"  "S4"  "S5"  "S6"  "S7"  "S8"  "S9"  "S10" "S11" "S12" "S13" "S14" "S15" "S16" "S17" "S18"
[19] "S19" "S20"

Your additional question isn't very clear, but if it's a separate question you should post it as a new one.

Answer 1 (score: 0)

Same approach as @Molx, this time with a different function:

df.temp[order(as.numeric(substr(df.temp$s.names,2,3))),]

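With your data this should give you what you need. The problem is that you're sorting strings, which are ordered alphabetically rather than numerically.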
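One caveat, based on how substr() works: substr(x, 2, 3) keeps only characters 2 and 3, so it assumes sample numbers have at most two digits. A small sketch with hypothetical names such as "S100" shows the difference from stripping the leading S with sub():

```r
# Hypothetical names, including a three-digit sample number
x <- c("S2", "S100", "S10")

# substr(x, 2, 3) turns "S100" into "10", so S100 and S10 tie
by_substr <- x[order(as.numeric(substr(x, 2, 3)))]

# sub() removes only the leading "S" and keeps every digit after it
by_sub <- x[order(as.numeric(sub("^S", "", x)))]

by_substr  # c("S2", "S100", "S10")
by_sub     # c("S2", "S10", "S100")
```

For the question's S1 to S15 both give the same order; the difference only appears once numbers reach three digits.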

Answer 2 (score: 0)

With the gtools package:

library(gtools)
# as.character() in case s.names is a factor (the data.frame() default before R 4.0)
df.temp[mixedorder(as.character(df.temp$s.names)), ]

Another base R option:

n <- df.temp$s.names[order(as.numeric(gsub("S", "", df.temp$s.names)))]
df.temp[match(n, df.temp$s.names), ]

Output:

   s.names         x1          x2
11      S1  1.2285667  1.48669700
10      S2  0.9438498  0.01775496
1       S3  1.3671933  1.66880402
14      S4  0.7718479  1.53751408
2       S5  0.6023717  0.94600954
12      S6 -0.1341811  1.17744773
13      S7  1.1150349 -0.24347135
3       S8  0.3934848  0.90117148
15      S9  1.7059979  1.64684407
9      S10  0.7533375  1.05615732
5      S11  0.6980853  0.46164739
8      S12  0.3826094  1.26324581
6      S13  0.9616772  1.58527306
4      S14 -0.1876272  1.05792541
7      S15  1.4213483  0.96066296
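The match() step in the base R option maps the sorted names back to row positions, which selects the same rows as indexing with order() directly, provided every name is unique (as in the question). A small sketch with a hypothetical repeated name shows where the two diverge:

```r
# Hypothetical data with a repeated sample name
df <- data.frame(s.names = c("S2", "S1", "S2"), x1 = c(10, 20, 30),
                 stringsAsFactors = FALSE)

key <- as.numeric(gsub("S", "", df$s.names))

# order() then match(), as in the answer above
n <- df$s.names[order(key)]
via_match <- df[match(n, df$s.names), ]

# Ordering the rows directly
direct <- df[order(key), ]

via_match$x1  # c(20, 10, 10): match() only finds the first "S2", so x1 = 30 is lost
direct$x1     # c(20, 10, 30)
```

Indexing with order() directly is also one step shorter.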

With the sqldf package:

library(sqldf)
sqldf("SELECT *, 
      ltrim([s.names],'S') AS n
      FROM [df.temp] ORDER BY n*1")

Output:

   s.names         x1          x2  n
1       S1  1.2285667  1.48669700  1
2       S2  0.9438498  0.01775496  2
3       S3  1.3671933  1.66880402  3
4       S4  0.7718479  1.53751408  4
5       S5  0.6023717  0.94600954  5
6       S6 -0.1341811  1.17744773  6
7       S7  1.1150349 -0.24347135  7
8       S8  0.3934848  0.90117148  8
9       S9  1.7059979  1.64684407  9
10     S10  0.7533375  1.05615732 10
11     S11  0.6980853  0.46164739 11
12     S12  0.3826094  1.26324581 12
13     S13  0.9616772  1.58527306 13
14     S14 -0.1876272  1.05792541 14
15     S15  1.4213483  0.96066296 15