Reordering variables

Time: 2015-12-08 00:00:32

Tags: r

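After merging two datasets, I have a data frame with 300 variables (some end in .x, some in .y, and some in .x and .y). How can I move all the variables that do not end in .x or .y into the first 100 columns of the dataset? In addition, I would like the columns from column 101 onward to be arranged as (day.x, day.y, city.x, city.y, number.x, number.y, etc.). That is, variables with the same name, such as city, but different suffixes should sit next to each other. For example: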

city.y<- c(1,2,3,5,5,7,7,NA,NA,3,4,5)
B<-c(3,4,5,6,1,2,7,6,7,NA,NA,6)
number.x<-c(1,2,3,4,5,6,7,NA,NA,5,5,6)
day.x<-c(1,3,4,5,6,7,8,1,NA,3,5,3)
Z<-c(1,2,3,4,5,6,7,NA,NA,5,5,6)
day.y<-c(4,5,6,7,8,7,8,1,2,3,5,NA)
number.y<-c(3,4,5,6,1,2,7,6,7,NA,NA,6)
school.x<-c("a","b","b","c","n","f","h","NA","F","G","z","h")
S<-c(5,2,3,4,5,6,5,NA,NA,5,6,6)
school.y<-c("a","b","b","c","m","g","h","NA","NA","G","H","T")
city.x<- c(1,2,3,7,5,8,7,5,6,7,5,1)
df<- data.frame(city.y,B,number.x,day.x,Z,day.y,number.y,school.x,S,school.y,city.x)

I would like to reorder the variables in this format: B, S, Z, city.x, city.y, number.x, number.y, day.x, day.y, and so on.

2 Answers:

Answer 0 (score: 3)

Add a column to create a more general use case:

df$ZZZZZ = 1:6

Then load the dplyr package (for the chaining operator %>% and the select function):

library(dplyr)

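Indexing the data frame by its sorted column names puts each subgroup of columns in the correct relative order. Index rather than assign: names(df) = sort(names(df)) would only relabel the columns and leave the data where it was.

df = df[, sort(names(df))]

Now use -matches("\\.[xy]$") to select every column that does not end in ".x" or ".y" and put those columns first; everything() then places all the other columns after them.

df = df %>% select(-matches("\\.[xy]$"), everything())

df

    B  S  Z ZZZZZ city.x city.y day.x day.y number.x number.y school.x school.y
1   3  5  1     1      1      1     1     4        1        3        a        a
2   4  2  2     2      2      2     3     5        2        4        b        b
...
11 NA  6  5     5      5      4     5     5        5       NA        z        H
12  6  6  6     6      1      5     3    NA        6        6        h        T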
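
To see which names the pattern "\\.[xy]$" treats as suffixed, here is a minimal base R check. The column names are made up for illustration; only the pattern comes from the answer.

```r
# Hypothetical column names: two with merge suffixes, three without
nms <- c("city.x", "B", "day.y", "x.total", "max.xy")

# TRUE only when the name ends in a literal dot followed by a single x or y
is_suffixed <- grepl("\\.[xy]$", nms)
is_suffixed

# These are the names that -matches("\\.[xy]$") would move to the front
nms[!is_suffixed]
```

Because the pattern is anchored with $, names that merely contain ".x" somewhere (such as "max.xy") or start with "x." are not treated as suffixed.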

If you like, you can also set your own suffixes in the merge function (instead of the default ".x" and ".y"), like this:

merge(df1, df2, by="col", suffixes=c("_df1", "_df2"))

If you do this, you will of course also need to adjust the regular expression that rearranges the columns.
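
As a rough sketch of that adjustment, assuming the suffixes "_df1" and "_df2" from the merge call above: the two input data frames below are hypothetical, and the reordering is done in base R so the snippet runs without extra packages.

```r
# Two small hypothetical data frames sharing the key "col" and the columns "city" and "day"
df1 <- data.frame(col = 1:3, city = c(1, 2, 3), day = c(4, 5, 6), B = c(7, 8, 9))
df2 <- data.frame(col = 1:3, city = c(3, 2, 1), day = c(6, 5, 4), S = c(0, 1, 0))

merged <- merge(df1, df2, by = "col", suffixes = c("_df1", "_df2"))

# The pattern now targets the custom suffixes instead of ".x" / ".y"
suffixed <- grepl("_df[12]$", names(merged))

# Unsuffixed columns first, then the suffixed ones sorted so that pairs sit together
merged <- merged[, c(names(merged)[!suffixed], sort(names(merged)[suffixed]))]
names(merged)

# With dplyr loaded, the matching selection would be
# select(merged, -matches("_df[12]$"), everything()) after sorting the columns
```

Only columns that occur in both inputs receive a suffix, so col, B, and S stay unsuffixed and end up at the front.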

Answer 1 (score: 2)

This should do it:

extCols <- grepl("\\.", colnames(df))
df[, c(colnames(df)[(!extCols)], 
     sort(colnames(df)[extCols]))]
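
Note that grepl("\\.", ...) flags any column whose name contains a dot, and sort() orders the suffixed columns alphabetically. If the groups should instead follow the order in which each base name first appears (city, number, day, as in the question), one possible variant is sketched below. The helper name reorder_suffixed is hypothetical, and the one-row data frame is only a stand-in that reuses the question's column names.

```r
# Hypothetical helper: plain columns first (sorted), then suffixed columns
# grouped by base name in order of first appearance
reorder_suffixed <- function(d, pattern = "\\.[xy]$") {
  nms <- names(d)
  has_suffix <- grepl(pattern, nms)        # anchored, so only trailing .x / .y count
  plain <- sort(nms[!has_suffix])
  suffixed <- nms[has_suffix]
  base <- sub(pattern, "", suffixed)       # "city.x" -> "city"
  # Primary key: first appearance of the base name; secondary key: full name (.x before .y)
  ord <- order(match(base, unique(base)), suffixed)
  d[, c(plain, suffixed[ord]), drop = FALSE]
}

# One-row stand-in with the same column names and order as the question's df
df <- data.frame(city.y = 1, B = 3, number.x = 1, day.x = 1, Z = 1, day.y = 4,
                 number.y = 3, school.x = "a", S = 5, school.y = "a", city.x = 1)

names(reorder_suffixed(df))
```

The pattern argument can be changed if custom merge suffixes are in use.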