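After merging two datasets, I have a data frame with 300 variables. Some variable names end in .x, some end in .y, and some base names appear with both the .x and the .y suffix. How can I move all variables that do not end in .x or .y into the first 100 columns of the data frame? In addition, starting at column 101, I would like the columns arranged as (day.x, day.y, city.x, city.y, number.x, number.y, etc.). In other words, variables that share a base name, such as city, but carry different suffixes should sit next to each other. For example: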
city.y<- c(1,2,3,5,5,7,7,NA,NA,3,4,5)
B<-c(3,4,5,6,1,2,7,6,7,NA,NA,6)
number.x<-c(1,2,3,4,5,6,7,NA,NA,5,5,6)
day.x<-c(1,3,4,5,6,7,8,1,NA,3,5,3)
Z<-c(1,2,3,4,5,6,7,NA,NA,5,5,6)
day.y<-c(4,5,6,7,8,7,8,1,2,3,5,NA)
number.y<-c(3,4,5,6,1,2,7,6,7,NA,NA,6)
school.x<-c("a","b","b","c","n","f","h","NA","F","G","z","h")
S<-c(5,2,3,4,5,6,5,NA,NA,5,6,6)
school.y<-c("a","b","b","c","m","g","h","NA","NA","G","H","T")
city.x<- c(1,2,3,7,5,8,7,5,6,7,5,1)
df<- data.frame(city.y,B,number.x,day.x,Z,day.y,number.y,school.x,S,school.y,city.x)
I want to reorder the variables in this format: B, S, Z, city.x, city.y, number.x, number.y, day.x, day.y, and so on.
Answer 0 (score: 3)
Add a column to create a more general use case:
df$ZZZZZ = 1:6
Then load the dplyr package (for the pipe operator %>% and the select function):
library(dplyr)
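Sorting the columns by name puts each suffixed pair in the correct relative order. Reorder by indexing rather than by assigning to names(df), which would only relabel the columns and leave the data where it was:
df = df[, sort(names(df))]
Now use the regular expression "\\.[xy]$" inside -matches() to pick out every column that does not end in ".x" or ".y" and place those columns first. everything() then puts all remaining columns after them.
df = df %>% select(-matches("\\.[xy]$"), everything())
df
    B  S  Z ZZZZZ city.x city.y day.x day.y number.x number.y school.x school.y
1   3  5  1     1      1      1     1     4        1        3        a        a
2   4  2  2     2      2      2     3     5        2        4        b        b
...
11 NA  6  5     5      5      4     5     5        5       NA        z        H
12  6  6  6     6      1      5     3    NA        6        6        h        T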
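For readers who would rather not depend on dplyr, the same two steps can be sketched in base R. This is only an illustration on a small stand-in data frame with made-up values; the names original and is_suffixed are hypothetical helpers, not part of the answer above. The final check confirms that each column still holds its own data after reordering.

```r
# Small stand-in for the merged data (hypothetical values).
df <- data.frame(city.y = c(1, 2), B = c(3, 4), day.x = c(1, 3),
                 Z = c(1, 2), day.y = c(4, 5), city.x = c(1, 2))
original <- df

# Step 1: reorder columns by name. Indexing moves the data together with
# the names; assigning to names(df) would only relabel columns in place.
df <- df[, sort(names(df))]

# Step 2: unsuffixed columns first, then the ".x"/".y" columns,
# keeping the relative order produced by step 1 within each group.
is_suffixed <- grepl("\\.[xy]$", names(df))
df <- df[, c(which(!is_suffixed), which(is_suffixed))]

names(df)

# Sanity check: every column still holds its original values.
stopifnot(all(sapply(names(df), function(n) identical(df[[n]], original[[n]]))))
```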
If you prefer, you can also set your own suffixes in the merge function (instead of the defaults ".x" and ".y"), like this:
merge(df1, df2, by="col", suffixes=c("_df1", "_df2"))
If you do that, you will of course also need to adjust the regular expression used to rearrange the columns.
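As a rough illustration of that adjustment, the sketch below merges two toy data frames with the custom suffixes and reorders the result with a pattern that matches them. The contents of df1 and df2, and the names merged, has_suffix, and reordered, are assumptions made up for this example; it uses base R indexing, though the same pattern could be passed to matches() in select.

```r
# Hypothetical toy inputs; df1, df2, and "col" follow the merge() call above.
df1 <- data.frame(col = 1:3, city = c(1, 2, 3), day = c(4, 5, 6), B = c(7, 8, 9))
df2 <- data.frame(col = 1:3, day = c(1, 3, 4), city = c(3, 2, 1), S = c(5, 2, 3))

merged <- merge(df1, df2, by = "col", suffixes = c("_df1", "_df2"))

# TRUE for columns that carry one of the custom suffixes.
has_suffix <- grepl("_df[12]$", names(merged))

# Unsuffixed columns first (in their existing order), then the suffixed ones
# sorted by name so that city_df1/city_df2 and day_df1/day_df2 are adjacent.
reordered <- merged[, c(names(merged)[!has_suffix],
                        sort(names(merged)[has_suffix]))]
names(reordered)
```

Only columns whose names clash between the two inputs receive a suffix, so B and S stay unsuffixed and end up at the front together with the key column.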
Answer 1 (score: 2)
This should do it:
extCols <- grepl("\\.", colnames(df))
df[, c(colnames(df)[(!extCols)],
sort(colnames(df)[extCols]))]
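One caveat: the pattern "\\." treats any column name containing a dot as suffixed. If the data might contain such names, a tighter pattern may be safer. The sketch below is a variation under that assumption, not the answer's own code; the toy data frame and the names has_suffix, suffixed, base_name, and result are hypothetical. It also orders the suffixed columns by base name explicitly, so the pairing does not rely on how sort() treats the dot.

```r
# Toy data: "total.count" contains a dot but is not a merge-suffix column.
df <- data.frame(city.y = 1:3, B = 4:6, total.count = 7:9,
                 day.x = 1:3, city.x = 3:1, day.y = 2:4)

# Only names ending in ".x" or ".y" count as suffixed.
has_suffix <- grepl("\\.[xy]$", colnames(df))
suffixed <- colnames(df)[has_suffix]

# Order suffixed names by base name first, then by full name (.x before .y).
base_name <- sub("\\.[xy]$", "", suffixed)
suffixed <- suffixed[order(base_name, suffixed)]

result <- df[, c(colnames(df)[!has_suffix], suffixed)]
colnames(result)
```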