我已经完成了一些操作,以达到以下数据框:
df
cluster.kmeans variable max mean median min sd
1 1 MonthlySMS 191 90.32258 71.0 8 56.83801
2 1 SixMonthlyData 1085 567.09677 573.0 109 275.46994
3 1 SixMonthlySMS 208 94.38710 86.0 29 56.27828
4 1 ThreeMonthlyData 1038 563.03226 573.0 94 275.51340
5 1 ThreeMonthlySMS 199 88.35484 76.0 6 59.15491
6 2 MonthlySMS 155 53.18815 57.0 1 31.64533
7 2 SixMonthlyData 574 280.27352 280.5 -48 139.75252
8 2 SixMonthlySMS 167 57.77526 47.0 1 33.49210
9 2 ThreeMonthlyData 548 280.89547 279.0 -11 137.54755
10 2 ThreeMonthlySMS 149 53.68641 50.5 3 31.40001
11 3 MonthlySMS 215 135.60202 137.0 49 34.09794
12 3 SixMonthlyData 1046 541.76322 557.0 2 258.90622
13 3 SixMonthlySMS 314 152.40302 152.0 27 45.55642
14 3 ThreeMonthlyData 1064 541.50378 558.0 10 255.35560
15 3 ThreeMonthlySMS 240 146.00756 146.0 54 37.06427
16 4 MonthlySMS 136 49.93980 54.5 1 31.47778
17 4 SixMonthlyData 1091 788.09365 805.0 503 145.67031
18 4 SixMonthlySMS 190 57.50167 46.0 1 33.66157
19 4 ThreeMonthlyData 1073 785.19398 799.5 500 142.90054
20 4 ThreeMonthlySMS 141 50.88796 46.0 1 31.07977
我想根据这些字符串订购variable
列:
top.vars_kmeans
[1] "ThreeMonthlySMS" "SixMonthlyData" "ThreeMonthlyData"
[4] "MonthlySMS" "SixMonthlySMS"
我可以使用sqldf执行此操作,如下所示:
library(sqldf)
a <- c(1,2,3,4,5)
a <- data.frame(top.vars_kmeans,a)
a <- sqldf('select a1.* ,b1.a from "MS.DATA.STATS.KMEANS" a1 inner join a b1
on a1.variable=b1."top.vars_kmeans"')
a <- sqldf('select * from a order by "cluster.kmeans",a')
a$a <- NULL
a
cluster.kmeans variable max mean median min sd
1 1 ThreeMonthlySMS 199 88.35484 76.0 6 59.15491
2 1 SixMonthlyData 1085 567.09677 573.0 109 275.46994
3 1 ThreeMonthlyData 1038 563.03226 573.0 94 275.51340
4 1 MonthlySMS 191 90.32258 71.0 8 56.83801
5 1 SixMonthlySMS 208 94.38710 86.0 29 56.27828
6 2 ThreeMonthlySMS 149 53.68641 50.5 3 31.40001
7 2 SixMonthlyData 574 280.27352 280.5 -48 139.75252
8 2 ThreeMonthlyData 548 280.89547 279.0 -11 137.54755
9 2 MonthlySMS 155 53.18815 57.0 1 31.64533
10 2 SixMonthlySMS 167 57.77526 47.0 1 33.49210
11 3 ThreeMonthlySMS 240 146.00756 146.0 54 37.06427
12 3 SixMonthlyData 1046 541.76322 557.0 2 258.90622
13 3 ThreeMonthlyData 1064 541.50378 558.0 10 255.35560
14 3 MonthlySMS 215 135.60202 137.0 49 34.09794
15 3 SixMonthlySMS 314 152.40302 152.0 27 45.55642
16 4 ThreeMonthlySMS 141 50.88796 46.0 1 31.07977
17 4 SixMonthlyData 1091 788.09365 805.0 503 145.67031
18 4 ThreeMonthlyData 1073 785.19398 799.5 500 142.90054
19 4 MonthlySMS 136 49.93980 54.5 1 31.47778
20 4 SixMonthlySMS 190 57.50167 46.0 1 33.66157
我很想知道是否可以使用dplyr实现这一点......我对这个精彩包装的理解会得到加强....
需要帮助!
答案 0 :(得分:1)
我们可以将permit
与arrange
match
答案 1 :(得分:1)
您可以按照所需顺序重新定义d_step { }
(或factor
因子)(例如存储在ordered
中):
top.vars_kmeans
另请参阅在线help页面,或通过a$variable <- factor(a$variable, levels = top.vars_kmeans)
。
如果您想订购整个?factor
,请转到akrun的答案。
答案 2 :(得分:1)
您可以尝试group_by
和slice
:
df %>% group_by(cluster.kmeans) %>% slice(match(top.vars_kmeans, variable))
# cluster.kmeans variable max mean median min sd
# (int) (fctr) (int) (dbl) (dbl) (int) (dbl)
#1 1 ThreeMonthlySMS 199 88.35484 76.0 6 59.15491
#2 1 SixMonthlyData 1085 567.09677 573.0 109 275.46994
#3 1 ThreeMonthlyData 1038 563.03226 573.0 94 275.51340
#4 1 MonthlySMS 191 90.32258 71.0 8 56.83801
#5 1 SixMonthlySMS 208 94.38710 86.0 29 56.27828
#6 2 ThreeMonthlySMS 149 53.68641 50.5 3 31.40001
#7 2 SixMonthlyData 574 280.27352 280.5 -48 139.75252
#8 2 ThreeMonthlyData 548 280.89547 279.0 -11 137.54755
#9 2 MonthlySMS 155 53.18815 57.0 1 31.64533
#10 2 SixMonthlySMS 167 57.77526 47.0 1 33.49210
#11 3 ThreeMonthlySMS 240 146.00756 146.0 54 37.06427
#12 3 SixMonthlyData 1046 541.76322 557.0 2 258.90622
#13 3 ThreeMonthlyData 1064 541.50378 558.0 10 255.35560
#14 3 MonthlySMS 215 135.60202 137.0 49 34.09794
#15 3 SixMonthlySMS 314 152.40302 152.0 27 45.55642
#16 4 ThreeMonthlySMS 141 50.88796 46.0 1 31.07977
#17 4 SixMonthlyData 1091 788.09365 805.0 503 145.67031
#18 4 ThreeMonthlyData 1073 785.19398 799.5 500 142.90054
#19 4 MonthlySMS 136 49.93980 54.5 1 31.47778
#20 4 SixMonthlySMS 190 57.50167 46.0 1 33.66157