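Question
I have gone through many SO questions and RStudio blog posts trying to solve this, but nothing has helped so far. I have tried various functions inside reorder(), as well as creating and using a wide dataset.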
https://rstudio-pubs-static.s3.amazonaws.com/7433_4537ea5073dc4162950abb715f513469.html
reorder x-axis variables by sorting a subset of the data
How do I sort a dataframe by the average of subsets of one of the rows?
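Desired output
Each X point has 3 Y values, one of which is the benchmark. To show how the other two points perform, I need to sort by the benchmark in descending order to produce a chart like this (red is the benchmark):
The chart above is a mock-up to show the goal. Please ignore the few red outlier points.
Current approach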
sample.chart <-
  ggplot(sample.data, aes(
    x = reorder(store, -scaling),
    y = scaling,
    color = version
  )) +
  geom_point(alpha = 0.7) +
  theme(
    axis.text.x = element_blank()
  )
How can I "target" a specific subset to sort the chart by?
Data structure (str)
> str(sample.data)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 60 obs. of 3 variables:
$ store : int 1 2 3 4 5 6 7 8 9 10 ...
$ scaling: num 3.67 17.5 51 7.6 49 ...
$ version: chr "test.1" "test.1" "test.1" "test.1" ...
Data
> dput(sample.data)
structure(list(store = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 1L, 2L,
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L,
17L, 18L, 19L, 20L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L), scaling = c(3.66666666666667,
17.5, 51, 7.6, 49, 0.333333333333333, 7.25, 13, 1.66666666666667,
9.73333333333333, 0.307692307692308, 0.74468085106383, 5, 1.27272727272727,
0.259259259259259, 0.866666666666667, 2.625, 1.58333333333333,
2.71428571428571, 0.625, 5.5, 35, 51, 9.5, 49, 3, 4.83333333333333,
8.66666666666667, 3.33333333333333, 4.17142857142857, 0.666666666666667,
2.91666666666667, 1.42857142857143, 2.8, 0.424242424242424, 0.8125,
1.82608695652174, 1.72727272727273, 2.375, 0.571428571428571,
66, 62.78461538, 56.1, 53.9, 49.5, 47.3, 39.1, 39.05, 37.2, 30.8,
29.7, 29.15, 28.6, 23.61333333, 20.8, 19.25, 18.61538462, 17.74666667,
17.11111111, 16.8), version = c("test.1", "test.1", "test.1",
"test.1", "test.1", "test.1", "test.1", "test.1", "test.1", "test.1",
"test.1", "test.1", "test.1", "test.1", "test.1", "test.1", "test.1",
"test.1", "test.1", "test.1", "test.2", "test.2", "test.2", "test.2",
"test.2", "test.2", "test.2", "test.2", "test.2", "test.2", "test.2",
"test.2", "test.2", "test.2", "test.2", "test.2", "test.2", "test.2",
"test.2", "test.2", "benchmark", "benchmark", "benchmark", "benchmark",
"benchmark", "benchmark", "benchmark", "benchmark", "benchmark",
"benchmark", "benchmark", "benchmark", "benchmark", "benchmark",
"benchmark", "benchmark", "benchmark", "benchmark", "benchmark",
"benchmark")), .Names = c("store", "scaling", "version"), row.names = c(NA,
-60L), class = c("tbl_df", "tbl", "data.frame"))
Answer 0 (score: 1)
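How about this:
ggplot(sample.data, aes(
  x = reorder(store, -scaling * (version == "benchmark"), min),
  y = scaling,
  color = version
)) +
  geom_point(alpha = 0.7) +
  theme(
    axis.text.x = element_blank()
  )
Here we multiply the non-benchmark scores by 0 (FALSE counts as 0) and negate the benchmark scores, then take the minimum per store. Assuming the benchmark scores are positive, that minimum is the negated benchmark score, so each store is reordered by its benchmark score alone, in descending order. Note that max would not work with the negated key: every store would summarise to 0 and the order would stay unchanged.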
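To see what the summary function passed to reorder() does with this key, here is a minimal sketch on hypothetical toy data (`toy` and its values are made up for illustration, not taken from the question). It assumes positive scores, so the key is 0 for non-benchmark rows and negative for benchmark rows.

```r
# Hypothetical toy data: 3 stores with benchmark scores 10, 30, 20
toy <- data.frame(
  store   = rep(1:3, times = 3),
  scaling = c(5, 1, 9,  2, 8, 4,  10, 30, 20),
  version = rep(c("test.1", "test.2", "benchmark"), each = 3)
)

# Key is 0 for non-benchmark rows and -scaling for benchmark rows
key <- -toy$scaling * (toy$version == "benchmark")

# Per-store summaries, computed the same way reorder() does internally (tapply)
by_min <- tapply(key, toy$store, min)  # negated benchmark score per store
by_max <- tapply(key, toy$store, max)  # 0 for every store when scores are positive

# Level order produced by reorder() with min: stores in descending benchmark order
ordered_levels <- levels(reorder(factor(toy$store), key, FUN = min))
print(by_min)
print(by_max)
print(ordered_levels)  # store 2 (30), then store 3 (20), then store 1 (10)
```

Because reorder() sorts levels by ascending summary value, negating the benchmark and taking the minimum is what yields the descending order; if some benchmark scores could be zero or negative, this trick would need a different key.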
Answer 1 (score: 0)
However you sort the rows, as long as store stays numeric, ggplot will place the x values in numeric order. Instead, turn store into a factor whose levels are in descending order of the benchmark's y-value. Factors are stored as integer codes that index their levels, so ggplot will position the points by level order instead:
library(dplyr)
library(forcats)
library(ggplot2)
sample.data <- sample.data %>%
  # "benchmark" sorts first alphabetically, so its rows (by descending scaling) come first
  arrange(version, desc(scaling)) %>%
  # as_factor() on a character vector sets the levels in order of first appearance
  mutate(store = as_factor(as.character(store)))
ggplot(sample.data, aes(
  x = store,
  y = scaling,
  color = version
)) +
  geom_point(alpha = 0.7) +
  theme(
    axis.text.x = element_blank()
  )
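The same idea can be sketched in base R by building the factor levels directly from the benchmark rows, which avoids relying on the alphabetical position of "benchmark". The helper name `order_by_version` and the toy data are hypothetical, introduced only for illustration.

```r
# Hypothetical helper (not part of the original answers): returns `data` with
# `store` converted to a factor whose levels follow the descending `scaling`
# of the rows where version == ref. Stores without a `ref` row become NA.
order_by_version <- function(data, ref = "benchmark") {
  ref_rows <- data[data$version == ref, ]
  lvls <- ref_rows$store[order(ref_rows$scaling, decreasing = TRUE)]
  data$store <- factor(data$store, levels = lvls)
  data
}

# Hypothetical toy data: 3 stores with benchmark scores 10, 30, 20
toy <- data.frame(
  store   = rep(1:3, times = 2),
  scaling = c(5, 1, 9, 10, 30, 20),
  version = rep(c("test.1", "benchmark"), each = 3)
)

toy_ordered <- order_by_version(toy)
print(levels(toy_ordered$store))  # store 2 (30), then store 3 (20), then store 1 (10)
```

Under these assumptions, `order_by_version(sample.data)` could be passed to the same ggplot() call with `x = store`, and the points would follow the benchmark's descending order.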