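Boxplot fill based on a z-score calculation

Time: 2018-03-17 10:44:23

Tags: r

I have some athlete physical activity data, and I am plotting current data against historical data. The two datasets I'm using are below.

Historical data - quartertwo2017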



``` r
Player.Name Date    Distance  HIR      V6
Player 1    10/9/17 7060.621  2506.20  12.50
Player 1    15/7/17 4978.625  1596.19  44.26
Player 1    2/7/17  6787.667  2048.61  39.67
Player 1    22/7/17 6881.126  2065.80  31.48
Player 1    24/6/17 5802.060  2204.87  65.48
Player 1    29/7/17 7035.075  2085.32  22.56
Player 1    3/9/17  7016.175  2659.18  66.14
Player 1    5/8/17  6137.929  2154.36  25.49
Player 1    9/6/17  5515.685  2054.66  189.55
Player 1    9/7/17  6311.515  2144.63  20.54
Player 2    1/4/17  7150.221  2307.78  233.88
Player 2    10/9/17 8115.131  3136.33  217.86
Player 2    13/5/17 6391.008  2325.89  101.85
Player 2    15/7/17 6919.630  2136.40  118.64
Player 2    17/6/17 6366.357  2177.28  189.09
Player 2    19/8/17 7230.393  2530.59  104.58
Player 2    2/7/17  6620.122  1908.88  36.34
Player 2    20/5/17 7335.201  2250.34  152.84
Player 2    22/4/17 6956.030  2483.05  376.06
Player 2    22/7/17 7643.874  2370.89  172.20
Player 2    24/3/17 4258.366  1447.50  195.18
Player 2    24/6/17 7305.026  2771.67  297.99
Player 2    26/8/17 8024.780  2867.62  318.08
Player 2    27/5/17 6714.186  2409.16  125.31
Player 2    28/4/17 7106.519  2832.97  337.05
Player 2    29/7/17 8693.820  1961.28  27.80
Player 2    3/9/17  8005.006  2741.90  139.24
Player 2    5/8/17  7676.653  2475.58  111.07
Player 2    9/6/17  7176.619  2645.06  137.82
Player 2    9/7/17  7946.231  3140.44  126.59
```

Current data - quartertwo2018

``` r
Player.Name   Date   Distance     HIR      V6
Player 1      2/3/18 5234.390     1513.73  41.82
Player 2      2/3/18 6352.987     2054.94  166.72
```
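The two tables above are plain text rather than R code, so they have to be parsed before any plotting code can run. A minimal sketch of one way to do that, assuming the text is pasted into a string; `read_players` is a hypothetical helper that is not part of the original post, and the dates are assumed to be day/month/year:

``` r
# Hypothetical helper: parse a whitespace-separated table like the ones above.
# "Player 1" contains a space, so the header line is skipped and the first
# two fields are pasted back together into Player.Name.
read_players <- function(txt) {
  raw <- read.table(text = txt, skip = 1, header = FALSE,
                    stringsAsFactors = FALSE)
  data.frame(
    Player.Name = paste(raw$V1, raw$V2),
    Date        = as.Date(raw$V3, format = "%d/%m/%y"),  # assumed d/m/y
    Distance    = raw$V4,
    HIR         = raw$V5,
    V6          = raw$V6,
    stringsAsFactors = FALSE
  )
}

quartertwo2018 <- read_players("Player.Name   Date   Distance     HIR      V6
Player 1      2/3/18 5234.390     1513.73  41.82
Player 2      2/3/18 6352.987     2054.94  166.72")
```

The same helper can be applied to the historical table to build quartertwo2017.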

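Specifically, I am using geom_point to plot the total distance an athlete currently covers against the distance they typically cover, which is shown with geom_boxplot. My code so far is below: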

library(ggplot2)

plot_TD_Q2 <- ggplot(data = quartertwo2017, aes(x = Player.Name, y = Distance)) +
  geom_boxplot(fill = "light blue") +
  coord_flip() +
  ggtitle("Quarter 2") +
  xlab("Player") +
  ylab("Total Distance") +
  theme_classic()

plot_TD_Q2 <- plot_TD_Q2 + geom_point(data = quartertwo2018, aes(x = Player.Name, y = Distance),
  position = position_jitter(width = 0.5),
  col = "red",
  cex = 3)

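I'm very happy with the output this code produces. However, I'd like to know whether the colour of the boxplot can be changed based on a z-score calculation.

For example, I'd like the boxplot to turn red if the athlete's current total distance (the geom_point) is more than 3 SD from the mean of their historical data. If the current total distance is between 1 and 2.99 SD from the mean, the boxplot would turn amber, and if it is within 1 SD it would be filled green.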

My historical data comes from the dataset quartertwo2017, and my current data comes from the dataset quartertwo2018. So x = the current total distance, compared against the mean and standard deviation derived from quartertwo2017.
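The banding described above can be sketched as follows, assuming the z-score is z = (current - historical mean) / historical SD and that exactly 3 SD counts as red. `z_band` is a hypothetical name and the numbers are made up for illustration:

``` r
# Hypothetical helper: band a current value by its z-score against history.
z_band <- function(current, history) {
  z <- (current - mean(history)) / sd(history)
  # |z| in [0, 1) is green, [1, 3) is amber, [3, Inf) is red
  cut(abs(z), breaks = c(0, 1, 3, Inf), right = FALSE,
      labels = c("green", "amber", "red"))
}

# Made-up history with mean 10 and SD 2
history <- c(8, 10, 12)
as.character(z_band(11, history))  # z = 0.5 -> "green"
as.character(z_band(13, history))  # z = 1.5 -> "amber"
as.character(z_band(17, history))  # z = 3.5 -> "red"
```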

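I hope my question makes sense. I understand this might be a bit advanced, especially as I still consider myself new to R. Any help is greatly appreciated, and please let me know if more information is needed. I'm new to posting on Stack Overflow, so I hope I've put this question together properly.

Thanks.

1 answer:

Answer 0: (score: 0)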

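Consider computing a z_score by aggregating the mean and sd of the historical data per player, merging those onto the current data, and then conditionally assigning a new column with ifelse. This new column can then be mapped to colour inside aes():

library(ggplot2)

# Per-player mean and SD of the historical distances
aggdf <- merge(
  setNames(aggregate(Distance ~ Player.Name, quartertwo2017, mean),
           c("Player.Name", "Distance_mean")),
  setNames(aggregate(Distance ~ Player.Name, quartertwo2017, sd),
           c("Player.Name", "Distance_sd")),
  by = "Player.Name")

quartertwo2018 <- merge(quartertwo2018, aggdf, by = "Player.Name")

# z-score of the current distance against that player's own history
quartertwo2018$z <- (quartertwo2018$Distance - quartertwo2018$Distance_mean) /
  quartertwo2018$Distance_sd

# 3 SD or more from the mean: 'high'; 1 to 3 SD: 'med'; within 1 SD: 'low'
quartertwo2018$z_score <- ifelse(abs(quartertwo2018$z) >= 3, 'high',
                                 ifelse(abs(quartertwo2018$z) >= 1, 'med', 'low'))

plot_TD_Q2 <- ggplot(data = quartertwo2017,
                     aes(x = Player.Name, y = Distance)) +
  geom_boxplot(fill = "light blue") +
  coord_flip() + ggtitle("Quarter 2") +
  xlab("Player") + ylab("Total Distance") + theme_classic() +
  geom_point(data = quartertwo2018,
             aes(x = Player.Name, y = Distance, colour = z_score),
             position = position_jitter(width = 0.5),
             cex = 3) +
  # Named values so each category gets its intended colour: red, amber, green
  scale_color_manual(values = c(high = "#FF0000", med = "#FF6600", low = "#339900"))

plot_TD_Q2

Output (the points are coloured by their z_score category):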

Plot Output
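The answer's code colours the current-data points. If the box itself should change fill, as the question asks, one possible approach is to attach each player's band to the historical rows and map it to fill. The following is only a sketch under assumptions: `hist_df`, `curr_df` and `band` are hypothetical names, only the first four historical rows per player from the posted data are used to keep it short, and exactly 3 SD is treated as red.

``` r
library(ggplot2)

# First four historical rows per player from the posted data (subset only)
hist_df <- data.frame(
  Player.Name = rep(c("Player 1", "Player 2"), each = 4),
  Distance    = c(7060.621, 4978.625, 6787.667, 6881.126,
                  7150.221, 8115.131, 6391.008, 6919.630))
curr_df <- data.frame(
  Player.Name = c("Player 1", "Player 2"),
  Distance    = c(5234.390, 6352.987))

# Per-player historical mean and SD, looked up by player name
means <- tapply(hist_df$Distance, hist_df$Player.Name, mean)
sds   <- tapply(hist_df$Distance, hist_df$Player.Name, sd)
key   <- as.character(curr_df$Player.Name)
z     <- as.numeric((curr_df$Distance - means[key]) / sds[key])

# Band |z|: [0, 1) green, [1, 3) amber, [3, Inf) red
curr_df$band <- cut(abs(z), breaks = c(0, 1, 3, Inf), right = FALSE,
                    labels = c("within 1 SD", "1 to 3 SD", "3+ SD"))

# Attach each player's band to their historical rows so it can drive the fill
hist_df <- merge(hist_df, curr_df[, c("Player.Name", "band")],
                 by = "Player.Name")

p <- ggplot(hist_df, aes(x = Player.Name, y = Distance)) +
  geom_boxplot(aes(fill = band)) +
  geom_point(data = curr_df, colour = "black", size = 3) +
  scale_fill_manual(values = c("within 1 SD" = "#339900",
                               "1 to 3 SD"   = "#FF6600",
                               "3+ SD"       = "#FF0000"),
                    drop = FALSE) +
  coord_flip() + ggtitle("Quarter 2") +
  xlab("Player") + ylab("Total Distance") + theme_classic()

p
```

The current point is drawn in black here so that it stays visible on a red box; `drop = FALSE` keeps all three bands in the legend even when a band has no players.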