Group by two columns, then calculate the median

Time: 2020-10-16 02:50:45

Tags: r dplyr

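I have a dataset in which I want to calculate the median first-flower date for each plot and each origin (native and exotic).

My ultimate goal is to test whether the median first-flower date differs significantly between native and exotic species under warmed conditions.

Here is a subset of my data: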

dput(umbs_firstflower[8:16,])
structure(list(site = c("umbs", "umbs", "umbs", "umbs", "umbs", 
"umbs", "umbs", "umbs", "umbs"), plot = c("A1", "A1", "A1", "A2", 
"A2", "A2", "A2", "A3", "A3"), species = c("Sogi", "Sone", "Syla", 
"Cest", "Poco", "Popr", "Ruac", "Cest", "Dasp"), origin = c("Native", 
"Native", "Native", "Exotic", "Exotic", "Exotic", "Exotic", "Exotic", 
"Native"), state = c("ambient", "ambient", "ambient", "warmed", 
"warmed", "warmed", "warmed", "ambient", "ambient"), first.flower = c("248", 
"240", "227", "195", "169", "155", "156", "194", "185")), row.names = c(NA, 
-9L), class = c("tbl_df", "tbl", "data.frame"))

Here is a sample of the code I wrote to try to do this:

umbs <- umbs_firstflower %>% group_by(plot, origin) %>% summarize(mean.firstflw = mean(as.numeric(date))) %>% ungroup()

1 answer:

Answer 0 (score: 0)

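You could use any of a number of significance tests here, so I'll demonstrate the solution with just one, kruskal.test(). Note, however, that there is disagreement about the best way to test for significant differences among the medians of three or more groups, so you may want to swap this test for another.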
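Before any testing, the per-plot, per-origin median from the question's title could be computed along these lines. This is a minimal sketch, not part of the original answer: it rebuilds the posted subset so it runs on its own, and it assumes the intended column is first.flower (the attempt in the question calls mean() on a column named date, which the posted subset does not contain). The name median.firstflw is only illustrative.

```r
library(dplyr)

# Subset of the question's data, rebuilt here so the example is self-contained
umbs_firstflower <- data.frame(
  plot   = c("A1", "A1", "A1", "A2", "A2", "A2", "A2", "A3", "A3"),
  origin = c("Native", "Native", "Native", "Exotic", "Exotic",
             "Exotic", "Exotic", "Exotic", "Native"),
  state  = c("ambient", "ambient", "ambient", "warmed", "warmed",
             "warmed", "warmed", "ambient", "ambient"),
  first.flower = c("248", "240", "227", "195", "169",
                   "155", "156", "194", "185"),
  stringsAsFactors = FALSE
)

plot_medians <- umbs_firstflower %>%
  # first.flower is stored as character, so convert it before summarising
  mutate(first.flower = as.numeric(first.flower)) %>%
  group_by(plot, origin) %>%
  # one row per plot/origin combination; .groups = "drop" ungroups the result
  summarise(median.firstflw = median(first.flower), .groups = "drop")

plot_medians
```

On this subset the result has four rows, one for each plot/origin combination that actually occurs.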

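Steps:

  1. Create a grp variable that labels each combination of interest in the categorical columns.
  2. Use pivot_wider() so that each group becomes its own column of first.flower values.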
library(tidyverse)
library(magrittr)  # provides the %$% exposition pipe

# df is the data frame from the question (umbs_firstflower[8:16, ])
df_wide <- 
  df %>%
  mutate(first.flower = as.numeric(first.flower),
         grp = case_when(
           origin == "Native" & state == "ambient" ~ "nat_ambi",
           origin == "Native" & state == "warmed" ~ "nat_warm",
           origin == "Exotic" & state == "ambient" ~ "exo_ambi",
           origin == "Exotic" & state == "warmed" ~ "exo_warm",
           TRUE ~ NA_character_
         )) %>%
  pivot_wider(id_cols = 1:5, names_from = grp, values_from = "first.flower")

# The subset has no Native/warmed rows, so no nat_warm column is created
df_wide
# A tibble: 9 x 8
  site  plot  species origin state   nat_ambi exo_warm exo_ambi
  <chr> <chr> <chr>   <chr>  <chr>      <dbl>    <dbl>    <dbl>
1 umbs  A1    Sogi    Native ambient      248       NA       NA
2 umbs  A1    Sone    Native ambient      240       NA       NA
3 umbs  A1    Syla    Native ambient      227       NA       NA
4 umbs  A2    Cest    Exotic warmed        NA      195       NA
5 umbs  A2    Poco    Exotic warmed        NA      169       NA
6 umbs  A2    Popr    Exotic warmed        NA      155       NA
7 umbs  A2    Ruac    Exotic warmed        NA      156       NA
8 umbs  A3    Cest    Exotic ambient       NA       NA      194
9 umbs  A3    Dasp    Native ambient      185       NA       NA

  3. Use the magrittr %$% pipe to run the significance test directly.

 df_wide %$% kruskal.test(list(nat_ambi, exo_warm, exo_ambi))

    Kruskal-Wallis rank sum test

data:  list(nat_ambi, exo_warm, exo_ambi)
Kruskal-Wallis chi-squared = 4.2667, df = 2, p-value = 0.1184
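As a possible alternative to the wide-format call above, kruskal.test() also has a formula interface that works on long data, which avoids the pivot step altogether. The sketch below is an illustration rather than part of the original answer. It rebuilds the posted subset and uses interaction() to build the grouping factor instead of case_when(); that swap is an assumption of convenience.

```r
library(dplyr)

# Subset of the question's data, rebuilt here so the example is self-contained
df <- data.frame(
  origin = c("Native", "Native", "Native", "Exotic", "Exotic",
             "Exotic", "Exotic", "Exotic", "Native"),
  state  = c("ambient", "ambient", "ambient", "warmed", "warmed",
             "warmed", "warmed", "ambient", "ambient"),
  first.flower = c("248", "240", "227", "195", "169",
                   "155", "156", "194", "185"),
  stringsAsFactors = FALSE
)

df_long <- df %>%
  mutate(first.flower = as.numeric(first.flower),
         # one factor level per origin/state combination present in the data
         grp = interaction(origin, state, drop = TRUE))

# Formula interface: response ~ grouping factor
kw <- kruskal.test(first.flower ~ grp, data = df_long)
kw
```

Because the groups contain the same values as in the wide version, this should give the same chi-squared statistic and degrees of freedom.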