r - Calculate averages for a data frame in which one column holds parameter names and another holds concentration values

Time: 2018-07-31 15:29:42

Tags: r dataframe dplyr tidyr

I know I am probably missing something simple, but I am having a lot of trouble wrangling this data! I want to create a table like the following:

[Image: the desired output table; not reproduced here]

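I have a data frame that contains several different locations, the parameters, and a concentration value for each parameter. What confuses me is how to calculate the average for each parameter, because in my data frame all the parameters are listed in one column and the values are in a different column. How can I do this? Any help would be appreciated.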

Data (dput output):

 dput(head(df_trib_filtered,10))
structure(list(NJPDES = c("NJ0020206", "NJ0020532", "NJ0022021", 
"NJ0022985", "NJ0023361", "NJ0023736", "NJ0024015", "NJ0024031", 
"NJ0024040", "NJ0024678"), Facility_Name = c("ALLENTOWN BORO WWTP", 
"HARRISON TWP MULLICA HILL WWTP", "SWEDESBORO WTP", "WRIGHTSTOWN BOROUGH STP", 
"WILLINGBORO WATER POLLUTION CONTROL PLANT", "PINELANDS WASTEWATER CO", 
"MOUNT HOLLY WPCF", "ELMWOOD WTP", "WOODSTREAM STP", "BORDENTOWN SA BLACK'S CREEK STP"
), `Monitored Location Designator` = c("001A", "001A", "001A", 
"001A", "001A", "001A", "001A", "001A", "001A", "001A"), Date = structure(c(17378, 
17378, 17378, 17378, 17378, 17378, 17378, 17378, 17378, 17378
), class = "Date"), Parameter_Number_DMR = c("00300", "00300", 
"00300", "00300", "00300", "00300", "00300", "00300", "00300", 
"00300"), Parameter = c("Oxygen, Dissolved (DO)", "Oxygen, Dissolved (DO)", 
"Oxygen, Dissolved (DO)", "Oxygen, Dissolved (DO)", "Oxygen, Dissolved (DO)", 
"Oxygen, Dissolved (DO)", "Oxygen, Dissolved (DO)", "Oxygen, Dissolved (DO)", 
"Oxygen, Dissolved (DO)", "Oxygen, Dissolved (DO)"), `Sample Point Description` = c("Effluent Gross Value", 
"Effluent Gross Value", "Effluent Gross Value", "Effluent Gross Value", 
"Effluent Gross Value", "Effluent Gross Value", "Effluent Gross Value", 
"Effluent Gross Value", "Effluent Gross Value", "Effluent Gross Value"
), Rep_Val_Quantity_Avg = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA), X__1 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), `Reported Value Quantity Maximum` = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA), `Quantity Units Description` = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA), Rep_Val_Con_Min = c("7.44", 
NA, "6.07", NA, NA, "6.6", NA, "6.5", NA, "6.89"), Val_Con_AVG = c(7.58, 
7, 6.09, 6.9, 7.58, 6.5, 7.9, 6.5, 6.8, 6.99), Rep_Val_Con_Max = c(NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_), 
    valunit = c("MILLIGRAMS PER LITER", "MILLIGRAMS PER LITER", 
    "MILLIGRAMS PER LITER", "MILLIGRAMS PER LITER", "MILLIGRAMS PER LITER", 
    "MILLIGRAMS PER LITER", "MILLIGRAMS PER LITER", "MILLIGRAMS PER LITER", 
    "MILLIGRAMS PER LITER", "MILLIGRAMS PER LITER")), .Names = c("NJPDES", 
"Facility_Name", "Monitored Location Designator", "Date", "Parameter_Number_DMR", 
"Parameter", "Sample Point Description", "Rep_Val_Quantity_Avg", 
"X__1", "Reported Value Quantity Maximum", "Quantity Units Description", 
"Rep_Val_Con_Min", "Val_Con_AVG", "Rep_Val_Con_Max", "valunit"
), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))

My code so far:

### Get dataframe to have parameters in their own column ###
data_tidy_trib <- df_trib_filtered %>%
  spread(Parameter,Val_Con_AVG)

Is spread the right way to get what I want? When I use spread, each new parameter column is filled with NAs... so I think I am doing something wrong?

1 Answer:

Answer 0: (score: 1)

Based on the comments, you first need to run

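# detach() matches entries of search(), where the package appears as "package:plyr";
# a bare detach(plyr) does not match any entry there
detach("package:plyr")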
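The comments are not reproduced here, so the reason for this step is an assumption: most likely plyr was attached after dplyr, in which case plyr's `summarise` masks dplyr's and ignores the grouping. A minimal sketch for checking whether that is the case before detaching anything (the variable names `plyr_attached` and `summarise_owner` are made up for illustration):

```r
library(dplyr)

# Is plyr currently attached to the search path?
plyr_attached <- "package:plyr" %in% search()

# Which namespace does the unqualified name `summarise` resolve to?
# "dplyr" is what a group_by() pipeline needs; "plyr" means it is masked.
summarise_owner <- environmentName(environment(summarise))
print(summarise_owner)

# Detach plyr only if it is actually attached, so the script is safe to re-run.
if (plyr_attached) {
  detach("package:plyr")
}
```

Writing `dplyr::summarise` explicitly, as the code in this answer does, sidesteps the masking problem even if plyr stays attached.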

Then use:

library(dplyr)
df_trib_filtered %>%
  dplyr::group_by(Facility_Name, Parameter) %>% 
  dplyr::summarise(Average = mean(Val_Con_AVG, na.rm = TRUE))

The output should be

# A tibble: 10 x 3
# Groups:   Facility_Name [?]
#    Facility_Name                             Parameter              Average
#    <chr>                                     <chr>                    <dbl>
#  1 ALLENTOWN BORO WWTP                       Oxygen, Dissolved (DO)    7.58
#  2 BORDENTOWN SA BLACK'S CREEK STP           Oxygen, Dissolved (DO)    6.99
#  3 ELMWOOD WTP                               Oxygen, Dissolved (DO)    6.5 
#  4 HARRISON TWP MULLICA HILL WWTP            Oxygen, Dissolved (DO)    7   
#  5 MOUNT HOLLY WPCF                          Oxygen, Dissolved (DO)    7.9 
#  6 PINELANDS WASTEWATER CO                   Oxygen, Dissolved (DO)    6.5 
#  7 SWEDESBORO WTP                            Oxygen, Dissolved (DO)    6.09
#  8 WILLINGBORO WATER POLLUTION CONTROL PLANT Oxygen, Dissolved (DO)    7.58
#  9 WOODSTREAM STP                            Oxygen, Dissolved (DO)    6.8 
# 10 WRIGHTSTOWN BOROUGH STP                   Oxygen, Dissolved (DO)    6.9
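As for the NAs from `spread` in the question: spreading the raw rows likely produces them because each row carries a value for only one parameter, so every other parameter column is empty for that row. If a wide table with one column per parameter is the goal, one option is to summarise first and spread afterwards. The sketch below assumes that layout is what is wanted and uses a made-up toy data frame (`toy`, with invented facility names and values) rather than the real data:

```r
library(dplyr)
library(tidyr)

# Toy data in the same long layout as df_trib_filtered (values are invented)
toy <- data.frame(
  Facility_Name = c("PLANT A", "PLANT A", "PLANT A", "PLANT B", "PLANT B"),
  Parameter     = c("Oxygen, Dissolved (DO)", "Oxygen, Dissolved (DO)", "pH",
                    "Oxygen, Dissolved (DO)", "pH"),
  Val_Con_AVG   = c(7.0, 8.0, 6.5, 6.0, 7.5),
  stringsAsFactors = FALSE
)

wide <- toy %>%
  dplyr::group_by(Facility_Name, Parameter) %>%
  # One row per facility/parameter pair, holding the mean concentration
  dplyr::summarise(Average = mean(Val_Con_AVG, na.rm = TRUE)) %>%
  dplyr::ungroup() %>%
  # Now each facility/parameter pair is unique, so spreading is well defined
  tidyr::spread(Parameter, Average)

print(wide)
```

A cell in the wide table is still NA when a facility has no measurement for that parameter. In newer versions of tidyr, `pivot_wider(names_from = Parameter, values_from = Average)` does the same job as `spread`.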