Calculating differences between values using a different column, with gaps between the values, in R

Time: 2019-03-01 02:12:04

Tags: r group-by tidyverse mutate

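Can someone help me figure out how to calculate differences between values in my monthly data? For example, for each well and each year, I would like to calculate the difference in groundwater values between January and July, February and August, March and September, and so on. Note that a few months may be missing in some years. Any tidyverse solution would be appreciated.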

    Well  year month     value
   <dbl> <dbl> <fct>     <dbl>
 1   222  1995 February   8.53
 2   222  1995 March      8.69
 3   222  1995 April      8.92
 4   222  1995 May        9.59
 5   222  1995 June       9.59
 6   222  1995 July       9.70
 7   222  1995 August     9.66
 8   222  1995 September  9.46
 9   222  1995 October    9.49
10   222  1995 November   9.31
# ... with 18,400 more rows

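library(dplyr)     # %>% and mutate()
library(reshape2)  # dcast()

# attempt so far: covers only the February/August pair
df1 <- subset(df, month %in% c("February", "August"))
test <- df1 %>% 
  dcast(site + year + Well ~ month, value.var = "value") %>%
  mutate(Diff = February - August)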

Thanks

Simon

1 answer:

Answer 0 (score: 1)

I made up a data set and built a solution with dplyr. It is best practice to include a way to generate a sample data set, so please do that in future questions.
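One common way to share sample data in R is `dput()`, which prints an object as code that rebuilds it. A minimal sketch, assuming a data frame named `df` with the question's columns (the three rows are copied from the printed sample; the name `df_rebuilt` is only illustrative):

```r
# small stand-in for the question's data (first three printed rows)
df <- data.frame(
  Well  = c(222, 222, 222),
  year  = c(1995, 1995, 1995),
  month = factor(c("February", "March", "April"), levels = month.name),
  value = c(8.53, 8.69, 8.92)
)

# print R code that recreates the first rows; paste this output into the question
dput(head(df, 3))

# anyone can then rebuild the object by evaluating the pasted text
df_rebuilt <- eval(parse(text = capture.output(dput(head(df, 3)))))
```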

# load required library
library(dplyr)

# generate data set of all site, well, and month combinations
## define valid values
sites <- letters[1:3]
wells <- 1:5
months <- month.name

## perform a series of merges (cross joins, since the inputs share no columns)
## on your system, the first merge and rename could be replaced with
## initial_tibble %>% dplyr::select(sites, wells) %>% unique()
full_sites_wells_months_set <- 
    merge(sites, wells) %>%
    dplyr::rename(sites = x, wells = y) %>%
    merge(months) %>% 
    dplyr::rename(months = y) %>% 
    dplyr::arrange(sites, wells)

# create sample initial_tibble
## define fraction of records to simulate missing months
data_availability <- 0.8

initial_tibble <- 
    full_sites_wells_months_set %>% 
    dplyr::sample_frac(data_availability) %>% 
    dplyr::mutate(values = runif(dplyr::n())) # generate one random groundwater value per sampled row

# generate final result: join the full expected set of sites, wells, and months
# to the actual data (missing months become NA rows), then group by sites and
# wells and perform lag subtraction
final_tibble <- 
    full_sites_wells_months_set %>% 
    dplyr::left_join(initial_tibble, by = c("sites", "wells", "months")) %>% 
    dplyr::group_by(sites, wells) %>% 
    dplyr::mutate(trailing_difference_6_months = values - dplyr::lag(values, 6L))
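
The same idea can be carried over to the layout shown in the question. The following is a minimal sketch, not the answerer's code: it assumes the columns `Well`, `year`, `month`, and `value` from the question, uses made-up toy values with March deliberately missing, and relies on `tidyr::complete()` (not used in the answer above) to add `NA` rows for missing months, so that `lag(value, 6L)` always refers to the calendar month six months earlier within the same well and year.

```r
library(dplyr)
library(tidyr)

# hypothetical toy data in the question's layout; March 1995 is missing
df <- tibble(
  Well  = 222,
  year  = 1995,
  month = factor(month.name[c(1, 2, 4:12)], levels = month.name),
  value = c(8.40, 8.53, 8.92, 9.59, 9.59, 9.70, 9.66, 9.46, 9.49, 9.31, 9.20)
)

result <- df %>%
  # add an NA row for every calendar month absent from a Well/year combination
  complete(nesting(Well, year),
           month = factor(month.name, levels = month.name)) %>%
  # make sure rows are in calendar order before lagging
  arrange(Well, year, month) %>%
  group_by(Well, year) %>%
  # later month minus the month six positions earlier (July - January, ...)
  mutate(diff_6_months = value - lag(value, 6L)) %>%
  ungroup()
```

In this sketch the September row is `NA` because March is absent, instead of silently being paired with February. The sign follows the answer (later month minus earlier month); the question's own attempt computes the opposite sign, so negate the column if that convention is preferred.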