Calculating the mean of different columns based on a date

Time: 2020-03-13 18:34:34

Tags: r date mean

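My dataset is about forest fires and NDVI values (an index between 0 and 1 that indicates how green the surface is). The first column gives the date of the forest fire for each row, and the following columns give the NDVI values on different dates before and after the fire. NDVI values before the fire are much higher than those after it. It looks like this: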

data1989 <- data.frame("date_fire" = c("1987-01-01", "1987-07-03", "1988-01-01"), 
                       "1986-01-01" = c(0.5, 0.589, 0.66), 
                       "1986-06-03" = c(0.56, 0.447, 0.75), 
                       "1986-10-19" = c(0.8, NA, 0.83),
                       "1987-01-19" = c(0.75, 0.65,0.75), 
                       "1987-06-19" = c(0.1, 0.55,0.811),
                       "1987-10-19" = c(0.15, 0.12, 0.780),
                       "1988-01-19" = c(0.2, 0.22,0.32), 
                       "1988-06-19" = c(0.18, 0.21,0.23),
                       "1988-10-19" = c(0.21, 0.24, 0.250),
                       stringsAsFactors = FALSE) 
> data1989
   date_fire X1986.01.01 X1986.06.03 X1986.10.19 X1987.01.19 X1987.06.19 X1987.10.19 X1988.01.19 X1988.06.19 X1988.10.19
1 1987-01-01       0.500       0.560        0.80        0.75       0.100        0.15        0.20        0.18        0.21
2 1987-07-03       0.589       0.447          NA        0.65       0.550        0.12        0.22        0.21        0.24
3 1988-01-01       0.660       0.750        0.83        0.75       0.811        0.78        0.32        0.23        0.25

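I want to compute, in a new column, the mean of the NDVI values observed before the forest fire. For the first row, that is the mean of columns 2, 3, 4 and 5.

What I need to get is: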

date_fire    X1986.01.01 X1986.06.03 X1986.10.19 X1987.01.19 X1987.06.19 X1987.10.19 X1988.01.19 X1988.06.19 X1988.10.19 meanPreFire
1 1987-01-01       0.500       0.560        0.80        0.75       0.100        0.15        0.20        0.18        0.21       0.653
2 1987-07-03       0.589       0.447          NA        0.65       0.550        0.12        0.22        0.21        0.24       0.559
3 1988-01-01       0.660       0.750        0.83        0.75       0.811        0.78        0.32        0.23        0.25       0.764

Thanks!

Edit: solution

How to use the code when more than one leading column has to be excluded:

   data1989 <- data.frame("date_fire" = c("1987-02-01", "1987-07-03", "1988-01-01"), 
                       "type" = c("oak", "pine", "oak"),
                       "meanRainfall" = c(600, 300, 450),
                       "1986.01.01" = c(0.5, 0.589, 0.66), 
                       "1986.06.03" = c(0.56, 0.447, 0.75), 
                       "1986.10.19" = c(0.8, NA, 0.83),
                       "1987.01.19" = c(0.75, 0.65,0.75), 
                       "1987.06.19" = c(0.1, 0.55,0.811),
                       "1987.10.19" = c(0.15, 0.12, 0.780),
                       "1988.01.19" = c(0.2, 0.22,0.32), 
                       "1988.06.19" = c(0.18, 0.21,0.23),
                       "1988.10.19" = c(0.21, 0.24, 0.250),
                       check.names = FALSE,
                       stringsAsFactors = FALSE)

Using:

j1 <- findInterval(as.Date(data1989$date_fire), as.Date(names(data1989)[-(1:3)],format="%Y.%m.%d"))
m1 <- cbind(rep(seq_len(nrow(data1989)), j1), sequence(j1))
data1989$meanPreFire <- tapply(data1989[-(1:3)][m1], m1[,1], FUN = mean, na.rm = TRUE)

> data1989
   date_fire type meanRainfall 1986.01.01 1986.06.03 1986.10.19 1987.01.19 1987.06.19 1987.10.19 1988.01.19 1988.06.19 1988.10.19 meanPreFire
1 1987-02-01  oak          600      0.500      0.560       0.80       0.75      0.100       0.15       0.20       0.18       0.21      0.6525
2 1987-07-03 pine          300      0.589      0.447         NA       0.65      0.550       0.12       0.22       0.21       0.24      0.5590
3 1988-01-01  oak          450      0.660      0.750       0.83       0.75      0.811       0.78       0.32       0.23       0.25      0.7635
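To make the indexing trick above easier to follow, here is a minimal sketch on a cut-down table (three NDVI columns taken from the example; the names `ndvi`, `obs_dates`, `n_pre` and `idx` are made up for illustration). `findInterval` counts, for each fire date, how many observation dates fall on or before it, and the two-column matrix then addresses exactly those cells.

```r
# Cut-down version of the example data (hypothetical subset, for illustration only)
ndvi <- data.frame(date_fire = c("1987-02-01", "1987-07-03"),
                   "1986.01.01" = c(0.5, 0.589),
                   "1986.10.19" = c(0.8, NA),
                   "1987.06.19" = c(0.1, 0.55),
                   check.names = FALSE, stringsAsFactors = FALSE)

# Observation dates parsed from the column names (must be in increasing order)
obs_dates <- as.Date(names(ndvi)[-1], format = "%Y.%m.%d")

# Number of observation dates on or before each fire date
n_pre <- findInterval(as.Date(ndvi$date_fire), obs_dates)

# (row, column) pairs covering the pre-fire cells of each row
idx <- cbind(rep(seq_len(nrow(ndvi)), n_pre), sequence(n_pre))

# Matrix indexing pulls out those cells; tapply averages them per row
pre_values <- as.matrix(ndvi[-1])[idx]
pre_mean <- tapply(pre_values, idx[, 1], mean, na.rm = TRUE)
```

Two things to keep in mind: the column dates have to be sorted, and an observation taken on the fire date itself is counted as pre-fire, because `findInterval` uses "less than or equal" by default.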

3 answers:

Answer 0 (score: 3)

Reshape the data to long format and keep only the dates before the forest fire.

library(tidyverse)

data1989 %>%
  pivot_longer(-date_fire, names_to = "date") %>%
  mutate(date_fire = as.Date(date_fire),
         date = as.Date(date, "X%Y.%m.%d")) %>%
  filter(date < date_fire) %>%
  group_by(date_fire) %>%
  summarise(meanPreFire = mean(value, na.rm = TRUE))

# # A tibble: 3 x 2
#   date_fire  meanPreFire
#   <date>           <dbl>
# 1 1987-01-01       0.62 
# 2 1987-07-03       0.559
# 3 1988-01-01       0.764
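The summary above holds only the fire date and the mean. If the new column should sit next to the original wide columns, as in the question, one option is to join the summary back onto the data frame. This is a sketch, not part of the answer: it assumes `date_fire` identifies each row uniquely, and the names `pre_fire` and `result` are made up here.

```r
library(dplyr)
library(tidyr)

# Data as in the question (default check.names = TRUE yields X1986.01.01 etc.)
data1989 <- data.frame("date_fire" = c("1987-01-01", "1987-07-03", "1988-01-01"),
                       "1986-01-01" = c(0.5, 0.589, 0.66),
                       "1986-06-03" = c(0.56, 0.447, 0.75),
                       "1986-10-19" = c(0.8, NA, 0.83),
                       "1987-01-19" = c(0.75, 0.65, 0.75),
                       "1987-06-19" = c(0.1, 0.55, 0.811),
                       "1987-10-19" = c(0.15, 0.12, 0.780),
                       "1988-01-19" = c(0.2, 0.22, 0.32),
                       "1988-06-19" = c(0.18, 0.21, 0.23),
                       "1988-10-19" = c(0.21, 0.24, 0.250),
                       stringsAsFactors = FALSE)

# Pre-fire mean per fire date; date_fire stays character so the join keys match
pre_fire <- data1989 %>%
  pivot_longer(-date_fire, names_to = "date") %>%
  mutate(date = as.Date(date, "X%Y.%m.%d")) %>%
  filter(date < as.Date(date_fire)) %>%
  group_by(date_fire) %>%
  summarise(meanPreFire = mean(value, na.rm = TRUE))

# Attach the means to the original wide table
result <- left_join(data1989, pre_fire, by = "date_fire")
```

For the first row this gives 0.62 rather than the 0.653 shown in the question: with a fire date of 1987-01-01, the 1987-01-19 observation falls after the fire and is excluded.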

Answer 1 (score: 2)

We can do this in base R by creating a row/column index. The column index can be obtained with findInterval from the column names and 'date_fire'. (The code assumes the NDVI columns are named like 1986-01-01, i.e. the data frame was built with check.names = FALSE, as the printed output shows.)

# number of observation dates on or before each fire date
j1 <- findInterval(as.Date(data1989$date_fire), as.Date(names(data1989)[-1]))
# remaining (post-fire) column positions for each row
l1 <- lapply(j1+1, `:`, ncol(data1989)-1)
# (row, column) index matrices for the pre-fire and post-fire cells
m1 <- cbind(rep(seq_len(nrow(data1989)), j1), sequence(j1))
m2 <- cbind(rep(seq_len(nrow(data1989)), lengths(l1)), unlist(l1))
data1989$meanPreFire <- tapply(data1989[-1][m1], m1[,1], FUN = mean, na.rm = TRUE)
data1989$meanPostFire <- tapply(data1989[-1][m2], m2[,1], FUN = mean, na.rm = TRUE)
data1989
#   date_fire 1986-01-01 1986-06-03 1986-10-19 1987-01-19 1987-06-19 1987-10-19 1988-01-19 1988-06-19 1988-10-19
#1 1987-01-01      0.500      0.560       0.80       0.75      0.100       0.15       0.20       0.18       0.21
#2 1987-07-03      0.589      0.447         NA       0.65      0.550       0.12       0.22       0.21       0.24
#3 1988-01-01      0.660      0.750       0.83       0.75      0.811       0.78       0.32       0.23       0.25
#  meanPreFire meanPostFire
#1      0.6200    0.2650000
#2      0.5590    0.1975000
#3      0.7635    0.2666667

Or using melt/dcast from data.table:

library(data.table)
dcast(melt(setDT(data1989), id.var = 'date_fire')[, 
    .(value = mean(value, na.rm = TRUE)), 
    .(date_fire, grp = c('postFire', 'preFire')[1 + (as.IDate(variable) < as.IDate(date_fire))]) ], date_fire ~ grp)[data1989, on = .(date_fire)]
#    date_fire  postFire preFire 1986-01-01 1986-06-03 1986-10-19 1987-01-19 1987-06-19 1987-10-19 1988-01-19 1988-06-19
#1: 1987-01-01 0.2650000  0.6200      0.500      0.560       0.80       0.75      0.100       0.15       0.20       0.18
#2: 1987-07-03 0.1975000  0.5590      0.589      0.447         NA       0.65      0.550       0.12       0.22       0.21
#3: 1988-01-01 0.2666667  0.7635      0.660      0.750       0.83       0.75      0.811       0.78       0.32       0.23
#   1988-10-19
#1:       0.21
#2:       0.24
#3:       0.25
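The grouping expression inside the data.table call relies on a compact base R idiom: adding 1 to a logical vector gives 1 for FALSE and 2 for TRUE, which then indexes a two-element label vector. A small sketch with made-up variable names (`obs_date`, `fire_date`, `ndvi`) and three values from the first row:

```r
# Three observations from row 1 of the example and its fire date
obs_date  <- as.Date(c("1986-10-19", "1987-01-19", "1987-06-19"))
ndvi      <- c(0.8, 0.75, 0.1)
fire_date <- as.Date("1987-01-01")

# TRUE where the observation precedes the fire
is_pre <- obs_date < fire_date

# 1 + FALSE = 1 -> "postFire", 1 + TRUE = 2 -> "preFire"
grp <- c("postFire", "preFire")[1 + is_pre]

# Mean NDVI per label
grp_mean <- tapply(ndvi, grp, mean, na.rm = TRUE)
```

The same labels could be produced with `ifelse(is_pre, "preFire", "postFire")`; the indexing form is just shorter inside a `by` expression.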

Answer 2 (score: 2)

The solution would be more concise if we kept the data in long format... but this reproduces the desired output:

library(dplyr)
library(tidyr)
data1989 %>% 
  pivot_longer(-date_fire, names_to = "date_NDVI", values_to = "value", names_prefix = "^X") %>% 
  mutate(date_fire = as.Date(date_fire, "%Y-%m-%d"),
         date_NDVI = as.Date(date_NDVI, "%Y.%m.%d")) %>% 
  group_by(date_fire) %>% 
  mutate(period = ifelse(date_NDVI < date_fire, "before_fire", "after_fire")) %>% 
  group_by(date_fire, period) %>% 
  mutate(average_NDVI = mean(value, na.rm = TRUE)) %>% 
  pivot_wider(names_from = date_NDVI,  names_prefix = "X", values_from = value) %>% 
  pivot_wider(names_from = period, values_from = average_NDVI) %>% 
  group_by(date_fire) %>% 
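  # funs() is deprecated since dplyr 0.8.0; a formula lambda does the same job.
  # Note: sum(na.rm = TRUE) turns a cell that was NA into 0.
  summarise_all(~ sum(., na.rm = TRUE))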

Returns:

# A tibble: 3 x 12
  date_fire  `X1986-01-01` `X1986-06-03` `X1986-10-19` `X1987-01-19` `X1987-06-19` `X1987-10-19` `X1988-01-19` `X1988-06-19` `X1988-10-19` before_fire after_fire
  <date>             <dbl>         <dbl>         <dbl>         <dbl>         <dbl>         <dbl>         <dbl>         <dbl>         <dbl>       <dbl>      <dbl>
1 1987-01-01         0.5           0.56           0.8           0.75         0.1            0.15          0.2           0.18          0.21       0.62       0.265
2 1987-07-03         0.589         0.447          0             0.65         0.55           0.12          0.22          0.21          0.24       0.559      0.198
3 1988-01-01         0.66          0.75           0.83          0.75         0.811          0.78          0.32          0.23          0.25       0.764      0.267

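Edit:

If we stop the pipeline right after computing the means, the data in this structure make it easy to also compute the variance or to account for differing numbers of observations. I think date_fire can stay as its own column, but I would recommend keeping the other dates in a single column (since they correspond to the observations), especially if we want to do further analysis with ggplot2 and other tidyverse functions.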
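As a rough sketch of that idea, the pipeline can end with a single `summarise()` on the long data, returning the mean, the variance and the number of non-missing observations per fire and period. The column names `variance_NDVI` and `n_obs` and the object `long_summary` are made up here; `.groups = "drop"` assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

# Data as in the question (column names become X1986.01.01 etc.)
data1989 <- data.frame("date_fire" = c("1987-01-01", "1987-07-03", "1988-01-01"),
                       "1986-01-01" = c(0.5, 0.589, 0.66),
                       "1986-06-03" = c(0.56, 0.447, 0.75),
                       "1986-10-19" = c(0.8, NA, 0.83),
                       "1987-01-19" = c(0.75, 0.65, 0.75),
                       "1987-06-19" = c(0.1, 0.55, 0.811),
                       "1987-10-19" = c(0.15, 0.12, 0.780),
                       "1988-01-19" = c(0.2, 0.22, 0.32),
                       "1988-06-19" = c(0.18, 0.21, 0.23),
                       "1988-10-19" = c(0.21, 0.24, 0.250),
                       stringsAsFactors = FALSE)

long_summary <- data1989 %>%
  pivot_longer(-date_fire, names_to = "date_NDVI", values_to = "value",
               names_prefix = "^X") %>%
  mutate(date_fire = as.Date(date_fire),
         date_NDVI = as.Date(date_NDVI, "%Y.%m.%d"),
         period = ifelse(date_NDVI < date_fire, "before_fire", "after_fire")) %>%
  group_by(date_fire, period) %>%
  summarise(average_NDVI  = mean(value, na.rm = TRUE),
            variance_NDVI = var(value, na.rm = TRUE),
            n_obs         = sum(!is.na(value)),   # observations actually used
            .groups = "drop")
```

The result has one row per fire and period, so `n_obs` shows directly that the second fire's pre-fire mean rests on four values rather than five.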