Resampling data to monthly frequency in R

Time: 2020-03-06 16:57:57

Tags: r dplyr data.table tidyverse zoo

I have a large CSV file; a sample is shown below:

> data <- fread('data.csv', sep = ",")
> data
              name year value       
    1: Afghanistan 1800   11
    2:     Albania 1800   22
    3:     Algeria 1800   6
    4: Afghanistan 1801   48
    5:     Albania 1801   60
    6:     Algeria 1801   120 

   ---                                  
46509: Afghanistan 2040   108
46510:     Albania 2040   72
46511:     Algeria 2040   36


My goal is to resample this data to monthly frequency and interpolate the value column, as shown below (Afghanistan, 1800):


              name      year    value       
    1: Afghanistan     Jan 1800   1
    1: Afghanistan     Feb 1800   2
    1: Afghanistan     Mar 1800   3
    1: Afghanistan     May 1800   4
    1: Afghanistan     Jun 1800   5
    1: Afghanistan     Jul 1800   6
    1: Afghanistan     Aug 1800   7
    1: Afghanistan     Sep 1800   8
    1: Afghanistan     Oct 1800   9
    1: Afghanistan     Nov 1800   10
    1: Afghanistan     Dec 1800   11
    2:     Albania     Jan 1800   2



   ---                                  
46509: Afghanistan         2040   108
46510:     Albania         2040   72
46511:     Algeria         2040   36

I have tried several approaches without success. The most recent attempt is shown below:

> data <- as.zoo(data)
> m <- na.approx(data(time(data), 0:11/12, "+"))
Error in approx(x[!na], y[!na], xout, ...) : 
  need at least two non-NA values to interpolate
In addition: Warning messages:
1: In data(time(data), 0:11/12, "+") : data set ‘time(data)’ not found
2: In data(time(data), 0:11/12, "+") : data set ‘0:11/12’ not found
3: In data(time(data), 0:11/12, "+") : data set ‘+’ not found
4: In xy.coords(x, y, setLab = FALSE) : NAs introduced by coercion
> head(m)
           Afghanistan Albania  Algeria 
1800-01-31      11     24          6
1800-02-28      11     24          6
1800-03-31      11     24          6 
1800-04-30      11     24          6
1800-05-31      11     24          6
1800-06-30      11     24          6

Any ideas on how to get the result I want?
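A note on the attempt above: the warnings ("data set ... not found") indicate that `data(time(data), ...)` was dispatched to `utils::data()`, the dataset loader, rather than operating on the zoo object. The following is only a minimal zoo-based sketch of the intended idea, assuming each yearly value is treated as the January observation and that a wide layout (one column per country) is acceptable. The toy data frame `df` and the names `z`, `grid`, and `m` are illustrative, not taken from the original file.

```r
library(zoo)

# Toy data mirroring the sample in the question.
df <- data.frame(
  name  = rep(c("Afghanistan", "Albania", "Algeria"), times = 2),
  year  = rep(c(1800, 1801), each = 3),
  value = c(11, 22, 6, 48, 60, 120)
)

# Wide zoo series: one column per country, indexed by year-month.
# as.yearmon(1800) is January 1800, so each yearly value is placed in January.
z <- read.zoo(df, index.column = "year", split = "name", FUN = as.yearmon)

# Monthly grid from the first to the last observed January.
yrs  <- range(df$year)
grid <- as.yearmon(yrs[1] + seq(0, 12 * diff(yrs)) / 12)

# Merging with a zero-width series adds the missing months as NA rows;
# na.approx() then fills them by linear interpolation along the time index.
m <- na.approx(merge(z, zoo(, grid)))
head(m)
```

With the full data set, `df` would be replaced by the table read from the CSV. Whether January is the right anchor month for a yearly value is a modelling choice, not something the question specifies.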

2 answers:

Answer 0 (score: 0)

I'm not entirely sure this is what you're looking for; let me know whether it is closer to what you have in mind.

library(data.table)
library(zoo)

df <- data.frame(
  name = c("Afghanistan", "Albania", "Algeria", "Afghanistan", "Albania", "Algeria"),
  year = c(1800, 1800, 1800, 1801, 1801, 1801),
  value = c(11, 22, 6, 48, 60, 120),
  month = 1
)

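cols <- c("month", "value")

# Expand every (name, year) group to 12 rows: month 1 keeps the observed value, months 2 to 12 are NA.
res <- setDT(df)[, .SD[match(1:12, month)], by = .(name, year)]
# Number the months 1..12 within each (name, year) group.
res[, month := seq(.N), by = .(name, year)]
# Linearly interpolate the NA values within each country; na.rm = FALSE keeps trailing NAs after the last year.
res[, (cols) := lapply(.SD, na.approx, na.rm = FALSE), .SDcols = cols, by = name]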

Output

           name year     value month
 1: Afghanistan 1800  11.00000     1
 2: Afghanistan 1800  14.08333     2
 3: Afghanistan 1800  17.16667     3
 4: Afghanistan 1800  20.25000     4
 5: Afghanistan 1800  23.33333     5
 6: Afghanistan 1800  26.41667     6
 7: Afghanistan 1800  29.50000     7
 8: Afghanistan 1800  32.58333     8
 9: Afghanistan 1800  35.66667     9
10: Afghanistan 1800  38.75000    10
11: Afghanistan 1800  41.83333    11
12: Afghanistan 1800  44.91667    12
13:     Albania 1800  22.00000     1
14:     Albania 1800  25.16667     2
15:     Albania 1800  28.33333     3
16:     Albania 1800  31.50000     4
17:     Albania 1800  34.66667     5
18:     Albania 1800  37.83333     6
19:     Albania 1800  41.00000     7
20:     Albania 1800  44.16667     8
21:     Albania 1800  47.33333     9
22:     Albania 1800  50.50000    10
23:     Albania 1800  53.66667    11
24:     Albania 1800  56.83333    12
25:     Algeria 1800   6.00000     1
26:     Algeria 1800  15.50000     2
27:     Algeria 1800  25.00000     3
28:     Algeria 1800  34.50000     4
29:     Algeria 1800  44.00000     5
30:     Algeria 1800  53.50000     6
31:     Algeria 1800  63.00000     7
32:     Algeria 1800  72.50000     8
33:     Algeria 1800  82.00000     9
34:     Algeria 1800  91.50000    10
35:     Algeria 1800 101.00000    11
36:     Algeria 1800 110.50000    12
37: Afghanistan 1801  48.00000     1
...
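If "Jan 1800" style labels are wanted, as in the question, the same interpolation can be sketched with base `approx()` on an integer month index. This is only an illustration under the same assumption that each yearly value belongs to January. The names `dt`, `obs_t`, `grid`, and `monthly` are hypothetical, and each country needs at least two observed years for `approx()` to work.

```r
library(data.table)

# Toy data mirroring the sample in the question.
dt <- data.table(
  name  = rep(c("Afghanistan", "Albania", "Algeria"), times = 2),
  year  = rep(c(1800, 1801), each = 3),
  value = c(11, 22, 6, 48, 60, 120)
)

monthly <- dt[, {
  obs_t <- year * 12                    # month index of each January observation
  grid  <- seq(min(obs_t), max(obs_t))  # every month between first and last observation
  .(
    label = paste(month.abb[grid %% 12 + 1], grid %/% 12),  # e.g. "Feb 1800"
    value = approx(obs_t, value, xout = grid)$y             # linear interpolation
  )
}, by = name]

head(monthly)
```

Unlike the `na.approx(..., na.rm = FALSE)` version, this grid stops at the last observed January, so there are no trailing NA rows.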

Data

df <- data.frame(
  name = c("Afghanistan", "Albania", "Algeria", "Afghanistan", "Albania", "Algeria"),
  year = c(1800, 1800, 1800, 1801, 1801, 1801),
  value = c(11, 22, 6, 48, 60, 120),
  month = 1
)

Answer 1 (score: -1)

I would do the following:

 library(tidyverse)

 data %>%
    arrange(name, value) %>%
    select(name, year, value)
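Note that the pipeline above only sorts rows and selects columns; it does not add monthly rows. A tidyverse-style sketch of the resampling itself might look like the following, assuming each yearly value is the January observation and using `tidyr::complete()` together with `zoo::na.approx()`. The toy tibble and the `month` column are illustrative additions, not part of the original data.

```r
library(dplyr)
library(tidyr)
library(zoo)

# Toy data mirroring the sample in the question.
data <- tibble(
  name  = rep(c("Afghanistan", "Albania", "Algeria"), times = 2),
  year  = rep(c(1800, 1801), each = 3),
  value = c(11, 22, 6, 48, 60, 120)
)

monthly <- data %>%
  mutate(month = 1L) %>%                                        # observed values sit in January
  complete(name, year = full_seq(year, 1), month = 1:12) %>%    # add months 2..12 as NA rows
  group_by(name) %>%
  arrange(year, month, .by_group = TRUE) %>%                    # chronological order per country
  mutate(value = na.approx(value, na.rm = FALSE)) %>%           # linear interpolation, trailing NAs kept
  ungroup()

monthly
```

Because `complete()` is applied to the ungrouped table, a country missing some years would still get rows for them, and those gaps would be interpolated as well.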