Below is a sample of a large CSV file I am working with:
> data <- fread('data.csv', sep = ",")
> data
name year value
1: Afghanistan 1800 11
2: Albania 1800 22
3: Algeria 1800 6
4: Afghanistan 1801 48
5: Albania 1801 60
6: Algeria 1801 120
---
46509: Afghanistan 2040 108
46510: Albania 2040 72
46511: Algeria 2040 36
My goal is to resample this data to monthly frequency and interpolate the value column, as shown below (Afghanistan, 1800):
name year value
1: Afghanistan Jan 1800 1
1: Afghanistan Feb 1800 2
1: Afghanistan Mar 1800 3
1: Afghanistan May 1800 4
1: Afghanistan Jun 1800 5
1: Afghanistan Jul 1800 6
1: Afghanistan Aug 1800 7
1: Afghanistan Sep 1800 8
1: Afghanistan Oct 1800 9
1: Afghanistan Nov 1800 10
1: Afghanistan Dec 1800 11
2: Albania Jan 1800 2
---
46509: Afghanistan 2040 108
46510: Albania 2040 72
46511: Algeria 2040 36
I have tried several approaches without success; the most recent attempt is shown below:
> data <- as.zoo(data)
> m <- na.approx(data(time(data), 0:11/12, "+"))
Error in approx(x[!na], y[!na], xout, ...) :
need at least two non-NA values to interpolate
In addition: Warning messages:
1: In data(time(data), 0:11/12, "+") : data set ‘time(data)’ not found
2: In data(time(data), 0:11/12, "+") : data set ‘0:11/12’ not found
3: In data(time(data), 0:11/12, "+") : data set ‘+’ not found
4: In xy.coords(x, y, setLab = FALSE) : NAs introduced by coercion
> head(m)
Afghanistan Albania Algeria
1800-01-31 11 24 6
1800-02-28 11 24 6
1800-03-31 11 24 6
1800-04-30 11 24 6
1800-05-31 11 24 6
1800-06-30 11 24 6
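Any ideas on how to get the result I want?
Answer 0 (score: 0)
I'm not entirely sure this is what you are looking for; please let me know whether it is closer to what you have in mind.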
library(data.table)
library(zoo)
df <- data.frame(
name = c("Afghanistan", "Albania", "Algeria", "Afghanistan", "Albania", "Algeria"),
year = c(1800, 1800, 1800, 1801, 1801, 1801),
value = c(11, 22, 6, 48, 60, 120),
month = 1
)
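cols <- c("month", "value")
# Pad every (name, year) group to 12 rows; months 2..12 start out as NA
res <- setDT(df)[, .SD[match(1:12, month)], by = .(name, year)]
# Number the months 1..12 within each (name, year) group
res[, month := seq(.N), by = .(name, year)]
# Linearly interpolate the NAs per country; na.rm = FALSE keeps trailing NAs
res[, (cols) := lapply(.SD, na.approx, na.rm = FALSE), .SDcols = cols, by = name]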
Output
name year value month
1: Afghanistan 1800 11.00000 1
2: Afghanistan 1800 14.08333 2
3: Afghanistan 1800 17.16667 3
4: Afghanistan 1800 20.25000 4
5: Afghanistan 1800 23.33333 5
6: Afghanistan 1800 26.41667 6
7: Afghanistan 1800 29.50000 7
8: Afghanistan 1800 32.58333 8
9: Afghanistan 1800 35.66667 9
10: Afghanistan 1800 38.75000 10
11: Afghanistan 1800 41.83333 11
12: Afghanistan 1800 44.91667 12
13: Albania 1800 22.00000 1
14: Albania 1800 25.16667 2
15: Albania 1800 28.33333 3
16: Albania 1800 31.50000 4
17: Albania 1800 34.66667 5
18: Albania 1800 37.83333 6
19: Albania 1800 41.00000 7
20: Albania 1800 44.16667 8
21: Albania 1800 47.33333 9
22: Albania 1800 50.50000 10
23: Albania 1800 53.66667 11
24: Albania 1800 56.83333 12
25: Algeria 1800 6.00000 1
26: Algeria 1800 15.50000 2
27: Algeria 1800 25.00000 3
28: Algeria 1800 34.50000 4
29: Algeria 1800 44.00000 5
30: Algeria 1800 53.50000 6
31: Algeria 1800 63.00000 7
32: Algeria 1800 72.50000 8
33: Algeria 1800 82.00000 9
34: Algeria 1800 91.50000 10
35: Algeria 1800 101.00000 11
36: Algeria 1800 110.50000 12
37: Afghanistan 1801 48.00000 1
...
Data
df <- data.frame(
name = c("Afghanistan", "Albania", "Algeria", "Afghanistan", "Albania", "Algeria"),
year = c(1800, 1800, 1800, 1801, 1801, 1801),
value = c(11, 22, 6, 48, 60, 120),
month = 1
)
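If month labels such as "Jan" or "Feb" are wanted, as in the question's target table, one possible variation is sketched below. It uses base R's approx() on a whole-month offset instead of padding with NA rows. The helper name to_monthly is hypothetical, and the sketch assumes each yearly value is the January observation and that every country has at least two years of data. Unlike the code above, it stops at January of the last year rather than leaving trailing NA rows.

```r
library(data.table)

# Toy data with the same values as the sample above
dt <- data.table(
  name  = rep(c("Afghanistan", "Albania", "Algeria"), times = 2),
  year  = rep(c(1800L, 1801L), each = 3),
  value = c(11, 22, 6, 48, 60, 120)
)

# Hypothetical helper (not part of the original answer): treat each yearly
# value as the January observation and interpolate linearly in between.
to_monthly <- function(year, value) {
  first <- min(year)
  idx   <- (year - first) * 12L   # month offset of each yearly observation
  out   <- seq(0L, max(idx))      # every month offset from first to last January
  data.table(
    year  = first + out %/% 12L,
    month = month.abb[out %% 12L + 1L],        # "Jan", "Feb", ...
    value = approx(idx, value, xout = out)$y   # linear interpolation
  )
}

# Apply the helper once per country
monthly <- dt[, to_monthly(year, value), by = name]
head(monthly, 3)
```

Working with integer month offsets avoids the floating-point edge cases that fractional years (for example 1800 + 1/12) can introduce at the end of the range.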
Answer 1 (score: -1)
I would do the following:
library(tidyverse)
data %>%
arrange(name, value) %>%
select(name, year, value)
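Note that the pipeline above only sorts and selects columns; it does not create monthly rows. A minimal sketch of how it might be extended is given below, assuming tidyr's complete() with nesting() and zoo's na.approx() are acceptable, and assuming each yearly value is the January observation. The data frame yearly is a toy stand-in for the question's data.

```r
library(dplyr)
library(tidyr)
library(zoo)

# Toy stand-in for the question's `data` (yearly values per country)
yearly <- data.frame(
  name  = rep(c("Afghanistan", "Albania", "Algeria"), times = 2),
  year  = rep(c(1800, 1801), each = 3),
  value = c(11, 22, 6, 48, 60, 120)
)

monthly <- yearly %>%
  mutate(month = 1L) %>%                               # assumption: yearly value = January
  complete(nesting(name, year), month = 1:12) %>%      # add NA rows for months 2..12
  arrange(name, year, month) %>%
  group_by(name) %>%
  mutate(value = na.approx(value, na.rm = FALSE)) %>%  # linear fill within each country
  ungroup()

head(monthly, 3)
```

With na.rm = FALSE the months after the last observed January stay NA, which matches the behaviour of the data.table answer.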