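My annual data has values that change partway through the year. I want to turn it into monthly data that shows those changes. Here is a snippet of my data: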
year  value  Start date  End date
1985  35451  7/1/1985    3/20/1986
1986  45600  3/21/1986   12/23/1986
1987  46089  1/1/1987    10/31/1989
I'd like the months as columns and the years as rows (as shown below, but without the break after Jun):
      Jan    Feb    Mar    Apr    May    Jun
1985  0      0      0      0      0      0
1986  35451  35451  38725  45600  45600  45600
      Jul    Aug    Sep    Oct    Nov    Dec
1985  35451  35451  35451  35451  35451  35451
1986  45600  45600  45600  45600  45600  45726
March and December 1986 hold weighted averages because the value changes within those months.
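One way to make "weighted average" concrete is to weight each value by the number of days it is in effect within the month. This is an assumption, since the question does not say how the weights are chosen. Here $v_i$ is the value of row $i$ and $d_{i,m}$ is the number of days of month $m$ covered by row $i$. For March 1986, counting both Start date and End date as inclusive, 35451 covers 1 to 20 March (20 days) and 45600 covers 21 to 31 March (11 days):

```latex
\[
\bar{v}_m = \frac{\sum_i v_i \, d_{i,m}}{\sum_i d_{i,m}}
\]
\[
\bar{v}_{\text{Mar 1986}} = \frac{20 \cdot 35451 + 11 \cdot 45600}{31} = \frac{1210620}{31} \approx 39052.26
\]
```

The 38725 in the desired output matches a 21/10 split instead ((21 · 35451 + 10 · 45600)/31 ≈ 38724.9). The exact figures therefore depend on how the boundary day is counted.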
Thanks, much appreciated.
Answer (score: 1)
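All you really need here is seq.Date and xtabs (or your favorite variant), but it takes a fair amount of fiddling to get it working. Here it is with Hadleyverse packages, but you could rewrite it in base R or data.table if you prefer: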
library(dplyr)
library(tidyr)
library(lubridate)

# Sample data from the question (dates stored as m/d/Y strings)
df <- data.frame(
  year = c(1985, 1986, 1987),
  value = c(35451, 45600, 46089),
  Start.date = c("7/1/1985", "3/21/1986", "1/1/1987"),
  End.date = c("3/20/1986", "12/23/1986", "10/31/1989")
)

# Parse the date columns as dates, then,
df %>% mutate(across(ends_with('date'), mdy)) %>%
  # evaluating each row separately,
  rowwise() %>%
  # create a list column with a month-wise sequence of dates for each row,
  mutate(month = list(seq.Date(Start.date, End.date, by = 'month'))) %>%
  # drop the row-wise grouping.
  ungroup() %>%
  # Expand the list column to long form,
  unnest(month) %>%
  # replace the year label with the calendar year of each date, and reduce the month column to its abbreviated name.
  mutate(year = year(month), month = month(month, label = TRUE)) %>%
  # For each year-month combination,
  group_by(year, month) %>%
  # take the (unweighted) mean of the values so each has only one row, then
  summarise(value = mean(value)) %>%
  # spread the result to wide form.
  spread(month, value, fill = 0) # or xtabs(value ~ year + month, data = .)
# Source: local data frame [5 x 13]
# Groups: year [5]
#
# year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
# (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
# 1 1985 0 0 0.0 0 0 0 35451 35451 35451 35451 35451 35451
# 2 1986 35451 35451 40525.5 45600 45600 45600 45600 45600 45600 45600 45600 45600
# 3 1987 46089 46089 46089.0 46089 46089 46089 46089 46089 46089 46089 46089 46089
# 4 1988 46089 46089 46089.0 46089 46089 46089 46089 46089 46089 46089 46089 46089
# 5 1989 46089 46089 46089.0 46089 46089 46089 46089 46089 46089 46089 0 0
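The answer takes a plain mean of the values whose month sequences land in each month. That is why March 1986 comes out as 40525.5 = (35451 + 45600)/2 rather than a weighted figure. Below is a minimal base-R sketch of a day-weighted alternative. It is a swapped-in technique, not the answer author's method. It assumes each value applies from its Start date through its End date inclusive, expands every period to one row per day, and then averages per year and month, so each value is weighted by the number of days it covers. Days covered by no row (24 to 31 December 1986) are simply ignored, so it will not reproduce the question's hand-computed 45726. The names `daily` and `monthly` are illustrative.

```r
# Sample data from the question; dates are m/d/Y strings
df <- data.frame(
  year = c(1985, 1986, 1987),
  value = c(35451, 45600, 46089),
  Start.date = c("7/1/1985", "3/21/1986", "1/1/1987"),
  End.date = c("3/20/1986", "12/23/1986", "10/31/1989")
)

start <- as.Date(df$Start.date, format = "%m/%d/%Y")
end <- as.Date(df$End.date, format = "%m/%d/%Y")

# Number of days each value is in effect (assumption: start and end both inclusive)
n_days <- as.integer(end - start) + 1L

# Expand to one row per day: repeat each start date n_days times and add 0, 1, 2, ...
daily <- data.frame(
  date = rep(start, n_days) + (sequence(n_days) - 1L),
  value = rep(df$value, n_days)
)

# Calendar year and month; month.abb avoids locale-dependent month names
daily$year <- as.integer(format(daily$date, "%Y"))
daily$month <- factor(month.abb[as.integer(format(daily$date, "%m"))],
                      levels = month.abb)

# Averaging over days weights each value by its number of days in that month
monthly <- tapply(daily$value, list(daily$year, daily$month), mean)
monthly[is.na(monthly)] <- 0  # year-month cells with no coverage
round(monthly, 2)
```

Expanding to daily rows costs some memory for long ranges. In exchange, the weighting is explicit and the same `tapply` call works for any period boundaries.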