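I have a dataset with observations for only three months of each year: January, February, and March. I want to add the remaining months to the same data frame as zero-valued observations, but I'm having trouble appending them.
Here is my current code: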
library(dplyr)
Period <- c("January 2015", "February 2015", "March 2015",
"January 2016", "February 2016", "March 2016",
"January 2017", "February 2017", "March 2017",
"January 2018", "February 2018", "March 2018")
Month <- c("January", "February", "March",
"January", "February", "March",
"January", "February", "March",
"January", "February", "March")
Dollars <- c(936, 753, 731,
667, 643, 588,
948, 894, 997,
774,745, 684)
dat <- data.frame(Period = Period, Month = Month, Dollars = Dollars)
dat2 <- dat %>%
dplyr::select(Month, Dollars) %>%
dplyr::group_by(Month) %>%
dplyr::summarise(AvgDollars = mean(Dollars))
Any ideas for filling in April through December in the dataset would be appreciated. Thanks in advance!
Answer 0 (score: 2)
Here is one way to do this using complete:
library(tidyverse)
Then use complete:
dat2 <- data.frame(Period = Period, Month = Month, Dollars = Dollars) %>%
# make a "year" variable
mutate(Year = word(Period, 2,2)) %>%
# remove period variable (we'll add it in later)
select(-Period) %>%
# month.name is a base variable listing all months (thanks @Gregor).
# nesting by "Year" lets complete know you only want the years listed in your dataset.
complete(Month = month.name, nesting(Year), fill = list(Dollars = 0)) %>%
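# Make Month a factor so months sort chronologically rather than alphabetically
mutate(Month = factor(Month, levels = month.name)) %>%
# Arrange by Year and month
arrange(Year, Month) %>%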
#remake the "period" variable
mutate(Period = paste(Month, Year)) %>%
group_by(Month) %>%
summarise(AvgDollars = mean(Dollars))
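To make the role of `nesting()` and `fill` easier to see, here is a minimal sketch on a toy data frame. The `toy` data and its values are made up for illustration and are not part of the question's dataset.

```r
library(tidyr)

# Hypothetical toy data: two years, each with only some months observed
toy <- data.frame(
  Year    = c("2015", "2015", "2016"),
  Month   = c("January", "February", "January"),
  Dollars = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Cross every month in month.name with the years already present in toy;
# combinations that were missing get Dollars = 0 through the fill argument
toy_full <- complete(toy, Month = month.name, nesting(Year),
                     fill = list(Dollars = 0))

nrow(toy_full)         # 24 rows: 12 months x 2 years
sum(toy_full$Dollars)  # 60: the observed values are unchanged
```

Because `Year` is wrapped in `nesting()`, only years that already occur in the data are expanded, so no rows appear for years that were never observed.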
Answer 1 (score: 1)
There may be a more elegant solution with dplyr, but here is a quick one that doesn't require much typing:
dat <- rbind(data.frame(Period = Period, Month = Month, Dollars = Dollars),
data.frame(Period = c(sapply(2015:2018, function(x) format(ISOdate(x,4:12,1),"%B %Y"))),
Month = c(sapply(2015:2018, function(x) format(ISOdate(x,4:12,1),"%B"))),
Dollars = 0))
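The `format(ISOdate(...), "%B")` calls above depend on the session's locale for the month names. A possible locale-independent variant of the same idea, sketched with base R's `month.name` and `expand.grid`, is shown below. The names `observed`, `gaps`, `padding`, and `dat_full` are made up for this sketch, and the data is rebuilt from the question's values so that it runs on its own.

```r
# Rebuild the question's data so the sketch is self-contained
years <- 2015:2018
observed <- data.frame(
  Period  = paste(month.name[1:3], rep(years, each = 3)),
  Month   = rep(month.name[1:3], times = length(years)),
  Dollars = c(936, 753, 731, 667, 643, 588, 948, 894, 997, 774, 745, 684),
  stringsAsFactors = FALSE
)

# All April-December / year combinations; Month varies fastest within each Year
gaps <- expand.grid(Month = month.name[4:12], Year = years,
                    stringsAsFactors = FALSE)

# Zero-valued rows with the same columns as the observed data
padding <- data.frame(
  Period  = paste(gaps$Month, gaps$Year),
  Month   = gaps$Month,
  Dollars = 0,
  stringsAsFactors = FALSE
)

dat_full <- rbind(observed, padding)
nrow(dat_full)  # 48 rows: 12 observed + 36 added
```

As in the answer's version, the added rows are appended after the observed ones, so the result is not in calendar order.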
Answer 2 (score: 1)
Here is a two-step solution:
library(dplyr)
# make sure month names are formatted in English; the locale name "English" is
# Windows-specific, other systems may need something like "en_US.UTF-8" or "C"
Sys.setlocale("LC_TIME", "English")
# first, define a data frame with each month from January 2015 to December 2018
dat2 <- data.frame(Period = format(seq(as.Date("2015/1/1"),
as.Date("2018/12/1"), by = "month"),
format = "%B %Y"),
stringsAsFactors = FALSE)
# derive Month from dat2's own Period column by dropping the trailing " YYYY";
# inside data.frame() the name Period would refer to the 12-element vector from
# the question, which gets recycled and mislabels the months
dat2$Month <- substr(dat2$Period, 1, nchar(dat2$Period) - 5)
# then, join dat onto dat2; starting from dat2 keeps the rows in calendar order
dat2 %>%
left_join(select(dat, Period, Dollars), by = "Period")
Period Month Dollars
1 January 2015 January 936
2 February 2015 February 753
3 March 2015 March 731
4 April 2015 April NA
5 May 2015 May NA
6 June 2015 June NA
7 July 2015 July NA
8 August 2015 August NA
9 September 2015 September NA
10 October 2015 October NA
11 November 2015 November NA
12 December 2015 December NA
13 January 2016 January 667
14 February 2016 February 643
15 March 2016 March 588
16 April 2016 April NA
17 May 2016 May NA
18 June 2016 June NA
19 July 2016 July NA
20 August 2016 August NA
21 September 2016 September NA
22 October 2016 October NA
23 November 2016 November NA
24 December 2016 December NA
25 January 2017 January 948
26 February 2017 February 894
27 March 2017 March 997
28 April 2017 April NA
29 May 2017 May NA
30 June 2017 June NA
31 July 2017 July NA
32 August 2017 August NA
33 September 2017 September NA
34 October 2017 October NA
35 November 2017 November NA
36 December 2017 December NA
37 January 2018 January 774
38 February 2018 February 745
39 March 2018 March 684
40 April 2018 April NA
41 May 2018 May NA
42 June 2018 June NA
43 July 2018 July NA
44 August 2018 August NA
45 September 2018 September NA
46 October 2018 October NA
47 November 2018 November NA
48 December 2018 December NA
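The join leaves `NA` in Dollars for the added months, whereas the question asks for zeros. One way to finish the job is sketched below in base R. The `joined` data frame is a small hypothetical stand-in for the joined result, not the full 48-row table.

```r
# Hypothetical stand-in for the joined result: a few rows with NA in Dollars
joined <- data.frame(
  Month   = c("January", "April", "January", "April"),
  Dollars = c(936, NA, 667, NA),
  stringsAsFactors = FALSE
)

# Turn the NAs produced by the join into explicit zeros
joined$Dollars[is.na(joined$Dollars)] <- 0

# Average per month, with months kept in calendar order via factor levels
joined$Month <- factor(joined$Month, levels = month.name)
avg <- aggregate(Dollars ~ Month, data = joined, FUN = mean)
avg
```

Replacing the `NA` values before averaging matters here: `mean()` returns `NA` for any month that contains one, unless `na.rm = TRUE` is used, and in that case the added months would be dropped from the average rather than counted as zero.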