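I have daily data, and I want to count, for each year, the number of days on which the daily measurement falls within a given range. The data are also grouped by a factor, so I need the number of days per year within the range (for example, 15 to 18) for each factor level.
I have a large dataset spanning more than 100 years, but here is some sample data: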
Date <- seq(as.Date("2010/01/01"), by = "day", length.out = 1095)
People <- sample.int(20, 1095, replace = TRUE)
Country <- sample(x = c("Australia", "Canada", "France"), size = 1095, replace = TRUE)
mydf <- data.frame(Date, People, Country)
I want to know, for each year and each country, how many times the value of "People" is between 15 and 18.
So my output would be a new data frame like:
myDate People Country
2010 45 Australia
2010 10 Canada
2010 24 France
2011 33 Australia
2011 100 Canada
2011 4 France
2012 21 Australia
2012 66 Canada
2012 211 France
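Any help would be greatly appreciated. I have been struggling with this and searching for answers, but I could not find a solution that involves both dates and factors.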
Answer 0 (score: 3)
You can use lubridate and dplyr to achieve this. Use year() to extract the year, then group by year and country. The last step is a conditional summary:
library(dplyr)
library(lubridate)
mydf %>%
group_by(year = year(Date), Country) %>%
summarise(p = sum(between(People, 15, 18)))
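This produces output like the following (the exact counts vary from run to run because the sample data are random):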
year Country p
<dbl> <fct> <int>
1 2010. Australia 22
2 2010. Canada 34
3 2010. France 26
4 2011. Australia 21
5 2011. Canada 30
6 2011. France 13
7 2012. Australia 28
8 2012. Canada 31
9 2012. France 23
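The conditional summary works because summing a logical vector counts its TRUE values, and between(x, 15, 18) is inclusive at both ends, so it is equivalent to x >= 15 & x <= 18. Here is a minimal base R sketch of that idea. The vector below is made up purely to show the boundary behaviour and is not taken from the question's data:

```r
# Hypothetical measurements, chosen only to illustrate the boundary behaviour
people <- c(14, 15, 16, 18, 19, 17)

# Logical test: TRUE where the value lies in the closed interval [15, 18]
in_range <- people >= 15 & people <= 18

# sum() treats TRUE as 1 and FALSE as 0, so this counts the matching days
n_in_range <- sum(in_range)
print(n_in_range)
```

Both endpoints (15 and 18) are counted, while 14 and 19 are not.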
Answer 1 (score: 3)
And here is the requisite base R solution. Key points: convert the dates to character year values with format.Date, and supply the by-grouping as a list object:
aggregate( mydf['People'], list(mydf[['Country']], format(mydf$Date, "%Y") ),
FUN=function(d) sum( d >=15 & d <=18) )
Group.1 Group.2 People
1 Australia 2010 25
2 Canada 2010 22
3 France 2010 24
4 Australia 2011 27
5 Canada 2011 19
6 France 2011 33
7 Australia 2012 19
8 Canada 2012 33
9 France 2012 24
If you want the resulting dataframe to have different column names then add those to the list inside the by-group definition:
aggregate( mydf['People'], list(Cntry=mydf[['Country']], Yr=format(mydf$Date, "%Y") ),
function(d) sum( d >=15 & d <=18) )
Cntry Yr People
1 Australia 2010 25
2 Canada 2010 22
3 France 2010 24
4 Australia 2011 27
5 Canada 2011 19
6 France 2011 33
7 Australia 2012 19
8 Canada 2012 33
9 France 2012 24
Answer 2 (score: 2)
For a data.table solution:
library(data.table)
setDT(mydf)[, .(People = sum(between(People, 15, 18))), by = .(year(Date), Country)]
year Country People
1: 2010 Canada 22
2: 2010 Australia 17
3: 2010 France 22
4: 2011 Canada 23
5: 2011 France 22
6: 2011 Australia 26
7: 2012 Canada 21
8: 2012 France 29
9: 2012 Australia 26
Answer 3 (score: 1)
Consider base R's aggregate:
mydf$Year <- format(mydf$Date, "%Y")
mydf$NumberTime15_18 <- ifelse(mydf$People >= 15 & mydf$People <= 18, 1, 0)
aggregate(NumberTime15_18 ~ Country + Year, mydf, sum)
# Country Year NumberTime15_18
# 1 Australia 2010 22
# 2 Canada 2010 17
# 3 France 2010 28
# 4 Australia 2011 26
# 5 Canada 2011 24
# 6 France 2011 20
# 7 Australia 2012 16
# 8 Canada 2012 27
# 9 France 2012 21
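The same indicator-and-sum approach can be wrapped in a function so the range is not hard-coded. This is a sketch only: count_in_range() is a hypothetical helper name, and the toy data frame is made up for illustration. It assumes the columns are named Date, People and Country as in the question:

```r
# count_in_range() is a hypothetical helper, not part of any package.
# It assumes df has columns Date (class Date), People (numeric) and Country.
count_in_range <- function(df, lower, upper) {
  df$Year    <- format(df$Date, "%Y")
  # 1 if the value lies in the closed interval [lower, upper], otherwise 0
  df$InRange <- as.integer(df$People >= lower & df$People <= upper)
  aggregate(InRange ~ Country + Year, data = df, FUN = sum)
}

# Small deterministic data set (hypothetical values, for illustration only)
toy <- data.frame(
  Date    = as.Date(c("2010-01-01", "2010-01-02", "2010-01-03",
                      "2011-01-01", "2011-01-02")),
  People  = c(15, 19, 3, 18, 16),
  Country = c("Canada", "Canada", "France", "Canada", "France")
)

result <- count_in_range(toy, 15, 18)
print(result)
```

Because the indicator is summed rather than filtered out, a group that is present in the data but has no values in range (France in 2010 here) still appears in the result with a count of 0.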