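Sample data
set.seed(123)
df <- data.frame(loc.id = rep(1:1000, each = 35), year = rep(1980:2014, times = 1000), month.id = sample(c(1:4, 8:10, 12), 35 * 1000, replace = TRUE))
The data frame covers 1000 locations × 35 years and has a variable named month.id, which is simply a month of the year. For each year, I want to compute the percentage occurrence of each month. For example, for 1980: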
month.vec <- df[df$year == 1980,]
table(month.vec$month.id)
  1   2   3   4   8   9  10  12
106 132 116 122 114 130 141 139
To compute the percentage occurrence of each month:
table(month.vec$month.id)/length(month.vec$month.id) * 100
   1    2    3    4    8    9   10   12
10.6 13.2 11.6 12.2 11.4 13.0 14.1 13.9
I would like to end up with a table like this:
year month percent
1980 1 10.6
1980 2 13.2
1980 3 11.6
1980 4 12.2
1980 5 NA
1980 6 NA
1980 7 NA
1980 8 11.4
1980 9 13
1980 10 14.1
1980 11 NA
1980 12 13.9
Since months 5, 6, 7, and 11 are missing, I just want to add extra rows with NA for those months. If possible, I would like a dplyr solution along these lines:
library(dplyr)
df %>% group_by(year) %>% summarise(percentage.contri = table(month.id)/length(month.id)*100)
Answer 0 (score: 4)
Using dplyr and tidyr:
# To get month as integer use (or add as.integer to mutate):
# df$month.id <- as.integer(df$month.id)
library(dplyr)
library(tidyr)
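df %>%
  group_by(year, month.id) %>%
  # Count occurrences per year and month
  summarise(n = n()) %>%
  # Percent per month; the result is still grouped by year, so sum(n) is the yearly total
  mutate(percent = n / sum(n) * 100) %>%
  # Fill in missing months within each year group. year is the grouping column, so it is
  # not passed to complete() (recent tidyr versions reject completing a grouping column).
  # fill sets percent to 0 for the added rows; drop fill to get NA instead.
  complete(month.id = 1:12, fill = list(percent = 0)) %>%
  select(year, month.id, percent)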
    year month.id percent
   <int>    <dbl>   <dbl>
 1  1980     1.00    10.6
 2  1980     2.00    13.2
 3  1980     3.00    11.6
 4  1980     4.00    12.2
 5  1980     5.00     0
 6  1980     6.00     0
 7  1980     7.00     0
 8  1980     8.00    11.4
 9  1980     9.00    13.0
10  1980    10.0     14.1
11  1980    11.0      0
12  1980    12.0     13.9
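The question asked for NA rather than 0 in the months that never occur. Below is a minimal sketch of that variant, assuming the same df as in the question. It leaves out the fill argument so that complete() pads the new rows with NA, and it ungroups first so that year can be expanded together with month.id. The name pct_na and the use of count() are my own choices here, not part of the answer above.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question
set.seed(123)
df <- data.frame(loc.id = rep(1:1000, each = 35),
                 year = rep(1980:2014, times = 1000),
                 month.id = sample(c(1:4, 8:10, 12), 35 * 1000, replace = TRUE))

pct_na <- df %>%
  count(year, month.id) %>%                 # one row per observed year/month pair
  group_by(year) %>%
  mutate(percent = n / sum(n) * 100) %>%    # share of each month within its year
  ungroup() %>%
  # No `fill` argument: rows added for unobserved months get NA in percent
  complete(year, month.id = 1:12) %>%
  select(year, month.id, percent)

head(pct_na, 12)
```

Every year ends up with exactly 12 rows, so the result can be filtered or reshaped without checking which months were present.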
Answer 1 (score: 3)
A base R solution:
tab <- table(month.vec$year, factor(month.vec$month.id, levels = 1:12))/length(month.vec$month.id) * 100
dfnew <- as.data.frame(tab)
which gives:
> dfnew
   Var1 Var2 Freq
1  1980    1 10.6
2  1980    2 13.2
3  1980    3 11.6
4  1980    4 12.2
5  1980    5  0.0
6  1980    6  0.0
7  1980    7  0.0
8  1980    8 11.4
9  1980    9 13.0
10 1980   10 14.1
11 1980   11  0.0
12 1980   12 13.9
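The base R table covers a single year because month.vec holds only the 1980 rows. As a sketch of how the same idea might extend to every year, one could cross-tabulate year against a 12-level month factor and take row-wise proportions with prop.table(). Converting the zeros to NA is an extra step added here to match the layout requested in the question, and the names tab_all, pct_all, and df_all are illustrative.

```r
# Same sample data as in the question
set.seed(123)
df <- data.frame(loc.id = rep(1:1000, each = 35),
                 year = rep(1980:2014, times = 1000),
                 month.id = sample(c(1:4, 8:10, 12), 35 * 1000, replace = TRUE))

# Rows are years, columns are months 1..12; unobserved months get a count of 0
tab_all <- table(year = df$year,
                 month = factor(df$month.id, levels = 1:12))

# margin = 1 divides each row by its row total, giving per-year percentages
pct_all <- prop.table(tab_all, margin = 1) * 100

# Long format: one row per year/month combination
df_all <- as.data.frame(pct_all, responseName = "percent",
                        stringsAsFactors = FALSE)
df_all$year  <- as.integer(df_all$year)
df_all$month <- as.integer(df_all$month)
df_all$percent[df_all$percent == 0] <- NA   # months that never occur
df_all <- df_all[order(df_all$year, df_all$month), ]

head(df_all, 12)
```

Using prop.table() with margin = 1 avoids dividing by a hand-computed length, which only works when the table contains a single year.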
Or with data.table:
library(data.table)
setDT(month.vec)[, .N, by = .(year, month.id)
][.(year = 1980, month.id = 1:12), on = .(year, month.id)
][, N := 100 * N/sum(N, na.rm = TRUE)][]
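The data.table join above is restricted to 1980. A possible generalisation to all years, offered as a sketch rather than as part of the original answer, builds the full year × month grid with CJ() and joins the counts onto it. The names dt, counts, grid, and res are hypothetical, and month.id is kept numeric in the grid so that it matches the column type in df.

```r
library(data.table)

# Same sample data as in the question
set.seed(123)
df <- data.frame(loc.id = rep(1:1000, each = 35),
                 year = rep(1980:2014, times = 1000),
                 month.id = sample(c(1:4, 8:10, 12), 35 * 1000, replace = TRUE))

dt <- as.data.table(df)                      # copy, so df itself is left untouched

# Observed counts per year and month
counts <- dt[, .N, by = .(year, month.id)]

# Every year/month combination, including months that never occur
grid <- CJ(year = 1980:2014, month.id = as.numeric(1:12))

# Join counts onto the full grid; unmatched combinations get N = NA
res <- counts[grid, on = .(year, month.id)]

# Percentage within each year; NA counts stay NA
res[, percent := 100 * N / sum(N, na.rm = TRUE), by = year]

res[year == 1980]
```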