I have the following data frame in R:
Year ID
1 2018 x
2 2018 x
3 2018 y
4 2018 z
5 2019 x
6 2019 x
7 2019 z
For each year separately, I want to compute the share of "x" among all observations in the "ID" column.
The result should look like this:
Year Share of x
2018 50 %
2019 67 %
Can this be done with aggregate, along the lines of
aggregate(length(which(df$ID == x)) / length(df$ID), by=Year)
or with any other function?
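Answer 0 (score: 1)
Assuming the data shown reproducibly in the Note at the end, use table to compute the counts and then prop.table to compute each cell's proportion of its row total.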
prop.table(table(dat), 1)
## ID
## Year x y z
## 2018 0.5000000 0.2500000 0.2500000
## 2019 0.6666667 0.0000000 0.3333333
Or, if you want the proportions within each column instead:
prop.table(table(dat), 2)
## ID
## Year x y z
## 2018 0.5 1.0 0.5
## 2019 0.5 0.0 0.5
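The row-proportion table above holds every ID, while the question asks only for the share of "x" as a percentage. The following is a minimal sketch of one way to pull that column out; the names `props` and `share_x` and the rounding to whole percent are assumptions made for illustration, not part of the original answer.

```r
# Rebuild the example data so the block runs on its own.
dat <- data.frame(
  Year = c(2018, 2018, 2018, 2018, 2019, 2019, 2019),
  ID   = c("x", "x", "y", "z", "x", "x", "z")
)

# Row proportions: each row (year) sums to 1.
props <- prop.table(table(dat), 1)

# Keep only the "x" column and format it as a whole-number percentage.
share_x <- data.frame(
  Year  = as.integer(rownames(props)),
  Share = paste(round(100 * props[, "x"]), "%")
)
print(share_x)
```

Indexing the table by the column name "x" avoids depending on the order of the ID levels.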
Regarding the aggregate tag on the question, the first case (row proportions) can be done like this:
aggregate(ID ~ Year, dat,
function(id) sapply(unique(dat$ID), function(x) setNames(mean(id == x), x)))
## Year ID.x ID.y ID.z
## 1 2018 0.5000000 0.2500000 0.2500000
## 2 2019 0.6666667 0.0000000 0.3333333
Or using aggregate together with table:
aggregate(ID ~ Year, dat, function(x) table(x) / length(x))
## Year ID.x ID.y ID.z
## 1 2018 0.5000000 0.25 0.2500000
## 2 2019 0.6666667 0.00 0.3333333
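Or with dplyr and tidyr:
library(dplyr)
library(tidyr)
dat %>%
  count(Year, ID) %>%
  group_by(Year) %>%
  mutate(prop = n / sum(n)) %>%
  # id_cols is named explicitly; newer tidyr releases no longer accept a positional -n here
  pivot_wider(id_cols = Year, names_from = "ID", values_from = "prop", values_fill = list(prop = 0))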
## # A tibble: 2 x 4
## # Groups: Year [2]
## Year x y z
## <int> <dbl> <dbl> <dbl>
## 1 2018 0.5 0.25 0.25
## 2 2019 0.667 0 0.333
Note
Lines <- " Year ID
1 2018 x
2 2018 x
3 2018 y
4 2018 z
5 2019 x
6 2019 x
7 2019 z "
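# stringsAsFactors = TRUE keeps ID a factor (the default before R 4.0.0);
# the aggregate() outputs shown above rely on that.
dat <- read.table(text = Lines, stringsAsFactors = TRUE)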
Answer 1 (score: 0)
Perhaps you want something like this:
dfout<- setNames(aggregate(ID~Year,df,function(v) sum(v=="x")/length(v)*100),
c("Year","Share of x"))
which gives
> dfout
Year Share of x
1 2018 50.00000
2 2019 66.66667
Data
df <-structure(list(Year = c(2018L, 2018L, 2018L, 2018L, 2019L, 2019L,
2019L), ID = c("x", "x", "y", "z", "x", "x", "z")), class = "data.frame", row.names = c(NA,
-7L))
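The expression sum(v == "x") / length(v) in this answer is the mean of a logical vector, so the same result can be written a little more compactly. This is only a sketch of that equivalence; the name `share` is an assumption chosen for illustration.

```r
# Same data as in the answer, rebuilt so the block runs on its own.
df <- data.frame(
  Year = c(2018L, 2018L, 2018L, 2018L, 2019L, 2019L, 2019L),
  ID   = c("x", "x", "y", "z", "x", "x", "z")
)

# mean() of a logical vector is the proportion of TRUE values,
# so 100 * mean(v == "x") is the percentage of "x" within each year.
share <- aggregate(ID ~ Year, df, function(v) 100 * mean(v == "x"))
names(share) <- c("Year", "Share of x")
print(share)
```

Because the comparison v == "x" works on both character vectors and factors, this form does not depend on how the ID column was read in.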
Answer 2 (score: 0)
Tidyverse approach:
library(tidyverse)
data<- tribble(~year,~id,
2018,"x",
2018,"x",
2018,"y",
2018,"z",
2019,"x",
2019,"x",
2019,"z"
)
agg <- data %>%
  group_by(year, id) %>%
  summarise(cnt_id = n()) %>%        # count id per year
  group_by(year) %>%
  mutate(cnt_obs = sum(cnt_id),      # count total obs per year
         share = cnt_id / cnt_obs) %>%
  filter(id == "x") %>%
  select(year, id, share)
head(agg)
year id share
<dbl> <chr> <dbl>
1 2018 x 0.5
2 2019 x 0.667
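If only the share of "x" is needed, the count-then-divide steps can be collapsed into a single grouped mean. The sketch below assumes the same column names as this answer (`year`, `id`); the name `share_x` is made up for illustration.

```r
library(dplyr)

# Same data as in the answer, rebuilt so the block runs on its own.
data <- tibble(
  year = c(2018, 2018, 2018, 2018, 2019, 2019, 2019),
  id   = c("x", "x", "y", "z", "x", "x", "z")
)

# Within each year, the mean of the logical id == "x" is the share of "x".
share_x <- data %>%
  group_by(year) %>%
  summarise(share = mean(id == "x"))

print(share_x)
```

Unlike the filter-based version, this also returns a row with share 0 for a year in which "x" never occurs.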
Answer 3 (score: 0)
There is no y observation for 2019, but this still works:
library(tidyverse)
df<- tribble(~year,~id,
2018,"x",
2018,"x",
2018,"y",
2018,"z",
2019,"x",
2019,"x",
2019,"z"
)
df %>%
  group_by(year, id) %>%
  tally() %>%
  group_by(year) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup() %>%
  select(-n) %>%
  pivot_wider(names_from = id, values_from = prop) %>%
  mutate_all(~ replace_na(., replace = 0))
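The final replace_na step can probably be folded into the reshape itself, since pivot_wider has a values_fill argument for combinations that do not occur in the data. The sketch below assumes a tidyr version that accepts a scalar values_fill (1.1.0 or later, as far as I know); the name `wide` is chosen for illustration.

```r
library(dplyr)
library(tidyr)

# Same data as in the answer, rebuilt so the block runs on its own.
df <- tibble(
  year = c(2018, 2018, 2018, 2018, 2019, 2019, 2019),
  id   = c("x", "x", "y", "z", "x", "x", "z")
)

wide <- df %>%
  count(year, id) %>%                 # observations per year and id
  group_by(year) %>%
  mutate(prop = n / sum(n)) %>%       # share within each year
  ungroup() %>%
  # id_cols = year drops the helper column n; values_fill = 0 fills the
  # missing 2019/y cell instead of leaving it NA.
  pivot_wider(id_cols = year, names_from = id, values_from = prop,
              values_fill = 0)

print(wide)
```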