I have a data frame that looks like this:
group <- c('a', 'b', 'a', 'b')
year <- c(1990, 1990, 2000, 2000)
freq <- c(100, 120, 130, 170)
df <- data.frame(group, year, freq)
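For each distinct year, I want to divide the freq value of the group a row by the freq value of the group b row, and then add these proportions to the data frame as new rows. The resulting data frame should look like this: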
group <- c('a', 'b', 'c', 'a', 'b', 'c')
year <- c(1990, 1990, 1990, 2000, 2000, 2000)
freq <- c(100, 120, 100/120, 130, 170, 130/170)
df <- data.frame(group, year, freq)
I tried to solve this with the ugly loop below, but it went off the rails. If anyone could show me how to accomplish this basic task in R, I would be grateful!
for (year in unique(df$year)) {
a = df[ which(df$group == 'a' & df$year == year), ]
b = df[ which(df$group == 'b' & df$year == year), ]
proportion = a$freq / b$freq
row = c('c', year, proportion)
rbind(df, row)
}
Answer 0 (score: 3)
Here is a tidyverse option:
library(tidyverse)
df %>%
spread(group, freq) %>%
mutate(c = a / b) %>%
gather(group, freq, -year) %>%
arrange(year, group)
# year group freq
#1 1990 a 100.0000000
#2 1990 b 120.0000000
#3 1990 c 0.8333333
#4 2000 a 130.0000000
#5 2000 b 170.0000000
#6 2000 c 0.7647059
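Explanation: we spread the data from long to wide, add a column c = a / b, and gather the data from wide back to long; finally we reorder the rows to reproduce the expected output.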
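In tidyr 1.0.0 and later, spread() and gather() are superseded by pivot_wider() and pivot_longer(). The following is a minimal sketch of the same pipeline with the newer functions, assuming tidyr >= 1.0.0 is installed and that df is the four-row data frame from the question; the name result is only for illustration.

```r
library(dplyr)
library(tidyr)

# The data frame from the question
df <- data.frame(group = c("a", "b", "a", "b"),
                 year  = c(1990, 1990, 2000, 2000),
                 freq  = c(100, 120, 130, 170))

result <- df %>%
  # long -> wide: one column per group (a, b), one row per year
  pivot_wider(names_from = group, values_from = freq) %>%
  # the ratio becomes a third "group" column
  mutate(c = a / b) %>%
  # wide -> long again, keeping year as the identifier
  pivot_longer(-year, names_to = "group", values_to = "freq") %>%
  arrange(year, group)
```

Because the ratio is computed from the columns named a and b, this version does not depend on the order of the rows within each year.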
Answer 1 (score: 0)
Use the function split to split the original data by year (the result is a list).
foo <- split(df, df$year)
For each entry in the list foo, bind the original entry x together with a new data.frame that holds the computed freq.
bar <- lapply(foo, function(x)
rbind(x, data.frame(group = "c",
year = x$year[1],
freq = x$freq[1] / x$freq[2])))
# Bind back final result as it's a list (lapply result)
do.call(rbind, bar)
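The code above takes the ratio by position (freq[1] / freq[2]), so it relies on the a row coming before the b row within each year. A possible variant that looks the rows up by group name instead is sketched below; the helper name add_ratio and the shuffled example data are hypothetical and only serve to show that row order no longer matters. It assumes exactly one a row and one b row per year.

```r
# Same values as in the question, but with the 1990 rows swapped
df <- data.frame(group = c("b", "a", "a", "b"),
                 year  = c(1990, 1990, 2000, 2000),
                 freq  = c(120, 100, 130, 170),
                 stringsAsFactors = FALSE)

# Hypothetical helper: append a "c" row holding freq(a) / freq(b) for one year
add_ratio <- function(x) {
  ratio <- x$freq[x$group == "a"] / x$freq[x$group == "b"]
  rbind(x, data.frame(group = "c", year = x$year[1], freq = ratio,
                      stringsAsFactors = FALSE))
}

# Split by year, extend each piece, and bind the pieces back together
result <- do.call(rbind, lapply(split(df, df$year), add_ratio))
rownames(result) <- NULL
```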
Answer 2 (score: 0)
Here is an option using data.table. Convert the 'data.frame' to a 'data.table' (setDT(df)), group by 'year', and concatenate 'group' with 'c' and 'freq' with the ratio of its first and second elements.
library(data.table)
setDT(df)[, .(group = c(group, 'c'), freq = c(freq, freq[1]/freq[2])), .(year)]
# year group freq
#1: 1990 a 100.0000000
#2: 1990 b 120.0000000
#3: 1990 c 0.8333333
#4: 2000 a 130.0000000
#5: 2000 b 170.0000000
#6: 2000 c 0.7647059
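Or rbind the original dataset together with a per-year summary of the ratios (the new rows are appended at the end rather than interleaved by year):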
rbind(setDT(df), df[, .(freq = Reduce(`/`, freq), group = 'c'), .(year)])
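The Reduce call in the line above folds the division operator over the freq values of each year from left to right, so with two rows per year it yields the first value divided by the second. A small base R sketch of that behaviour follows; the example vectors and the names two and three are made up for illustration.

```r
# Two values: equivalent to 100 / 120
two <- Reduce(`/`, c(100, 120))

# Three values: folded left to right, i.e. (8 / 4) / 2
three <- Reduce(`/`, c(8, 4, 2))
```

As with freq[1] / freq[2], this relies on the a row preceding the b row within each year.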
Or using tidyverse:
library(tidyverse)
df %>%
group_by(year) %>%
summarise(group = list(c(group, 'c')),
freq = list(c(freq, freq[1]/freq[2]))) %>%
unnest(cols = c(group, freq))
# A tibble: 6 x 3
# year group freq
# <dbl> <chr> <dbl>
#1 1990 a 100
#2 1990 b 120
#3 1990 c 0.833
#4 2000 a 130
#5 2000 b 170
#6 2000 c 0.765
Data:
df <- structure(list(group = c("a", "b", "a", "b"), year = c(1990,
1990, 2000, 2000), freq = c(100, 120, 130, 170)), row.names = c(NA,
-4L), class = "data.frame")