Find the ratio of the values of two levels that share a common level

Time: 2018-09-01 13:56:48

Tags: r

I have a data frame that looks like this:

group <- c('a', 'b', 'a', 'b')
year <- c(1990, 1990, 2000, 2000)
freq <- c(100, 120, 130, 170)
df <- data.frame(group, year, freq)

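For each distinct year, I want to divide the freq value of the group a row by the freq value of the group b row, and then add these ratios to the data frame as rows of a new group c. The resulting data frame should look like this: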

group <- c('a', 'b', 'c', 'a', 'b', 'c')
year <- c(1990, 1990, 1990, 2000, 2000, 2000)
freq <- c(100, 120, 100/120, 130, 170, 130/170)
df <- data.frame(group, year, freq)

I tried to solve this with the awful loop below, but it went off the rails. I would be grateful if someone could show me how to do this basic task in R!

for (year in unique(df$year)) {
  a = df[ which(df$group == 'a' & df$year == year), ]
  b = df[ which(df$group == 'b' & df$year == year), ]
  proportion = a$freq / b$freq
  row = c('c', year, proportion)
  rbind(df, row)
}

3 answers:

Answer 0: (score: 3)

Here is a tidyverse option

library(tidyverse)
df %>%
    spread(group, freq) %>%
    mutate(c = a / b) %>%
    gather(group, freq, -year) %>%
    arrange(year, group)
#  year group        freq
#1 1990     a 100.0000000
#2 1990     b 120.0000000
#3 1990     c   0.8333333
#4 2000     a 130.0000000
#5 2000     b 170.0000000
#6 2000     c   0.7647059

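Explanation: we spread the data from long to wide, add a column c = a / b, and gather the data from wide back to long, before reordering the rows to reproduce the expected output.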
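In tidyr 1.0.0 and later, spread() and gather() are superseded by pivot_wider() and pivot_longer(). The following is a minimal sketch of the same long-to-wide-to-long reshape with the newer verbs, assuming dplyr and tidyr (1.0.0 or later) are installed. The name res is only for illustration, and the data is rebuilt so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, with group kept as character
df <- data.frame(group = c("a", "b", "a", "b"),
                 year  = c(1990, 1990, 2000, 2000),
                 freq  = c(100, 120, 130, 170),
                 stringsAsFactors = FALSE)

res <- df %>%
  # long -> wide: one row per year, one column per group (a, b)
  pivot_wider(names_from = group, values_from = freq) %>%
  # the ratio becomes a new column named c
  mutate(c = a / b) %>%
  # wide -> long: columns a, b, c go back into group/freq pairs
  pivot_longer(-year, names_to = "group", values_to = "freq") %>%
  arrange(year, group)

res
```

Because the ratio is computed from the columns named a and b, this variant does not depend on the order of the rows within a year.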

Answer 1: (score: 0)

Use the function split to split the original data by year (the result is a list).

foo <- split(df, df$year)

For each entry in the list foo, bind the original entry x to a new data.frame that holds the computed freq.

bar <- lapply(foo, function(x)
              rbind(x, data.frame(group = "c", 
                                  year = x$year[1], 
                                  freq = x$freq[1] / x$freq[2])))

# Bind back final result as it's a list (lapply result)
do.call(rbind, bar)
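Note that x$freq[1] / x$freq[2] assumes that, within each year, the a row comes before the b row. A sketch of a variant that picks the rows by their group label instead, so it does not depend on row order, could look like this. The name res is hypothetical, and the data is rebuilt so the block runs on its own.

```r
# Same data as in the question, with group kept as character
df <- data.frame(group = c("a", "b", "a", "b"),
                 year  = c(1990, 1990, 2000, 2000),
                 freq  = c(100, 120, 130, 170),
                 stringsAsFactors = FALSE)

bar <- lapply(split(df, df$year), function(x) {
  # Select numerator and denominator by label rather than by position
  ratio <- x$freq[x$group == "a"] / x$freq[x$group == "b"]
  rbind(x, data.frame(group = "c", year = x$year[1], freq = ratio,
                      stringsAsFactors = FALSE))
})

res <- do.call(rbind, bar)
rownames(res) <- NULL  # drop the "1990.1"-style row names created by rbind
res
```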

Answer 2: (score: 0)

Here is an option using data.table. Convert the 'data.frame' to a 'data.table' (setDT(df)), group by 'year', and concatenate 'group' with 'c' and 'freq' with the ratio of the 'freq' elements.

library(data.table)
setDT(df)[, .(group = c(group, 'c'), freq = c(freq, freq[1]/freq[2])), .(year)]
#   year group        freq
#1: 1990     a 100.0000000
#2: 1990     b 120.0000000
#3: 1990     c   0.8333333
#4: 2000     a 130.0000000
#5: 2000     b 170.0000000
#6: 2000     c   0.7647059

Or summarise by 'year' and rbind the result with the original dataset

rbind(setDT(df), df[, .(freq = Reduce(`/`, freq), group = 'c'), .(year)])
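Both data.table snippets divide the first freq by the second within each year (freq[1]/freq[2], or Reduce(`/`, freq)), so they rely on the a row preceding the b row. A sketch that selects the values by group label and then sorts the combined table, assuming data.table is installed (dt, ratios and res are illustrative names):

```r
library(data.table)

# Same data as in the question, built directly as a data.table
dt <- data.table(group = c("a", "b", "a", "b"),
                 year  = c(1990, 1990, 2000, 2000),
                 freq  = c(100, 120, 130, 170))

# One ratio row per year; inside j, `group` still refers to the original column
ratios <- dt[, .(group = "c",
                 freq  = freq[group == "a"] / freq[group == "b"]),
             by = year]

# Columns are matched by name, so the differing column order does not matter
res <- rbind(dt, ratios, use.names = TRUE)
setorder(res, year, group)  # sorts res by reference
res
```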

Or using tidyverse

library(tidyverse)
df %>% 
   group_by(year) %>% 
   summarise(group = list(c(group, 'c')), 
            freq = list(c(freq, freq[1]/freq[2]))) %>% 
   unnest
# A tibble: 6 x 3
#   year group    freq
#  <dbl> <chr>   <dbl>
#1  1990 a     100    
#2  1990 b     120    
#3  1990 c       0.833
#4  2000 a     130    
#5  2000 b     170    
#6  2000 c       0.765
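Recent tidyr releases (1.0.0 and later) warn when unnest() is called without an explicit cols argument. One way to sidestep list columns altogether is to compute one ratio row per year and append it with bind_rows(). This is only a sketch, assuming dplyr is installed, with ratios and res as illustrative names. The freq expression is deliberately written before group = "c": summarise() evaluates its arguments in order, so defining group first would make group == "a" compare against the new value "c".

```r
library(dplyr)

# Same data as in the question, with group kept as character
df <- data.frame(group = c("a", "b", "a", "b"),
                 year  = c(1990, 1990, 2000, 2000),
                 freq  = c(100, 120, 130, 170),
                 stringsAsFactors = FALSE)

ratios <- df %>%
  group_by(year) %>%
  # freq first, while `group` still refers to the original column
  summarise(freq  = freq[group == "a"] / freq[group == "b"],
            group = "c")

# bind_rows() matches columns by name, then the rows are put in order
res <- bind_rows(df, ratios) %>%
  arrange(year, group)
res
```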

Data

df <- structure(list(group = c("a", "b", "a", "b"), year = c(1990, 
1990, 2000, 2000), freq = c(100, 120, 130, 170)), row.names = c(NA, 
-4L), class = "data.frame")