Create a function to subset a data frame and then take the mean of a specific column in R

Posted: 2017-05-11 22:50:00

Tags: r functional-programming subset pipeline

I'd appreciate some help. I have a data frame:

df <- data.frame(gem = c("Ruby", "Opal", "Topaz", "Ruby", "Ruby", "Opal"),
                 cut = c(2, 3, 4, 5, 6, 2))

The function I want to write should first take the subset of rows where gem is Ruby, and then return the mean of cut over that subset.

I tried the following:

abc <- function(x,column1,val,coulmn2){
x%>%
subset(column1 %in% val)%>%
mean(na.omit(column2))}
abc(df,gem,"Ruby",cut)

This doesn't work. For the example above, the answer should be about 4.33 (the mean of 2, 5 and 6).

4 answers:

Answer 0 (score: 3)

You don't even have to write a function; there are many ways to do this, for example:

> aggregate(cut~gem, data=df, mean, na.rm=T)
    gem      cut
1  Opal 2.500000
2  Ruby 4.333333
3 Topaz 4.000000
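If only the Ruby row is wanted, the formula method of aggregate() also accepts a subset argument, so filtering and averaging happen in one call. A minimal sketch, assuming the data frame from the question with the gem names quoted; the variable name ruby is just for illustration:

```r
df <- data.frame(gem = c("Ruby", "Opal", "Topaz", "Ruby", "Ruby", "Opal"),
                 cut = c(2, 3, 4, 5, 6, 2))

# subset= is evaluated inside df, so gem refers to the column;
# the result is a one-row data frame with columns gem and cut
ruby <- aggregate(cut ~ gem, data = df, FUN = mean, subset = gem == "Ruby")
ruby$cut
```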

Or:

> tapply(df$cut, df$gem, mean, na.rm=T)
    Opal     Ruby    Topaz 
2.500000 4.333333 4.000000 
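tapply() returns a vector named by group, so a single group's mean can be pulled out by name. A short sketch under the same data assumptions:

```r
df <- data.frame(gem = c("Ruby", "Opal", "Topaz", "Ruby", "Ruby", "Opal"),
                 cut = c(2, 3, 4, 5, 6, 2))

means <- tapply(df$cut, df$gem, mean, na.rm = TRUE)  # one mean per gem
ruby_mean <- means[["Ruby"]]                          # [[ ]] returns the bare value without its name
ruby_mean
```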

If you really want a function that returns just one value, in base R it would be:

> abc<- function(df, column1, val, column2){
+   mean(df[which(df[,column1] == val), column2], na.rm=T)
+   }
> abc(df, "gem", "Ruby", "cut")
[1] 4.333333
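The original attempt called the function with bare column names, as in abc(df, gem, "Ruby", cut). A minimal base-R sketch that accepts that call form captures the arguments with substitute() and converts them to strings with deparse(). The name abc_nse is hypothetical, and functions written this way can be awkward to call from inside other functions, so the string-based version above is usually easier to reuse.

```r
df <- data.frame(gem = c("Ruby", "Opal", "Topaz", "Ruby", "Ruby", "Opal"),
                 cut = c(2, 3, 4, 5, 6, 2))

# Hypothetical variant of abc() that takes unquoted column names
abc_nse <- function(df, column1, val, column2) {
  # substitute() captures the unevaluated argument; deparse() turns it into a string
  col1 <- deparse(substitute(column1))
  col2 <- deparse(substitute(column2))
  keep <- df[[col1]] %in% val            # rows whose column1 value is in val
  mean(df[[col2]][keep], na.rm = TRUE)   # average column2 over those rows
}

abc_nse(df, gem, "Ruby", cut)
abc_nse(df, gem, c("Ruby", "Opal"), cut)  # %in% also allows several values
```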

Answer 1 (score: 2)

This is easy with the dplyr package:

library(dplyr)

df<- data.frame(gem = c("Ruby", "Opal", "Topaz", "Ruby", "Ruby","Opal"),
                cut = c(2,3,4,5,6,2))
df %>% group_by(gem) %>% summarize(mean(cut))

Output:

# A tibble: 3 × 2
     gem `mean(cut)`
  <fctr>       <dbl>
1   Opal    2.500000
2   Ruby    4.333333
3  Topaz    4.000000
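To turn this into the single-value function the question asks for, one option is dplyr's .data pronoun, which looks up a column whose name is held in a string. A sketch, assuming a reasonably recent dplyr (one that supports .data[[ ]] and pull()); the name abc_dplyr is hypothetical:

```r
library(dplyr)

df <- data.frame(gem = c("Ruby", "Opal", "Topaz", "Ruby", "Ruby", "Opal"),
                 cut = c(2, 3, 4, 5, 6, 2))

# Column names are passed as strings and resolved with .data[[ ]]
abc_dplyr <- function(x, column1, val, column2) {
  x %>%
    filter(.data[[column1]] %in% val) %>%                  # keep matching rows
    summarise(m = mean(.data[[column2]], na.rm = TRUE)) %>% # one-row summary
    pull(m)                                                 # extract as a plain number
}

abc_dplyr(df, "gem", "Ruby", "cut")
```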

Answer 2 (score: 0)

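library(magrittr)  # provides the %>% pipe

abc <- function(x, column1, val, column2) {
  x[x[, column1] %in% val, column2] %>%  # column2 values on rows where column1 is in val
    na.exclude %>%                      # drop missing values
    mean                                # average what remains
}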
abc(df,"gem","Ruby","cut")
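On R 4.1 or later, the same chain can be written with the native pipe |> instead of magrittr's %>%, so no package is needed (the native pipe requires the parentheses on each call). The sketch below uses a hypothetical copy of the data, df_na, with one cut value set to NA to show that the missing value is dropped; the name abc_base is also hypothetical.

```r
df_na <- data.frame(gem = c("Ruby", "Opal", "Topaz", "Ruby", "Ruby", "Opal"),
                    cut = c(2, 3, 4, 5, 6, NA))  # last value made NA for illustration

abc_base <- function(x, column1, val, column2) {
  x[x[[column1]] %in% val, column2] |>  # column2 values on matching rows
    na.omit() |>                        # drop missing values
    mean()                              # average the rest
}

abc_base(df_na, "gem", "Ruby", "cut")
abc_base(df_na, "gem", "Opal", "cut")   # only the non-missing 3 remains
```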

Answer 3 (score: 0)

We can use data.table:

library(data.table)
setDT(df)[, .(cut = mean(cut)), by = gem]
#    gem      cut
#1:  Ruby 4.333333
#2:  Opal 2.500000
#3: Topaz 4.000000

Data

df<- data.frame(gem = c("Ruby", "Opal", "Topaz", "Ruby", "Ruby","Opal"),
            cut = c(2,3,4,5,6,2))
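If only one group's mean is needed, data.table can filter in i and compute in j in a single call, returning a plain number instead of a table. A sketch, building a separate data.table dt so the original data frame is not converted in place by setDT(); the function name abc_dt is hypothetical, and get() is used to look up a column whose name is passed as a string:

```r
library(data.table)

dt <- data.table(gem = c("Ruby", "Opal", "Topaz", "Ruby", "Ruby", "Opal"),
                 cut = c(2, 3, 4, 5, 6, 2))

# Filter in i, compute in j: the result is a single number
dt[gem == "Ruby", mean(cut, na.rm = TRUE)]

# Function version with column names passed as strings
abc_dt <- function(x, column1, val, column2) {
  x[get(column1) %in% val, mean(get(column2), na.rm = TRUE)]
}

abc_dt(dt, "gem", "Ruby", "cut")
```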