Hoping for some help. I have a data frame:
df <- data.frame(gem = c("Ruby", "Opal", "Topaz", "Ruby", "Ruby", "Opal"),
                 cut = c(2, 3, 4, 5, 6, 2))
I want a function that first takes the subset of rows where `gem` is Ruby, and then returns the mean of `cut` over that subset.
I tried the following:
abc <- function(x, column1, val, column2) {
  x %>%
    subset(column1 %in% val) %>%
    mean(na.omit(column2))
}
abc(df,gem,"Ruby",cut)
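This doesn't work. For the example above, the answer should be about 4.3 (the Ruby cuts are 2, 5 and 6, so the mean is 13/3 ≈ 4.33).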
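The two steps described in the question (subset, then average) can be checked directly before wrapping them in a function. This is a minimal base R sketch, assuming the gem names are meant to be strings (unquoted `Ruby`, `Opal` and `Topaz` would be looked up as variables and fail). The names `ruby` and `ruby_mean` are only illustrative:

```r
# Example data from the question, with the gem names quoted as strings
df <- data.frame(gem = c("Ruby", "Opal", "Topaz", "Ruby", "Ruby", "Opal"),
                 cut = c(2, 3, 4, 5, 6, 2))

# Step 1: keep only the rows where gem is "Ruby"
ruby <- subset(df, gem == "Ruby")

# Step 2: average the cut column of that subset, ignoring any NAs
ruby_mean <- mean(ruby$cut, na.rm = TRUE)

print(ruby_mean)  # [1] 4.333333
```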
Answer 0 (score: 3)
You don't even need to write a function. There are many ways to do this, for example:
> aggregate(cut ~ gem, data = df, mean, na.rm = TRUE)
gem cut
1 Opal 2.500000
2 Ruby 4.333333
3 Topaz 4.000000
or
> tapply(df$cut, df$gem, mean, na.rm = TRUE)
Opal Ruby Topaz
2.500000 4.333333 4.000000
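If only one gem's value is needed, the named vector returned by `tapply()` can be indexed by name. This is a small sketch reusing the question's data. `means` and `ruby_mean` are illustrative names:

```r
df <- data.frame(gem = c("Ruby", "Opal", "Topaz", "Ruby", "Ruby", "Opal"),
                 cut = c(2, 3, 4, 5, 6, 2))

# Group means for every gem, returned as a named 1-d array
means <- tapply(df$cut, df$gem, mean, na.rm = TRUE)

# Pull out the single value for "Ruby" by its name
ruby_mean <- means[["Ruby"]]
print(ruby_mean)  # [1] 4.333333
```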
If you really want a function that returns a single value, here is a version using base R only:
> abc<- function(df, column1, val, column2){
+ mean(df[which(df[, column1] == val), column2], na.rm = TRUE)
+ }
> abc(df, "gem", "Ruby", "cut")
[1] 4.333333
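The question calls `abc(df, gem, "Ruby", cut)` with bare column names, while the function above expects strings. The following is a hedged base R sketch of one way to accept bare names. It uses `substitute()` to capture the unevaluated argument and `deparse()` to turn it into a column name. It assumes the caller passes plain symbols directly. It will not work as-is if `abc()` is called from inside another function that forwards its own arguments.

```r
df <- data.frame(gem = c("Ruby", "Opal", "Topaz", "Ruby", "Ruby", "Opal"),
                 cut = c(2, 3, 4, 5, 6, 2))

# Variant of abc() that accepts bare column names (illustrative only)
abc <- function(df, column1, val, column2) {
  # Capture the unevaluated arguments and convert the symbols to strings
  col1 <- deparse(substitute(column1))
  col2 <- deparse(substitute(column2))
  # Logical index of the rows whose column1 value is in val
  rows <- df[[col1]] %in% val
  mean(df[[col2]][rows], na.rm = TRUE)
}

abc(df, gem, "Ruby", cut)  # [1] 4.333333
```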
Answer 1 (score: 2)
This is easy with the dplyr package:
library(dplyr)
df<- data.frame(gem = c("Ruby", "Opal", "Topaz", "Ruby", "Ruby","Opal"),
cut = c(2,3,4,5,6,2))
df %>% group_by(gem) %>% summarize(mean(cut))
Output:
# A tibble: 3 × 2
gem `mean(cut)`
<fctr> <dbl>
1 Opal 2.500000
2 Ruby 4.333333
3 Topaz 4.000000
Answer 2 (score: 0)
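library(magrittr)  # provides the %>% pipe

abc <- function(x, column1, val, column2) {
  # Select column2 for the rows whose column1 value is in val,
  # drop NAs, then average
  x[x[, column1] %in% val, column2] %>%
    na.exclude() %>%
    mean()
}

abc(df, "gem", "Ruby", "cut")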
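The pipe in this answer requires magrittr (or dplyr) to be loaded. Here is a hedged alternative that assumes R 4.1.0 or later, where the native pipe `|>` gives the same pipeline without any package. Columns are accessed with `[[` so that a single column comes back as a plain vector. Because rows are matched with `%in%`, `val` can hold several values at once:

```r
df <- data.frame(gem = c("Ruby", "Opal", "Topaz", "Ruby", "Ruby", "Opal"),
                 cut = c(2, 3, 4, 5, 6, 2))

# Same idea as the magrittr version, written with R's native pipe (R >= 4.1.0)
abc <- function(x, column1, val, column2) {
  x[[column2]][x[[column1]] %in% val] |>
    na.omit() |>   # drop missing values
    mean()
}

abc(df, "gem", "Ruby", "cut")             # [1] 4.333333
abc(df, "gem", c("Ruby", "Opal"), "cut")  # [1] 3.6
```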
Answer 3 (score: 0)
We can use data.table:
library(data.table)
setDT(df)[, .(cut = mean(cut)), by = gem]
# gem cut
#1: Ruby 4.333333
#2: Opal 2.500000
#3: Topaz 4.000000
Data:
df <- data.frame(gem = c("Ruby", "Opal", "Topaz", "Ruby", "Ruby", "Opal"),
                 cut = c(2, 3, 4, 5, 6, 2))