How do I find the mean of a continuous variable for each level of a categorical variable?

Time: 2018-04-01 20:19:41

Tags: r mean categorical-data continuous na.rm

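I am trying to calculate the mean duration (a continuous variable) of UFO sightings for each categorical shape associated with them. Basically, what is the average sighting length for each UFO shape?

I tried: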

    a <- aggregate(duration..seconds. ~ shape, data=alien, FUN=mean, na.rm=TRUE)
    barplot(a$duration..seconds., names.arg=a$shape)

and got:

    no non-missing arguments to min; returning Inf
    no non-missing arguments to max; returning -Inf
    Error in plot.window(xlim, ylim, log = log, ...) : need finite 'ylim' values

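I realize I need to change my data somehow. I would like to simply drop every record that is missing the corresponding data (that is, we know the shape but not the duration, or vice versa), but I am not sure how to do that.

Thanks for your help!

PS. "duration..seconds." is correct; that is how the column name came over from the Excel file.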

    shape       duration..seconds.
    us  changing    3600    NA  4/27/2004   29.8830556  
    us  changing    300     NA  12/16/2005  29.38421    
    us  changing    3600    NA  1/21/2008   53.2    
    us  changing    900     NA  1/17/2004   28.9783333  
    ca  changing    1200    NA  1/22/2004   21.4180556  
    us  changing    3600    NA  4/27/2007   36.595  

There are 80,000 UFO sighting records, which is why I am trying to average them, and there are 29 different shapes.

1 answer:

Answer 0: (score: 0)

Data

    df <- read.table(text="
    country shape  duration_seconds dummy1 date dummy2
    us  changing    3600    NA  4/27/2004   29.8830556  
    us  changing    300     NA  12/16/2005  29.38421    
    us  changing    3600    NA  1/21/2008   53.2    
    us  changing    900     NA  1/17/2004   28.9783333  
    ca  changing    1200    NA  1/22/2004   21.4180556  
    us  changing    3600    NA  4/27/2007   36.595  
    ", header = TRUE, stringsAsFactors = FALSE)

You can fix the column headers with

    names(df) <- c("country", "shape", "duration_seconds", "dummy1", "date", "dummy2")

Using the dplyr library

    library(dplyr)
    df %>% 
      group_by(shape)  %>%
      # na.rm = TRUE skips missing durations within each shape
      summarize(mean_duration_seconds = mean(duration_seconds, na.rm = TRUE))

    #   shape    mean_duration_seconds
    #   <chr>                    <dbl>
    # 1 changing                 2200.
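Since the question is about missing durations, here is a minimal base R sketch of what `na.rm` changes in a grouped mean. The data frame `toy` and its values are hypothetical and made up for illustration; they are not part of the UFO data set.

    # Hypothetical toy data: two shapes, one missing duration.
    toy <- data.frame(
      shape = c("changing", "changing", "circle", "circle"),
      duration_seconds = c(3600, 300, 120, NA),
      stringsAsFactors = FALSE
    )

    # Without na.rm, any group that contains an NA gets an NA mean.
    means_with_na <- tapply(toy$duration_seconds, toy$shape, mean)

    # With na.rm = TRUE, missing durations are skipped within each group.
    means_no_na <- tapply(toy$duration_seconds, toy$shape, mean, na.rm = TRUE)

In this toy case the "circle" group has an NA mean in the first call and a mean of 120 in the second, while "changing" is 1950 in both.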

And with the original code

    names(df) <- c("country", "shape", "duration_seconds", "dummy1", "date", "dummy2")
    a <- aggregate(duration_seconds ~ shape, data=df, FUN=mean, na.rm=TRUE)
    barplot(a$duration_seconds, names.arg=a$shape)

    a
    #   shape    duration_seconds
    # 1 changing             2200
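
The question also asks how to drop records that are missing either the shape or the duration. The sketch below shows one way to do that with `complete.cases`. It assumes, without confirmation from the post, that the duration column may have been imported as text or as a factor, which is one possible reason the means (and therefore the plot limits) were not finite. The data frame `alien` here is a hypothetical stand-in with made-up rows, not the asker's file.

    # Hypothetical stand-in for the asker's data: durations stored as text,
    # with one unparseable entry and one missing shape.
    alien <- data.frame(
      shape = c("changing", "changing", "circle", NA, "circle"),
      duration..seconds. = c("3600", "300", "120", "60", "unknown"),
      stringsAsFactors = FALSE
    )

    # Coerce to numeric. Going through as.character() also handles factors.
    # Entries that cannot be parsed become NA (the warning is suppressed here).
    alien$duration..seconds. <- suppressWarnings(
      as.numeric(as.character(alien$duration..seconds.))
    )

    # Keep only rows where both shape and duration are present.
    keep <- complete.cases(alien[, c("shape", "duration..seconds.")])
    clean <- alien[keep, ]

    # Mean duration per shape on the cleaned data.
    a <- aggregate(duration..seconds. ~ shape, data = clean, FUN = mean)

After this, `barplot(a$duration..seconds., names.arg = a$shape)` has finite heights to plot. Note that the formula interface of `aggregate` already drops rows with NA in the variables it uses by default, so the explicit filter mainly matters when the cleaned data frame is reused elsewhere.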