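I'm trying to calculate the mean duration (a continuous variable) of UFO sightings for each categorical shape. Basically, what is the average sighting length for each UFO shape?
I tried: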
a <- aggregate(duration..seconds. ~ shape, data=alien, FUN=mean, na.rm=TRUE)
barplot(a$duration..seconds., names.arg=a$shape)
and got:
no non-missing arguments to min; returning Inf
no non-missing arguments to max; returning -Inf
Error in plot.window(xlim, ylim, log = log, ...) : need finite 'ylim' values
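I realize I need to change my data somehow. I'd like to simply drop every row that is missing the corresponding value (that is, rows where we know the shape but not the duration, or vice versa), but I'm not sure how to do that.
Thanks for your help!
P.S. "duration..seconds." is correct; that is how the column name came through from the Excel file.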
shape duration..seconds.
us changing 3600 NA 4/27/2004 29.8830556
us changing 300 NA 12/16/2005 29.38421
us changing 3600 NA 1/21/2008 53.2
us changing 900 NA 1/17/2004 28.9783333
ca changing 1200 NA 1/22/2004 21.4180556
us changing 3600 NA 4/27/2007 36.595
There are 80,000 UFO sighting records, which is why I'm trying to average them, and there are 29 different shapes.
Answer 0 (score: 0)
Data
df <- read.table(text="
country shape duration_seconds dummy1 date dummy2
us changing 3600 NA 4/27/2004 29.8830556
us changing 300 NA 12/16/2005 29.38421
us changing 3600 NA 1/21/2008 53.2
us changing 900 NA 1/17/2004 28.9783333
ca changing 1200 NA 1/22/2004 21.4180556
us changing 3600 NA 4/27/2007 36.595
", header = TRUE, stringsAsFactors = FALSE)
You can fix the column headers with
names(df) <- c("country", "shape", "duration_seconds", "dummy1", "date", "dummy2")
Using the dplyr library
library(dplyr)
df %>%
group_by(shape) %>%
summarize(mean_duration_seconds = mean(duration_seconds))
# shape mean_duration_seconds
# <chr> <dbl>
# 1 changing 2200.
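The sample above has no gaps, so it does not show the missing-value case raised in the question. A minimal sketch of how that might be handled in the same dplyr pipeline is below. The `toy` data frame and its values are made up for illustration and are not taken from the real data set.

```r
library(dplyr)

# Hypothetical toy data with gaps: one missing shape, one missing duration
toy <- data.frame(
  shape = c("changing", "changing", NA, "circle", "circle"),
  duration_seconds = c(3600, 300, 900, NA, 1200),
  stringsAsFactors = FALSE
)

shape_means <- toy %>%
  # keep only rows where both shape and duration are known
  filter(!is.na(shape), !is.na(duration_seconds)) %>%
  group_by(shape) %>%
  summarize(mean_duration_seconds = mean(duration_seconds))

shape_means
```

Filtering first drops incomplete rows entirely. Passing `na.rm = TRUE` to `mean()` would also skip missing durations, but it would still leave a group for the missing shape.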
And using your original code
names(df) <- c("country", "shape", "duration_seconds", "dummy1", "date", "dummy2")
a <- aggregate(duration_seconds ~ shape, data=df, FUN=mean, na.rm=TRUE)
barplot(a$duration_seconds, names.arg=a$shape)
a
# shape duration_seconds
# 1 changing 2200
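Since the original code works on this sample, the error in the question probably comes from the full data. One possible cause is that `duration..seconds.` was read in as text or a factor rather than as numbers, so every group mean comes back as `NA` and `barplot` has nothing finite to plot. This is an assumption, since the full file is not shown. The sketch below uses a made-up stand-in for `alien`, converts the column to numeric, and drops incomplete rows before aggregating.

```r
# Hypothetical stand-in for the `alien` data frame (values are illustrative)
alien <- data.frame(
  shape = c("changing", "changing", "circle", "circle", NA),
  duration..seconds. = c("3600", "300", "1200", "unknown", "900"),
  stringsAsFactors = FALSE
)

# Convert to numeric; entries that are not numbers become NA
alien$duration..seconds. <- suppressWarnings(
  as.numeric(as.character(alien$duration..seconds.))
)

# Keep only rows where both shape and duration are present
clean <- alien[complete.cases(alien[, c("shape", "duration..seconds.")]), ]

a <- aggregate(duration..seconds. ~ shape, data = clean, FUN = mean)
a
```

With numeric durations, `barplot(a$duration..seconds., names.arg = a$shape)` from the question should then have finite values to draw. The formula interface of `aggregate` already omits rows with `NA` by default, so the `complete.cases` step mainly makes the filtering explicit.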