Question

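I'm trying to develop a flexible script that plots the mean of a continuous variable, 'score', as a function of discrete time points, 'day', in a data frame.

I can do this by creating subsets, but I have a large data set containing many factor vectors like 'day', so I would like to obtain a vector or data frame of each factor level and its corresponding mean.

My data frame is structured as follows: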

subject day score
      1   0 99.13
      2   0    NA
      3   0 86.87
      1   7 73.71
      2   7 82.42
      3   7 84.45
      1  14 66.88
      2  14 83.73
      3  14    NA

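I tried tapply(), but could not get its output into a vector or table with appropriate headers while also handling NAs.

I'm looking for simple code that yields two vectors or a data frame, which I can then use to plot the mean of 'score' as a function of the factor 'day'.

So the plot would show the mean score at each of days 0, 7 and 14.

I have seen many posts about doing this directly with ggplot, but knowing how to do it by hand seems useful, and I need to see the output to make sure it handles NAs correctly.

If you are able to help, please include explanatory comments in the script. Thanks!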

Answer 1

I think tapply should be able to handle this; you can modify the function it applies so that it removes NAs:

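# Build the example data; day is stored as a factor
df <- data.frame(subject = rep(1:3, 3),
                 day = as.factor(rep(c(0, 7, 14), each = 3)),
                 score = c(99.13, NA, 86.87, 73.71, 82.42, 84.45, 66.88, 83.73, NA))

# Mean score per day; na.rm = TRUE drops NAs before averaging
res <- with(df, tapply(score, day, function(x) mean(x, na.rm = TRUE)))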

Edit: to get the days and the scores as vectors:

day=as.numeric(names(res))
day
0  7 14

score=as.numeric(res)
score
93.00000 80.19333 75.30500
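If a single data frame with headers is preferred over two separate vectors, base R's aggregate() is one option. The following is a minimal sketch under the assumption that the data look like the example in the question; the column name mean_score is chosen here for illustration and is not part of the original answer.

```r
# Rebuild the example data from the question (day kept numeric here)
df <- data.frame(
  subject = rep(1:3, 3),
  day     = rep(c(0, 7, 14), each = 3),
  score   = c(99.13, NA, 86.87, 73.71, 82.42, 84.45, 66.88, 83.73, NA)
)

# Mean score per day.
# na.action = na.pass keeps rows with NA in the input, and
# na.rm = TRUE is passed on to mean(), which then ignores the NAs.
day_means <- aggregate(score ~ day, data = df, FUN = mean,
                       na.rm = TRUE, na.action = na.pass)

# Rename the summary column (illustrative name, an assumption of this sketch)
names(day_means)[2] <- "mean_score"

print(day_means)
```

The result has one row per day, so it can be passed straight to plot(day_means$day, day_means$mean_score, type = "b") or inspected to check the NA handling.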

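Plotting in base R:

# Raw scores as points; convert the factor back to its numeric day values
plot(x = as.numeric(as.character(df$day)), y = df$score, type = "p")
# Overlay the per-day means as a red line
lines(x = as.numeric(names(res)), y = res, col = "red")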

Answer 2

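It is not entirely clear what you want to achieve. Here I show how to use the ggplot2 package to create a point plot that includes the mean of each group. Assume dt is your data frame.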

library(ggplot2)
ggplot(dt, aes(x = day, y = score, color = factor(subject))) + # Specify x, y and color information
  geom_point(size = 3) +                                       # plot the point and specify the size is 3
  scale_color_brewer(name = "Subject", 
                     type = "qual", 
                     palette = "Pastel1") +                    # Format the color of points and the legend using ColorBrewer
  scale_x_continuous(breaks = c(0, 7, 14)) +                   # Set the breaks on x-axis
  stat_summary(fun = "mean",                                   # 'fun' replaces 'fun.y', deprecated since ggplot2 3.3.0
               color = "red", 
               geom = "point", 
               size = 5, 
               shape = 8) +                                    # Compute mean of each group and plot it
  theme_classic()                                              # Specify the theme

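Running the code above produces the plot together with the following warning messages:

Warning messages:
1: Removed 2 rows containing non-finite values (stat_summary).
2: Removed 2 rows containing missing values (geom_point).

The warnings indicate that the NAs have already been dropped, so there is no need to remove them from the data set beforehand.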
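The warnings only report how many rows were dropped in total. To see which days the missing scores belong to, a quick per-day count can help. This is a small sketch assuming the same example data; the names na_per_day and n_used are illustrative and not from the original answer.

```r
# Example data from the question
dt <- data.frame(
  subject = rep(1:3, 3),
  day     = rep(c(0, 7, 14), each = 3),
  score   = c(99.13, NA, 86.87, 73.71, 82.42, 84.45, 66.88, 83.73, NA)
)

# Number of missing scores on each day
na_per_day <- tapply(is.na(dt$score), dt$day, sum)

# Number of scores that actually contribute to each day's mean
n_used <- tapply(!is.na(dt$score), dt$day, sum)

print(na_per_day)
print(n_used)
```

Comparing n_used with the number of subjects shows how many observations each plotted mean is based on.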

Data

dt <- read.table(text = "subject day score
1 0 99.13
2 0 NA
3 0 86.87
1 7 73.71
2 7 82.42
3 7 84.45
1 14 66.88
2 14 83.73
3 14 NA",
                 header = TRUE, stringsAsFactors = FALSE)
