Question

I have a dataset with the following structure:

Features Method Distance V1 V2 ........  V100
  V1V2     LOF     A      4  5  .........  6
   .
   .
   .
V1V2V3V4V5 Gaussian C     7  8   .........  7

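The dataset consists of 624 rows and 103 columns. The first three columns hold the information for each row, and the remaining columns, V1 to V100, hold the data.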

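I need to create a multi-panel 26x8 bar plot showing the mean and the standard error of the mean. I have added a function to compute the standard error of the mean.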

# function for the standard error of the mean
sem <- function(x) {
  sd(x) / sqrt(length(x))
}

Each bar plot should show the mean of V1 to V100, together with the standard error of the mean, for each of the distances A, B and C.

A sample of the dataset is provided below:

df <- read.table(text=" Features      Method Distance        V1        V2        V3        V4        V5        V6        V7
   V1V2         LOF        A 11.764706  3.703704 15.384615  9.090909  9.090909  8.000000  7.407407
V1V2 Mahalanobis        A 11.764706 33.333333 15.384615  9.090909  9.090909 28.571429 33.333333
  V1V2        Cook        A 40.540541  6.666667 24.390244 24.358974 32.608696 15.584416 17.647059
  V1V2      DIFFTS        A 24.590164  4.958678 28.169014 26.950355 30.588235 47.058824 10.909091
  V1V2       OCSVM        A 36.585366 25.000000 57.142857 35.514019 88.372093  8.988764  5.825243
  V1V2      DBSCAN        A 44.117647 21.428571 30.769231 51.351351 41.269841 14.814815  6.976744
  V1V2         PCA        A 11.764706 33.333333 15.384615  9.090909  9.090909 28.571429 33.333333
  V1V2    Gaussian        A  1.886792  3.278689  1.869159  1.398601  2.597403  2.197802  4.878049
  V1V3         LOF        A 12.698413 20.000000 55.000000  6.666667 33.333333 29.787234  2.777778
 V1V3 Mahalanobis        A 11.764706 33.333333 15.384615  9.090909  9.090909 28.571429 33.333333",header=TRUE)

The plot should look something like this example, but showing the mean and the standard error.

Raul

Answer 1

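Here you go. I chose to aggregate the data before plotting, because I prefer to have control over that step. You could also use the built-in stat_summary in ggplot2.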

library(ggplot2)
library(dplyr)
library(reshape2)

#first, reshape (just like in your previous Q)

df_m <- melt(df,id.vars=c("Features","Method","Distance"))

#now aggregate
sem <- function(x){
  sd(x)/sqrt(length(x))
}

df_a <- df_m %>% group_by(Features,Method,Distance) %>% summarise(
  mean_value=mean(value),
  sem_value=sem(value)
)

#now plotting is easy
#using bars
p1 <- ggplot(df_a, aes(x=Distance))+
  facet_grid(Features~Method)+
  geom_bar(aes(y=mean_value),stat="identity")+
  geom_errorbar(aes(ymin=mean_value-sem_value,ymax=mean_value+sem_value))
p1

#using points (my preference)

p2 <- ggplot(df_a, aes(x=Distance))+
  facet_grid(Features~Method)+
  geom_point(aes(y=mean_value),size=2)+
  geom_errorbar(aes(ymin=mean_value-sem_value,ymax=mean_value+sem_value))
p2
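
For completeness, the stat_summary route mentioned at the top could look roughly like the sketch below. It works on the melted (long) data directly, so no separate aggregation step is needed. The tiny data frame is made up purely for illustration, mean_sem is a hypothetical helper name (not part of ggplot2), and the fun argument assumes ggplot2 3.3.0 or later (older versions call it fun.y).

```r
library(ggplot2)
library(reshape2)

# standard error of the mean, as defined in the question
sem <- function(x) {
  sd(x) / sqrt(length(x))
}

# hypothetical helper: returns the y / ymin / ymax columns that
# stat_summary(fun.data = ...) expects for an errorbar geom
mean_sem <- function(x) {
  m <- mean(x)
  s <- sem(x)
  data.frame(y = m, ymin = m - s, ymax = m + s)
}

# made-up miniature of the real data (2 rows, 3 value columns)
df_small <- data.frame(
  Features = c("V1V2", "V1V2"),
  Method   = c("LOF", "Cook"),
  Distance = c("A", "A"),
  V1 = c(11.8, 40.5),
  V2 = c(3.7, 6.7),
  V3 = c(15.4, 24.4)
)

# reshape to long format, same as in the aggregated version
df_small_m <- melt(df_small, id.vars = c("Features", "Method", "Distance"))

# let ggplot2 compute the mean and the error bars per panel and Distance
p3 <- ggplot(df_small_m, aes(x = Distance, y = value)) +
  facet_grid(Features ~ Method) +
  stat_summary(fun = mean, geom = "bar") +
  stat_summary(fun.data = mean_sem, geom = "errorbar", width = 0.2)
p3
```

The trade-off is that the summary values live only inside the plot, so if you also want the means and standard errors as a table, the explicit aggregation above is still the more convenient route.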

Data visualization: multiple bar plots with mean and error in R
