Avoid duplicating data in ggplot2

Time: 2020-10-19 17:23:53

Tags: r ggplot2

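I hope you can help me.

I want to plot the number of publications per year (broken down by discipline).

How can I make a bar chart in ggplot2 without counting duplicated data?

How do I plot a single value for each ID (x)?

I can't delete the rows, because my DF has other columns and the data for other plots needs to stay in this form.

Thank you very much.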

structure(list(x = c(1240L, 1251L, 1214L, 1222L, 1234L, 1235L, 
1183L, 1197L, 1198L, 1162L, 1167L, 1169L, 1170L, 1171L, 1176L, 
1104L, 1104L, 1113L, 1117L, 1119L, 1119L, 1063L, 1064L, 1065L, 
1066L, 1072L, 1081L), year = c(1997L, 1997L, 1998L, 1998L, 1998L, 
1998L, 1999L, 1999L, 1999L, 2000L, 2000L, 2000L, 2000L, 2000L, 
2000L, 2002L, 2002L, 2002L, 2002L, 2002L, 2002L, 2003L, 2003L, 
2003L, 2003L, 2003L, 2003L), discipline = structure(c(11L, 2L, 
7L, 2L, 2L, 2L, 7L, 7L, 7L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 2L, 
4L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("", "Biogeochemistry", 
"Conservation", "Ecology", "Environmental sciences (interdisciplines)", 
"Geochemical", "Geochemistry", "Geography", "Limnology", "Management", 
"Oceanography", "Socioecology"), class = "factor"), es.type = c("no", 
"no", "no", "Supporting", "no", "no", "no", "no", "no", "no", 
"Regulating", "no", "no", "Supporting", "Supporting", "Supporting", 
"Regulating", "Supporting", "Supporting", "Supporting", "Regulating", 
"no", "no", "no", "Supporting", "Supporting", "Supporting")), row.names = c(NA, 
-27L), class = "data.frame")

For example, in this plot the 2002 Ecology data are counted more than once.

[Plot]
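A quick way to confirm this, sketched in base R on the six 2002 rows of the data above (the factor codes 4 and 2 were decoded by hand to "Ecology" and "Biogeochemistry"; `sub2002`, `raw_counts` and `distinct_counts` are made-up names): count the raw rows per discipline, which is what geom_bar() draws, and compare that with the number of distinct x values.

```r
# The six 2002 rows from the data above; factor codes decoded by hand
# (4 = "Ecology", 2 = "Biogeochemistry"). sub2002 is a hypothetical name.
sub2002 <- data.frame(
  x = c(1104L, 1104L, 1113L, 1117L, 1119L, 1119L),
  year = 2002L,
  discipline = c("Ecology", "Ecology", "Biogeochemistry",
                 "Ecology", "Ecology", "Ecology")
)

# Rows per discipline: this is the bar height geom_bar() draws
raw_counts <- table(sub2002$discipline)
print(raw_counts)

# Distinct x values per year and discipline: the count the question asks for
distinct_counts <- aggregate(x ~ year + discipline, data = sub2002,
                             FUN = function(v) length(unique(v)))
print(distinct_counts)
```

With these rows, Ecology has five rows but only three distinct x values (1104, 1117, 1119), which is why the 2002 bar looks inflated. A summary like `distinct_counts` could also be drawn with ggplot2's geom_col(), which uses precomputed bar heights, without touching the original data frame.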

Question 2:

What if I want to remove duplicated data, but based on two columns? For example:

library(ggplot2)

ID <- c(1, 1, 1, 1, 2, 2, 3, 4, 5, 5, 5, 5, 6)
Year <- c(1990, 1990, 1990, 1990, 1994, 1994, 1994, 1995, 1995, 1995, 1995, 1995, 1996)
Discipline <- c("Ecology", "Ecology", "Oceanography", "Oceanography", "Oceanography",
                "Oceanography", "Oceanography", "Oceanography", "Oceanography",
                "Oceanography", "Oceanography", "Microbiology", "Ecology")
df <- data.frame(ID, Year, Discipline)

# Build plot
p <- ggplot(data = df, aes(x = factor(Year), fill = Discipline)) + geom_bar()
p

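In this case, for ID 1 I want to plot two values: Ecology and Oceanography. In other words, I want to remove disciplines that are repeated within the same ID. For ID 1, that means deleting one Ecology row and one Oceanography row. How can I do that?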

1 answer:

Answer 0 (score: 0)

You are probably looking for something like this:

# Define data:
library(ggplot2)

df <- structure(list(
  x = c(1240L, 1251L, 1214L, 1222L, 1234L, 1235L, 1183L, 1197L, 1198L,
        1162L, 1167L, 1169L, 1170L, 1171L, 1176L, 1104L, 1104L, 1113L,
        1117L, 1119L, 1119L, 1063L, 1064L, 1065L, 1066L, 1072L, 1081L),
  year = c(1997L, 1997L, 1998L, 1998L, 1998L, 1998L, 1999L, 1999L, 1999L,
           2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 2002L, 2002L, 2002L,
           2002L, 2002L, 2002L, 2003L, 2003L, 2003L, 2003L, 2003L, 2003L),
  discipline = structure(c(11L, 2L, 7L, 2L, 2L, 2L, 7L, 7L, 7L, 2L, 2L, 2L,
                           2L, 2L, 4L, 4L, 4L, 2L, 4L, 4L, 4L, 2L, 2L, 2L,
                           2L, 2L, 2L),
    .Label = c("", "Biogeochemistry", "Conservation", "Ecology",
               "Environmental sciences (interdisciplines)", "Geochemical",
               "Geochemistry", "Geography", "Limnology", "Management",
               "Oceanography", "Socioecology"),
    class = "factor"),
  es.type = c("no", "no", "no", "Supporting", "no", "no", "no", "no", "no",
              "no", "Regulating", "no", "no", "Supporting", "Supporting",
              "Supporting", "Regulating", "Supporting", "Supporting",
              "Supporting", "Regulating", "no", "no", "no", "Supporting",
              "Supporting", "Supporting")),
  row.names = c(NA, -27L), class = "data.frame")


# Build plot, keeping only the first row for each value of x
p <- ggplot(data = df[!duplicated(df$x), ], aes(x = factor(year), fill = discipline)) +
  geom_bar(position = position_dodge())
p

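The most important part is df[!duplicated(df$x),]: duplicated() flags every row whose x value already appeared in an earlier row, so the ! keeps only the first row for each x. Because this subset is created only for the plot, df itself keeps all of its rows and columns for your other plots.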
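To see what that filter does, here is a small sketch on a made-up vector (`ids`, `df_demo` and `plot_df` are hypothetical names, not from the question). It shows which elements duplicated() flags, what negating it keeps, how `fromLast = TRUE` can list every copy of a repeated value (handy for finding the rows that inflate a bar), and that subsetting leaves the original data frame unchanged.

```r
# Hypothetical IDs; 1104 and 1119 each appear twice
ids <- c(1113L, 1104L, 1104L, 1117L, 1119L, 1119L)

# TRUE marks a value that was already seen earlier in the vector
dup <- duplicated(ids)
print(dup)
# [1] FALSE FALSE  TRUE FALSE FALSE  TRUE

# Negating keeps only the first occurrence of each value
first_only <- ids[!dup]
print(first_only)
# [1] 1113 1104 1117 1119

# Flag every copy of a repeated value, the first one included
all_copies <- duplicated(ids) | duplicated(ids, fromLast = TRUE)
print(ids[all_copies])
# [1] 1104 1104 1119 1119

# Subsetting a data frame builds a new object; the original is unchanged
df_demo <- data.frame(x = ids, value = seq_along(ids))
plot_df <- df_demo[!duplicated(df_demo$x), ]
print(nrow(df_demo))
# [1] 6
print(nrow(plot_df))
# [1] 4
```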

For your second question, you can do the following:

# Keep only the first row of each (ID, Discipline) pair
p <- ggplot(data = df[!duplicated(df[, c("ID", "Discipline")]), ],
            aes(x = factor(Year), fill = Discipline)) +
  geom_bar(position = position_dodge())
p

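Effectively, this calls duplicated() on the selected columns together, so a row is dropped only when both its ID and its Discipline match an earlier row. For ID 1, that keeps one Ecology row and one Oceanography row.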
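A sketch to check the two-column version on the data from question 2 (it rebuilds that `df`; `keep`, `dedup` and `tab` are made-up names). It counts the rows before and after filtering and tabulates rows per Year and Discipline, which are the bar heights geom_bar() would draw from the filtered data.

```r
# Data from question 2
ID <- c(1, 1, 1, 1, 2, 2, 3, 4, 5, 5, 5, 5, 6)
Year <- c(1990, 1990, 1990, 1990, 1994, 1994, 1994, 1995, 1995, 1995, 1995, 1995, 1996)
Discipline <- c("Ecology", "Ecology", "Oceanography", "Oceanography",
                "Oceanography", "Oceanography", "Oceanography", "Oceanography",
                "Oceanography", "Oceanography", "Oceanography", "Microbiology",
                "Ecology")
df <- data.frame(ID, Year, Discipline)

# A row is dropped only if the same (ID, Discipline) pair appeared earlier
keep <- !duplicated(df[, c("ID", "Discipline")])
dedup <- df[keep, ]

print(nrow(df))     # 13 rows before filtering
print(nrow(dedup))  # 8 rows after filtering

# ID 1 now contributes one Ecology row and one Oceanography row
print(as.character(dedup$Discipline[dedup$ID == 1]))

# Rows per Year and Discipline: the bar heights geom_bar() would draw
tab <- table(dedup$Year, dedup$Discipline)
print(tab)
```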