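How do I highlight points that are not in the data?

Time: 2019-02-06 12:22:24

Tags: r ggplot2

I want to highlight some points that are not in the original dataset.

Suppose I am using this dataset.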

library(ggplot2)
library(gcookbook) # To use 'heightweight' dataset
head(heightweight)
  sex ageYear ageMonth heightIn weightLb
1   f   11.92      143     56.3     85.0
2   f   12.92      155     62.3    105.0
3   f   12.75      153     63.3    108.0
4   f   13.42      161     59.0     92.0
5   f   15.92      191     62.5    112.5
6   f   14.25      171     62.5    112.0
ggplot(heightweight, aes(x = ageYear, y = heightIn, color = sex)) + geom_point()

The plot works fine up to this point, but I want to emphasize the mean of each group (sex).

female = subset(heightweight, select = c(ageYear, heightIn), subset = (sex == 'f'))
male = subset(heightweight, select = c(ageYear, heightIn), subset = (sex == 'm'))
female_a = mean(female[,1]); female_a
[1] 13.70063
female_h = mean(female[,2]); female_h
[1] 60.52613
male_a = mean(male[,1]); male_a
[1] 13.64752
male_h = mean(male[,2]); male_h
[1] 62.06

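Obviously these points are not in the dataset, but I want to highlight them on the original ggplot with larger points.

Any ideas?

3 Answers:

Answer 0: (score: 1)

One approach is to precompute the means inside the dataset: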

library(dplyr) # for %>%, group_by() and mutate()

heightweight <- heightweight %>%
  group_by(sex) %>%
  mutate(
    ageyear = mean(ageYear),
    heightin = mean(heightIn)
  ) %>% ungroup()

Plot:

ggplot(heightweight, aes(x = ageYear, y = heightIn, color = sex)) + 
  geom_point() + 
  geom_point(aes(x = ageyear, y = heightin), size = 5)

This can also be written as part of a single pipeline, for example:

heightweight %>%
  group_by(sex) %>%
  mutate(
    ageyear = mean(ageYear),
    heightin = mean(heightIn)
  ) %>% ungroup() %>%
  ggplot(aes(x = ageYear, y = heightIn, color = sex)) + 
  geom_point() + 
  geom_point(aes(x = ageyear, y = heightin), size = 5)

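The advantage of this approach is that it saves some coding time and space, you do not need to switch tools (for example, from base graphics to ggplot2), and the colors automatically match the other points (split by sex).

I only increased the size of the mean points you want to see. Further tweaks are of course possible, depending on how you want to plot the data.

Answer 1: (score: 0)

You can do it like this: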

plot_missing_mean_value <- ggplot(heightweight, aes(x = ageYear, y = heightIn, color = sex)) +
  geom_point() +
  # size is a fixed value, so it goes outside aes() (inside aes() it would add a legend entry)
  geom_point(aes(female_a, female_h), size = 5, colour = "blue") +
  geom_point(aes(male_a, male_h), size = 5, colour = "green")
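
Since each mean here is a single pair of numbers rather than a column of the data, `annotate()` may be a tidier fit than `geom_point(aes(...))`: it draws exactly one point per call and adds nothing to the legend. A minimal sketch, assuming the group means are computed the same way as in the question:

```r
library(ggplot2)
library(gcookbook) # for the 'heightweight' dataset

# Group means, computed as in the question
female <- subset(heightweight, sex == "f")
male <- subset(heightweight, sex == "m")
female_a <- mean(female$ageYear); female_h <- mean(female$heightIn)
male_a <- mean(male$ageYear); male_h <- mean(male$heightIn)

# annotate() takes plain values, so each mean is drawn once
p <- ggplot(heightweight, aes(x = ageYear, y = heightIn, color = sex)) +
  geom_point() +
  annotate("point", x = female_a, y = female_h, size = 5, colour = "blue") +
  annotate("point", x = male_a, y = male_h, size = 5, colour = "green")
p
```

The fixed colours "blue" and "green" are carried over from the answer above; they will not necessarily match the default colours ggplot2 assigns to `sex`.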

Answer 2: (score: 0)

Similar to @arg0naut's answer, but avoids drawing a series of overlapping means:
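
A minimal sketch of that idea, not necessarily the answerer's exact code: summarise the data to one row per sex and pass that small data frame to a second `geom_point()` layer, so each mean is drawn once instead of once per original row. The name `means` is an illustrative assumption.

```r
library(ggplot2)
library(dplyr)
library(gcookbook) # for the 'heightweight' dataset

# One row per sex holding the group means (the name 'means' is illustrative)
means <- heightweight %>%
  group_by(sex) %>%
  summarise(ageYear = mean(ageYear), heightIn = mean(heightIn))

# The second layer uses 'means' as its data, so each mean is drawn exactly once;
# x, y and color are inherited from the main aes() because the column names match
p <- ggplot(heightweight, aes(x = ageYear, y = heightIn, color = sex)) +
  geom_point() +
  geom_point(data = means, size = 5)
p
```

Keeping the same column names in `means` is a design choice: the layer needs no `aes()` of its own, and the large points pick up the same colours as the rest of each group.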

