Question

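Third week into an R class (please bear with me, even if it seems obvious what's wrong): I'm stuck on a homework problem using the R ggplot2 library. Using the built-in diamonds data frame, the task is to draw a scatter plot of log(carat) against log(price) with regression lines, but only for Fair cut and Ideal cut diamonds.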

This is what the plot is supposed to look like

Quick overview: the three variables involved are carat (numeric), cut (Fair, Good, Very Good, Premium, Ideal) and price (integer).
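A quick way to confirm these types in the console. This is a minimal sketch assuming ggplot2 is installed; the diamonds data frame is loaded along with it. Note that cut is an ordered factor, which is why %in% against its level names works.

```r
library(ggplot2)  # diamonds ships with ggplot2

# Inspect only the three columns the exercise uses
str(diamonds[, c("carat", "cut", "price")])

# The five cut grades, in their stored (ordered) order
levels(diamonds$cut)  # "Fair" "Good" "Very Good" "Premium" "Ideal"
```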

I started with the following code:

library(ggplot2)

set.seed(123)
d <- ggplot(diamonds[sample(nrow(diamonds), 5000), ])  # this was provided to us in the homework

d + geom_point(aes(x = log(carat), y = log(price), colour = cut)) +
  labs(title = 'Regression line for Fair and Ideal Cut Diamonds') +
  stat_smooth(aes(x = log(carat), y = log(price), colour = cut), method = "gam")

Here's what I got

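Now, I know this is wrong because colour = cut shows all the cuts, but I only want "Fair" and "Ideal". The professor hinted that we should try diamonds$cut %in% c(...), so I tried it in many different ways. One of my latest (wrong) attempts is: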

d + geom_point(aes(x = log(carat), y = log(price), colour = diamonds[diamonds$cut%in%c("Fair","Ideal")]), alpha = 0.5) +
 labs(title = 'Regression line for Fair and Ideal Cut Diamonds') +
 stat_smooth(aes(x = log(carat), y = log(price), colour = diamonds[diamonds$cut%in%c("Fair","Ideal")]), method = "gam")

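No matter where I try to subset diamonds$cut, I keep getting errors (e.g. that a logical index vector for "[" must have length equal to the number of columns, or "Aesthetics must be either length 1 or the same as the data (5000): colour").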
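Both errors most likely come from a length mismatch. diamonds[diamonds$cut %in% c("Fair","Ideal")] has no comma, so single-bracket indexing on a data frame selects columns, and a logical vector with one entry per row (53,940) does not match the 10 columns. Inside aes(), colour must supply one value per row of the plot's data (the 5000-row sample), which an expression built from the full diamonds data does not. The sketch below shows the row subset the professor's hint points to, applied to the same sampled data the plot uses; keep and z_fi are illustrative names.

```r
library(ggplot2)

set.seed(123)
z <- diamonds[sample(nrow(diamonds), 5000), ]  # the sampled data the plot uses

# %in% returns one TRUE/FALSE per row of z
keep <- z$cut %in% c("Fair", "Ideal")
length(keep) == nrow(z)  # TRUE

# The comma matters: z[keep, ] selects ROWS, whereas z[keep] would try to select COLUMNS
z_fi <- z[keep, ]
table(z_fi$cut)  # counts are non-zero only for Fair and Ideal
```

Passing z_fi (rather than z) to ggplot() then keeps colour = cut consistent with the data, because both come from the same filtered rows.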

How do I extract only the Fair and Ideal cuts to make this plot?

Thanks for your help!

Answer 1

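Here is how to do it by subsetting the data before passing it to the data argument of ggplot(), although I'm not sure how to filter the cut column while it is specified as the mapped variable in aes(colour = cut). Though it hardly matters at this point, the plot doesn't come out exactly as it should according to your post. Hope this helps.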

library(ggplot2)

set.seed(123)
z <- diamonds[sample(nrow(diamonds),5000),]
z <- z[z$cut %in% c("Fair", "Ideal"),]

d <- ggplot(data = z) +
  geom_point(aes(x = log(carat), y = log(price), colour = cut), alpha = 0.5) +
  labs(title = 'Regression line for Fair and Ideal Cut Diamonds') +
  stat_smooth(aes(x = log(carat), y = log(price), colour = cut), method = "gam")
d
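On the open question of filtering while cut stays mapped in aes(colour = cut): one option, assuming a ggplot2 version whose layers accept a function as their data argument (the geom_point() and stat_smooth() documentation describes this: the function is called with the plot's data and must return a data frame), is to keep the homework's full 5000-row plot object and filter inside each layer. fair_ideal is a hypothetical helper name, not part of ggplot2.

```r
library(ggplot2)

set.seed(123)
z <- diamonds[sample(nrow(diamonds), 5000), ]  # homework-style sample

# Plot object holds all 5000 rows; colour is mapped to cut globally
d <- ggplot(z, aes(x = log(carat), y = log(price), colour = cut))

# Hypothetical helper: keep only Fair and Ideal rows of whatever data it is given
fair_ideal <- function(df) df[df$cut %in% c("Fair", "Ideal"), ]

p <- d +
  geom_point(data = fair_ideal, alpha = 0.5) +        # points: filtered rows only
  stat_smooth(data = fair_ideal, method = "gam") +    # smooths: same filtered rows
  labs(title = "Regression line for Fair and Ideal Cut Diamonds")
p
```

Because both layers see only the filtered rows, the colour legend should list just Fair and Ideal.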

Created on 2019-03-21 by the reprex package (v0.2.1)

Answer 2

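Use subset() to subset the data. One modification: to match your graph exactly, I changed the method in stat_smooth to "auto" so that the line follows the data points. The graph won't always come out identical, since we are taking a random sample.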

library(ggplot2)

# Random 5000-row sample, as in the homework setup
df <- diamonds[sample(nrow(diamonds), 5000), ]

# Keep only the Fair and Ideal cuts
df_fair_ideal <- subset(df, cut %in% c("Fair", "Ideal"))

ggplot(df_fair_ideal, aes(x = log(carat), y = log(price), color = cut)) +
  labs(title = 'Regression line for Fair and Ideal Cut Diamonds') +
  geom_point(alpha = 0.5) +  # alpha belongs in the geom, not in ggplot()
  xlim(min(log(df_fair_ideal$carat)), max(log(df_fair_ideal$carat))) +
  stat_smooth(method = "auto", se = TRUE)
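Answer 2 notes that the graph can differ between runs because of the random sample. A minimal sketch of why calling set.seed() before sample(), as the homework setup does, makes the draw repeatable; draw_sample() is a hypothetical helper written only for this illustration.

```r
library(ggplot2)

# Hypothetical helper: seed the RNG, sample n rows, keep Fair and Ideal only
draw_sample <- function(seed, n = 5000) {
  set.seed(seed)                      # fix the RNG state before sampling
  idx <- sample(nrow(diamonds), n)
  subset(diamonds[idx, ], cut %in% c("Fair", "Ideal"))
}

a <- draw_sample(123)
b <- draw_sample(123)
identical(a, b)  # TRUE: same seed, same rows, so the same plot
```

For reference, the ggplot2 documentation for stat_smooth() says that the automatic method choice uses loess() when the largest group has fewer than 1,000 observations and mgcv::gam() otherwise, so "auto" can land on a different smoother depending on the sample size.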

R with ggplot2: keeping only certain rows of a data frame when making a scatter plot
