Question

I generated a matrix of 100 random x-y coordinates in the square [-1,1]^2:

n <- 100
datam <- matrix(c(rep(1,n), 2*runif(n)-1, 2*runif(n)-1), n) 
# leading 1 column needed for computation
# second column has x coordinates, third column has y coordinates

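and classified them into two classes, -1 and 1, using a given target function f (a vector). I have computed a hypothesis function g and now want to visualize how well it matches the target function f.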

f <- c(1.0, 0.5320523, 0.6918301)   # the given target function
ylist <- sign(datam %*% f)    # classify into -1 and 1

# perceptron algorithm to find g:
perceptron <- function(datam, ylist) {
  w <- c(1, 0, 0)                      # starting vector
  made.mistake <- TRUE
  while (made.mistake) {               # repeat until a full pass makes no mistakes
    made.mistake <- FALSE
    for (i in seq_len(nrow(datam))) {  # use the data's own row count, not the global n
      if (ylist[i] != sign(sum(w * datam[i, ]))) {
        w <- w + ylist[i] * datam[i, ]
        made.mistake <- TRUE
      }
    }
  }
  w
}

g <- perceptron(datam, ylist)

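I now want to compare f and g in a plot.

I can do this easily in Mathematica. Shown here is the data set with the target function f, which separates the data into the +1 and -1 regions:

This Mathematica plot shows the comparison of f and g (for a different data set and a different f):

Here is the corresponding Mathematica code: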

ContourPlot[g.{1, x1, x2} == 0, {x1, -1, 1}, {x2, -1, 1}]

How can I do something similar in R (ggplot would be nice)?

Answer 1

The same can be done with ggplot. This example follows your code exactly and then adds the following at the end:

# OP's code...
# ...

glist <- sign(datam %*% g)

library(reshape2)  # for melt(...)
library(ggplot2)
df <- data.frame(datam, f=ylist, g=glist) # df has columns: X1, X2, X3, f, g
gg <- melt(df, id.vars=c("X1","X2","X3"), variable.name="model")

# One boundary line per facet. ggplot2 >= 2.0 dropped the subset= layer argument,
# so the lines are passed as their own data frame, keyed by model.
bounds <- data.frame(model=factor(c("f","g")),
                     intercept=c(-f[1]/f[3], -g[1]/g[3]),
                     slope=c(-f[2]/f[3], -g[2]/g[3]))

ggp <- ggplot(gg, aes(x=X2, y=X3, color=factor(value)))
ggp <- ggp + geom_point()
ggp <- ggp + geom_abline(data=bounds, aes(intercept=intercept, slope=slope))
ggp <- ggp + facet_wrap(~model)
ggp <- ggp + scale_color_discrete(name="Mistake")
ggp <- ggp + labs(title=paste0("Comparison of Target (f) and Hypothesis (g) [n=",n,"]"))
ggp <- ggp + theme(plot.title=element_text(face="bold"))
ggp

Here are the results for n=200, 500, and 1000. With n=100, g=c(1,0,0). You can see that f and g converge at around n~500.
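
One plausible reason for the n=100 result, offered as a back-of-the-envelope check rather than something the answer states: the target f labels only a small corner of the square near (-1,-1) as -1. A small sample may contain no -1 points at all, and then the starting vector c(1,0,0) already classifies everything correctly, so the perceptron never updates it. The names y_at_left, x_at_bottom, p_neg and p_none below are made up for this sketch.

```r
f <- c(1.0, 0.5320523, 0.6918301)        # target from the question

# The boundary f[1] + f[2]*x + f[3]*y = 0 cuts off only the corner at (-1, -1).
y_at_left   <- -(f[1] - f[2]) / f[3]     # boundary height at x = -1
x_at_bottom <- -(f[1] - f[3]) / f[2]     # boundary position at y = -1

# Area of the cut-off triangle divided by the area of the square (2 * 2 = 4)
p_neg <- 0.5 * (y_at_left + 1) * (x_at_bottom + 1) / 4

# Chance that a uniform sample of size n contains no -1 point at all
p_none <- function(n) (1 - p_neg)^n

round(p_neg, 3)
round(p_none(c(100, 200, 500, 1000)), 3)
```

With these numbers the -1 region covers roughly 1.7% of the square, so about 18% of samples of size 100 would contain no -1 point, while for n=500 or more that case is very unlikely.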

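In case you are unfamiliar with ggplot: first we create a data frame (df) containing the coordinates (X2 and X3) and two columns with the classifications based on f and g. Then we use melt(...) to convert it to a new data frame, gg, in "long" format. gg has the columns X1, X2, X3, model, and value. The column gg$model identifies the model (f or g). The corresponding classifications are in gg$value. The ggplot calls then do the following:

Establish the default data set gg, the x and y coordinates, and the coloring [ggplot(...)]
Add the points layer [geom_point(...)]
Add the lines separating the classifications [geom_abline(...)]
Tell ggplot to plot the two models in different "facets" [facet_wrap(...)]
Set the legend name.
Set the plot title.
Make the plot title bold.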
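
To spell out where the geom_abline arguments come from: a weight vector w defines the boundary w[1] + w[2]*x + w[3]*y = 0, and solving for y gives intercept -w[1]/w[3] and slope -w[2]/w[3]. The sketch below wraps that in a small helper. boundary_line and its tol argument are hypothetical names, not part of the answer, and returning NULL when w[3] is zero is just one way to handle a boundary that cannot be written as y = a + b*x (the starting vector c(1,0,0) is such a case).

```r
# Hypothetical helper: turn weights w = (w1, w2, w3) for the boundary
# w1 + w2*x + w3*y = 0 into the intercept and slope of y = a + b*x.
boundary_line <- function(w, tol = 1e-12) {
  if (abs(w[3]) < tol) return(NULL)   # no y = a + b*x form exists
  c(intercept = -w[1] / w[3], slope = -w[2] / w[3])
}

f <- c(1.0, 0.5320523, 0.6918301)     # target from the question
line_f <- boundary_line(f)

# Points on the line should score zero under f (up to rounding)
x <- c(-1, 0, 1)
y <- line_f[["intercept"]] + line_f[["slope"]] * x
residual <- cbind(1, x, y) %*% f
residual

boundary_line(c(1, 0, 0))             # NULL: the starting vector defines no line
```

Checking for NULL before plotting avoids passing an infinite intercept to geom_abline when the perceptron never moved away from its starting vector.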

Answer 2

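Your example is still not reproducible. Look at my code and you will see that f and g are identical. Also, you seem to be extrapolating the lines into regions where you have no data points (the second part of your question). Do you have evidence that the discrimination should be linear?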

#Data generation
n <- 10000
datam <- matrix(c(rep(1,n), 2*runif(n)-1, 2*runif(n)-1), n) 
# leading 1 column needed for computation
# second column has x coordinates, third column has y coordinates
datam.df<-data.frame(datam)
datam.df$X1<-NULL
f <- c(1.0, 0.5320523, 0.6918301)   # the given target function
f.col <- ifelse(sign(datam %*% f)==1,"darkred", "darkblue")    
f.fun<-sign(datam %*% f)

# perceptron algorithm to find g:
perceptron <- function(datam, ylist) {
  w <- c(1, 0, 0)                      # starting vector
  made.mistake <- TRUE
  while (made.mistake) {
    made.mistake <- FALSE
    for (i in seq_len(nrow(datam))) {
      if (ylist[i] != sign(sum(w * datam[i, ]))) {
        w <- w + ylist[i] * datam[i, ]
        made.mistake <- TRUE
      }
    }
  }
  w
}


g <- perceptron(datam, f.fun)
g.fun<-sign(datam %*% g)

Plotting the whole data set:

plot(datam.df$X2, datam.df$X3, col=f.col, pch=".", cex=2)

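I will produce separate plots for the g and f functions, because something in your example does not work and f and g come out the same. Once you have solved that, you can put everything in one plot. You can also see whether you want the shading or not. If you have no evidence that the classification is linear, it may be more sensible to mark only the data you actually have, using chull().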

For the f function:

plot(datam.df$X2, datam.df$X3, col=f.col, pch=".", xlim=c(-1,-0.5), ylim=c(-1,-.5), cex=3, main="f function")
datam.df.f<-datam.df[f.fun==1,]
ch.f<-chull(datam.df.f$X2, datam.df.f$X3 )
ch.f <- rbind(x = datam.df.f[ch.f, ], datam.df.f[ch.f[1], ])
polygon(ch.f, lwd=3, col=rgb(0,0,180,alpha=50, maxColorValue=255))

For the g function:

g.col <- ifelse(sign(datam %*% g)==1,"darkred", "darkblue")
plot(datam.df$X2, datam.df$X3, col=g.col, pch=".", xlim=c(-1,-0.5), ylim=c(-1,-.5), cex=3, main="g function")
datam.df.g<-datam.df[g.fun==1,]
ch.g<-chull(datam.df.g$X2, datam.df.g$X3 )
ch.g <- rbind(x = datam.df.g[ch.g, ], datam.df.g[ch.g[1], ])
polygon(ch.g, col=rgb(0,0,180,alpha=50, maxColorValue=255), lty=3, lwd=3)

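The ch.f and ch.g objects hold the coordinates of the "bag" (the convex hull) around the points. You can extract points from them to describe your line.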

ch.f
lm.f<-lm(c(ch.f$X3[ ch.f$X2> -0.99 & ch.f$X2< -0.65 & ch.f$X3<0 ])~c(ch.f$X2[ ch.f$X2>-0.99 & ch.f$X2< -0.65 & ch.f$X3<0]))
curve(lm.f$coefficients[1]+x*lm.f$coefficients[2], from=-1., to=-0.59, lwd=5, add=T)
lm.g<-lm(c(ch.g$X3[ ch.g$X2> -0.99 & ch.g$X2< -0.65 & ch.g$X3<0 ])~c(ch.g$X2[ ch.g$X2>-0.99 & ch.g$X2< -0.65 & ch.g$X3<0]))
curve(lm.g$coefficients[1]+x*lm.g$coefficients[2], from=-1., to=-0.59, lwd=5, add=T, lty=3)

This draws both fitted lines on the current plot.

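Unfortunately, because the f and g functions in your example are identical, the two lines coincide and you cannot tell them apart in that plot.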

Answer 3

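You can use the col argument of plot() to indicate the classification given by the f() function. You can use polygon() to shade the region that the g() function classifies as one class. If you give us a reproducible example, we can answer with specific code. This would produce a figure similar to the Mathematica one you showed.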
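
A minimal base-graphics sketch of that idea, assuming g is linear like f: color the points by f's classification with col, and shade the part of the square that g classifies as +1 with polygon(). The vector g used here is a made-up stand-in for a fitted hypothesis, the seed and n are arbitrary, and clip_halfplane is a hypothetical helper that clips the square against the half-plane w[1] + w[2]*x + w[3]*y >= 0.

```r
set.seed(1)                                   # arbitrary seed, for repeatability only
n <- 200
datam <- cbind(1, 2 * runif(n) - 1, 2 * runif(n) - 1)
f <- c(1.0, 0.5320523, 0.6918301)             # target from the question
g <- c(1.0, 0.6, 0.6)                         # made-up stand-in for a fitted g
ylist <- sign(datam %*% f)

# Hypothetical helper: vertices of the square [-1,1]^2 clipped to w . (1, x, y) >= 0
clip_halfplane <- function(w, square = rbind(c(-1, -1), c(1, -1), c(1, 1), c(-1, 1))) {
  s <- as.vector(cbind(1, square) %*% w)      # signed score at each corner
  k <- nrow(square)
  out <- NULL
  for (i in seq_len(k)) {
    j <- i %% k + 1                           # next corner, wrapping around
    if (s[i] >= 0) out <- rbind(out, square[i, ])
    if ((s[i] >= 0) != (s[j] >= 0)) {         # this edge crosses the boundary
      frac <- s[i] / (s[i] - s[j])
      out <- rbind(out, square[i, ] + frac * (square[j, ] - square[i, ]))
    }
  }
  out                                         # NULL if no part of the square is >= 0
}

region <- clip_halfplane(g)
plot(datam[, 2], datam[, 3], type = "n", xlab = "x", ylab = "y",
     xlim = c(-1, 1), ylim = c(-1, 1))
if (!is.null(region)) {
  polygon(region, border = NA, col = rgb(0, 0, 180, alpha = 50, maxColorValue = 255))
}
points(datam[, 2], datam[, 3], pch = 20,
       col = ifelse(ylist == 1, "darkred", "darkblue"))
abline(a = -f[1] / f[3], b = -f[2] / f[3])            # boundary of f
abline(a = -g[1] / g[3], b = -g[2] / g[3], lty = 2)   # boundary of g
```

Points colored as +1 by f that fall outside the shaded region, or -1 points inside it, are the places where g disagrees with f.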

Plotting a region in R
