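Filling the area under a curve with points in an R plot

Time: 2017-04-13 16:59:34

Tags: r plot

How can I efficiently (with short R code) fill a curve with points so that they cover the area under it?

I tried a few things without success. Here is my R code: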

data = rnorm(1000)     ## random data points to fill the curve

curve(dnorm(x), -4, 4) ## curve to be filled by "data" above

points(data)           ## plotting the points to fill the curve

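2 Answers:

Answer 0 (score: 3)

Here is an approach that uses interpolation to make sure the plotted points never exceed the height of the curve. (If you want the point markers themselves to stay inside the curve, set the threshold slightly below the curve height.)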

# Curve to be filled
c.pts = as.data.frame(curve(dnorm(x), -4, 4)) 

# Generate 1000 random points in the same x-interval and with y value between
# zero and the maximum y-value of the curve
set.seed(2)
pts = data.frame(x=runif(1000,-4,4), y=runif(1000,0,max(c.pts$y)))

# Using interpolation, keep only those points whose y-value is less than y(x)
pts = pts[pts$y < approx(c.pts$x,c.pts$y,xout=pts$x)$y, ]

# Plot the points
points(pts, pch=16, col="red", cex=0.7)

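The parenthetical remark about keeping the markers themselves inside the curve could be sketched as follows. The shrink factor `margin = 0.95` is an assumed value, not something given in the answer, and would need tuning to the marker size (`cex`). Lowering the threshold only guards against markers poking out above the curve; on the steep flanks a marker can still touch the outline sideways.

```r
# Curve to be filled (grid values returned by curve())
c.pts = as.data.frame(curve(dnorm(x), -4, 4))

# Assumed shrink factor (not from the original answer): accept only points
# below 95% of the curve height so the markers stay inside the outline
margin = 0.95

set.seed(2)
pts = data.frame(x=runif(1000,-4,4), y=runif(1000,0,max(c.pts$y)))

# Interpolated curve height at each candidate x
h = approx(c.pts$x, c.pts$y, xout=pts$x)$y

# Keep only the points below the lowered threshold
pts.in = pts[pts$y < margin*h, ]

points(pts.in, pch=16, col="red", cex=0.7)
```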

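A method for plotting exactly the desired number of points under the curve

In response to @d.b's comment, here is a way to plot an exact number of points under the curve:

First, let's work out how many random points we need to generate over the whole plot region in order to get (roughly) the target number of points under the curve. We do this as follows:

  1. Calculate the area under the curve as a fraction of the rectangle bounded vertically by zero and the maximum height of the curve, and horizontally by the width of the curve.
  2. The number of random points we need to generate is the target number of points divided by the area ratio calculated above.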

    # Area ratio
    aa = sum(c.pts$y*median(diff(c.pts$x)))/(diff(c(-4,4))*max(c.pts$y))
    
    # Target number of points under curve
    n.target = 1000
    
    # Number of random points to generate
    n = ceiling(n.target/aa)
    
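  3. However, we need more points than this to be sure of getting at least n.target, because once we restrict the plotted points to those below the curve, random variation will leave us with fewer than n.target points about half the time. So we multiply by an excess.factor to generate more points under the curve than we need, and then randomly select n.target of them to plot. Here is a function that takes care of the whole process for a general curve.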

    # Plot a specified number of points under a curve
    pts.under.curve = function(data, n.target=1000, excess.factor=1.5) {
    
      # Area under curve as fraction of area of plot region
      aa = sum(data$y*median(diff(data$x)))/(diff(range(data$x))*max(data$y))
    
      # Number of random points to generate (rounded up to a whole number)
      n = ceiling(excess.factor*n.target/aa)
    
      # Generate n random points in x-range of the data and with y value between
      # zero and the maximum y-value of the curve
      pts = data.frame(x=runif(n,min(data$x),max(data$x)), y=runif(n,0,max(data$y)))
    
      # Using interpolation, keep only those points whose y-value is less than y(x)
      pts = pts[pts$y < approx(data$x,data$y,xout=pts$x)$y, ]
    
      # Randomly select only n.target points
      pts = pts[sample(1:nrow(pts), n.target), ]
    
      # Plot the points
      points(pts, pch=16, col="red", cex=0.7)
    
    }
    

    Let's run the function on the original curve:

    c.pts = as.data.frame(curve(dnorm(x), -4, 4)) 
    
    pts.under.curve(c.pts)
    


    Now let's test it with a different distribution:

    # Curve to be filled
    c.pts = as.data.frame(curve(df(x, df1=100, df2=20),0,5,n=1001)) 
    
    pts.under.curve(c.pts, n.target=200)
    

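With a fixed excess factor, the final `sample()` call fails if fewer than `n.target` points happen to land under the curve. One possible way around that, shown here only as a sketch, is to keep drawing batches until enough points have been accepted. The helper name `sample.under.curve` and its `batch` argument are hypothetical and not part of the answer. It returns the points instead of plotting them.

```r
# Hypothetical helper (not from the original answer): returns exactly n.target
# points under the curve given by data$x and data$y, without plotting them
sample.under.curve = function(data, n.target=1000, batch=1000) {
  kept = data.frame(x=numeric(0), y=numeric(0))
  # Keep drawing batches until enough points have been accepted
  while (nrow(kept) < n.target) {
    cand = data.frame(x=runif(batch, min(data$x), max(data$x)),
                      y=runif(batch, 0, max(data$y)))
    # Accept candidates whose y lies below the interpolated curve height
    ok = cand$y < approx(data$x, data$y, xout=cand$x)$y
    kept = rbind(kept, cand[ok, ])
  }
  # Drop the surplus from the last batch
  kept[seq_len(n.target), ]
}

set.seed(2)
c.pts = as.data.frame(curve(df(x, df1=100, df2=20), 0, 5, n=1001))
pts = sample.under.curve(c.pts, n.target=200)
points(pts, pch=16, col="red", cex=0.7)
```

Because the accepted points are independent draws, keeping the first `n.target` of them is equivalent to selecting a random subset.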

Answer 1 (score: 1)

n_points = 10000 #A large number

#Store curve in a variable and plot
cc = curve(dnorm(x), -4, 4, n = n_points)

#Generate n_points random points
p = data.frame(x = seq(-4,4,length.out = n_points), y = rnorm(n = n_points))
#OR p = data.frame(x = runif(n_points,-4,4), y = rnorm(n = n_points))

#Find out the index of values in cc$x closest to p$x
p$ind = findInterval(p$x, cc$x)

#Only retain those points within the curve whose p$y are smaller than cc$y
p2 = p[p$y >= 0 & p$y < cc$y[p$ind],] #may need p[p$y < 0.90 * cc$y[p$ind],] or something

#Plot points
points(p2$x, p2$y)
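A small illustration of what `findInterval` returns may help when reading the index step above. The three-value `grid` below is an illustrative stand-in for `cc$x`, not data from the answer. Note that the function gives the index of the grid value at or just below each input, which is not always the nearest grid value; with a fine grid such as `n_points = 10000` the difference is negligible.

```r
# Small grid standing in for cc$x (illustrative values, not from the answer)
grid = c(-4, 0, 4)

# findInterval() returns i such that grid[i] <= x < grid[i+1]
idx = findInterval(c(-4, 0.03, 3.9, 4), grid)
print(idx)
# [1] 1 2 2 3   (3.9 maps to index 2 although it is closer to grid[3])
```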