How can I efficiently (with short R code) fill a curve with points, so that the points fill the area under my curve?
data = rnorm(1000) ## random data points to fill the curve
curve(dnorm(x), -4, 4) ## curve to be filled by "data" above
points(data) ## plotting the points to fill the curve
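Answer 0 (score: 3)
Here's an approach that uses interpolation to make sure the plotted points don't exceed the height of the curve. (If you want the point markers themselves not to stick out above the curve, you'll need to set the threshold slightly below the curve's height.)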
# Curve to be filled
c.pts = as.data.frame(curve(dnorm(x), -4, 4))
# Generate 1000 random points in the same x-interval and with y value between
# zero and the maximum y-value of the curve
set.seed(2)
pts = data.frame(x=runif(1000,-4,4), y=runif(1000,0,max(c.pts$y)))
# Using interpolation, keep only those points whose y-value is less than y(x)
pts = pts[pts$y < approx(c.pts$x,c.pts$y,xout=pts$x)$y, ]
# Plot the points
points(pts, pch=16, col="red", cex=0.7)
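As a rough sketch of the "threshold slightly below the curve" idea mentioned above: the interpolated curve height can be scaled down before the comparison, so that the drawn markers (which have a nonzero radius) stay visibly inside the curve. The name `shrink` and the value 0.95 are illustrative assumptions, not part of the original answer; the right value depends on `cex` and the size of the plot.

```r
# Sketch only: `shrink` is a hypothetical tuning value chosen for illustration
set.seed(2)
c.pts = as.data.frame(curve(dnorm(x), -4, 4))
pts = data.frame(x=runif(1000,-4,4), y=runif(1000,0,max(c.pts$y)))
shrink = 0.95
# Interpolated curve height at each random x position
curve.height = approx(c.pts$x, c.pts$y, xout=pts$x)$y
# Keep only points below the lowered threshold
pts.inside = pts[pts$y < shrink*curve.height, ]
points(pts.inside, pch=16, col="red", cex=0.7)
```

A smaller `shrink` leaves a wider empty band just under the curve, so this trades a cleaner edge against how completely the area is filled.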
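In response to @d.b's comment, here's a way to plot exactly the desired number of points under the curve:
First, let's figure out how many random points we need to generate over the entire plot region in order to get (roughly) a target number of points under the curve. We do this as follows:
Calculate the area under the curve as a fraction of the area of the rectangular region the random points are drawn from (the area ratio).
The number of random points we need to generate is then the target number of points divided by that area ratio.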
# Area ratio
aa = sum(c.pts$y*median(diff(c.pts$x)))/(diff(c(-4,4))*max(c.pts$y))
# Target number of points under curve
n.target = 1000
# Number of random points to generate
n = ceiling(n.target/aa)
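To sanity-check the area ratio, here is a small sketch that compares the Riemann-sum estimate with one based on `integrate()`. It assumes the default 101-point grid that `curve()` uses and rebuilds that grid with `seq()` so no plot is needed; the names `aa.riemann`, `aa.exact` and `box.area` are introduced here for illustration only.

```r
# Sketch: compare the Riemann-sum area ratio with a numerically integrated one
x = seq(-4, 4, length.out=101)        # same grid curve() produces by default
y = dnorm(x)
box.area = diff(range(x))*max(y)      # rectangle the random points are drawn from
aa.riemann = sum(y*median(diff(x)))/box.area
aa.exact = integrate(dnorm, -4, 4)$value/box.area
n.target = 1000
n = ceiling(n.target/aa.riemann)
c(aa.riemann=aa.riemann, aa.exact=aa.exact, n=n)
```

For the standard normal on [-4, 4] both ratios come out at roughly 0.31, so a bit under a third of uniformly scattered points land under the curve and about 3,200 points are needed to expect 1,000 of them there.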
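But we'll need more points than this to make sure we get at least n.target, because once we restrict the plotted points to those below the curve, random variation will leave fewer than n.target points under the curve about half the time. So, to generate more points under the curve than we need, we'll add an excess.factor, and then randomly select n.target of those points to plot. Here's a function that takes care of the whole process for a general curve.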
# Plot a specified number of points under a curve
pts.under.curve = function(data, n.target=1000, excess.factor=1.5) {
  # Area under curve as fraction of area of plot region
  aa = sum(data$y*median(diff(data$x)))/(diff(range(data$x))*max(data$y))
  # Number of random points to generate (rounded up to a whole number)
  n = ceiling(excess.factor*n.target/aa)
  # Generate n random points in x-range of the data and with y value between
  # zero and the maximum y-value of the curve
  pts = data.frame(x=runif(n,min(data$x),max(data$x)), y=runif(n,0,max(data$y)))
  # Using interpolation, keep only those points whose y-value is less than y(x)
  pts = pts[pts$y < approx(data$x,data$y,xout=pts$x)$y, ]
  # Randomly select n.target points (or all of them if fewer survived)
  pts = pts[sample(nrow(pts), min(n.target, nrow(pts))), ]
  # Plot the points
  points(pts, pch=16, col="red", cex=0.7)
}
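The "about half the time" claim and the effect of the excess factor can be checked with a short sketch. It assumes each random point lands under the curve independently with probability equal to the area ratio, so the count under the curve is binomial; the value `aa = 0.313` is an assumed approximation for `dnorm` on [-4, 4], and the `p.short.*` names are illustrative.

```r
# Sketch: probability of ending up with fewer than n.target points under the curve
aa = 0.313                               # assumed area ratio for dnorm on [-4, 4]
n.target = 1000
n.base = ceiling(n.target/aa)            # no excess factor
n.excess = ceiling(1.5*n.target/aa)      # excess.factor = 1.5
# P(count <= n.target - 1) when the count is Binomial(n, aa)
p.short.base = pbinom(n.target - 1, size=n.base, prob=aa)
p.short.excess = pbinom(n.target - 1, size=n.excess, prob=aa)
c(p.short.base=p.short.base, p.short.excess=p.short.excess)
```

Without the excess factor the shortfall probability is close to one half; with a factor of 1.5 it is negligible for this curve, though a factor that small may not be enough when n.target is very small.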
Let's run the function on the original curve:
c.pts = as.data.frame(curve(dnorm(x), -4, 4))
pts.under.curve(c.pts)
Now let's test it with a different distribution:
# Curve to be filled
c.pts = as.data.frame(curve(df(x, df1=100, df2=20),0,5,n=1001))
pts.under.curve(c.pts, n.target=200)
Answer 1 (score: 1)
n_points = 10000 #A large number
#Store curve in a variable and plot
cc = curve(dnorm(x), -4, 4, n = n_points)
#Generate n_points random points
p = data.frame(x = seq(-4,4,length.out = n_points), y = rnorm(n = n_points))
#OR p = data.frame(x = runif(n_points,-4,4), y = rnorm(n = n_points))
#Find out the index of values in cc$x closest to p$x
p$ind = findInterval(p$x, cc$x)
#Only retain those points within the curve whose p$y are smaller than cc$y
p2 = p[p$y >= 0 & p$y < cc$y[p$ind],] #may need p[p$y < 0.90 * cc$y[p$ind],] or something
#Plot points
points(p2$x, p2$y)
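For readers unfamiliar with `findInterval`, here is a minimal sketch of the lookup the code above relies on. The vector `grid` and the query values are made up for illustration.

```r
# Sketch: findInterval returns, for each query value, the index of the
# grid point at or just below it
grid = c(1, 2, 3, 4)
idx = findInterval(c(1.5, 3.2), grid)
idx
# [1] 1 3
```

Because the comparison then uses the curve height at that left-hand grid point rather than an interpolated height, a point on a falling stretch of the curve can be accepted while sitting very slightly above it. With a fine grid such as n_points = 10000 the difference is negligible, which is one reason to keep n_points large.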