I have one data set that I wish to calibrate to another. What makes this tricky is that I do not have the full distribution, only binned count data. I know that the distribution is supposed to be Weibull, so I fitted a Weibull curve to each data set, but I am uncertain how to scale one to the other.
The calibrated (reference) data set looks like this: I have the bins, the counts in each bin, and the real-world value (x) that each bin corresponds to. The calibrated data come from a laser, and I want to see how the same events measured in an acoustic signal match the laser measurements.
calibrated.data <- data.frame(bins=1:30,
                              counts=c(39317, 127633, 168713, 169734, 136713, 92202, 66831,
                                       52198, 57662, 25492, 13085, 6824, 3174, 1789, 247,
                                       36, 2, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0),
                              x=c(0.312, 0.437, 0.562, 0.687, 0.812, 0.937, 1.062, 1.187,
                                  1.375, 1.625, 1.875, 2.125, 2.375, 2.75, 3.25, 3.75, 4.25,
                                  4.75, 5.5, 6.5, 7.5, 8.5, 9.5, 11, 13, 15, 17, 19, 21.5,
                                  24.5))
The new data, which I want to map onto the calibrated data, has only counts. The bins do not map 1:1 between the two data sets.
new.data <- data.frame(bins=1:32,
                       counts=c(13877, 12874, 12217, 11555, 10834, 10220, 9274, 8381, 7414,
                                6777, 6069, 5272, 4762, 4077, 3607, 3059, 2672, 2173, 1856,
                                1475, 1139, 815, 596, 417, 264, 186, 103, 62, 40, 20, 8, 15))
Knowing that the data ultimately follow a Weibull distribution, I defined the Weibull curve as
weib.curve <- function(x, shape, scale, c){
  # NB: compared with dweibull(), `shape` here plays the role of the scale and `scale` the shape
  c * ((x/shape)^(scale-1)) * exp(-(x/shape)^scale)
}
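To make the parameter roles concrete, here is a small check of how this curve relates to base R's dweibull(). It is only a sketch: the values of a, b and k are made up for illustration (they are not fitted values), and the point is just that the argument called `shape` in weib.curve divides x, which is what dweibull() calls the scale.

```r
# The curve from the question, argument names unchanged.
weib.curve <- function(x, shape, scale, c) {
  c * ((x / shape)^(scale - 1)) * exp(-(x / shape)^scale)
}

# Illustrative values only (assumptions, not fitted coefficients).
x <- seq(0.5, 10, by = 0.5)
a <- 3.9   # passed as `shape` to weib.curve, acts as dweibull's scale
b <- 1.5   # passed as `scale` to weib.curve, acts as dweibull's shape
k <- 1000  # arbitrary amplitude (the `c` argument)

from.question <- weib.curve(x, shape = a, scale = b, c = k)

# dweibull includes the normalising factor b/a, so multiply it back out.
from.dweibull <- k * (a / b) * dweibull(x, shape = b, scale = a)

all.equal(from.question, from.dweibull)
```

So the fitted `c` is not a plain event count: it absorbs the factor a/b as well as the overall amplitude.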
I then estimated the parameters for each data set (I chose nls fitting more or less arbitrarily).
weib.coef.A <- with(calibrated.data,
                    coef(nls(counts ~ weib.curve(bins, shape, scale, c),
                             start=list(shape=3.9, scale=-1.5, c=sum(counts)),
                             control=list(maxiter=100))))
weib.coef.new <- with(new.data,
                      coef(nls(counts ~ weib.curve(bins, shape, scale, c),
                               start=list(shape=3.9, scale=-1.5, c=sum(counts)),
                               control=list(maxiter=100))))
So now I have the coefficients for the two Weibull curves, but I don't know how to scale the new.data curve to match the calibrated data so that I can map the new data bins to a real-world x.
If I were working with the raw data, I know I could do something along the lines of "Altering distribution of one dataset to match another dataset", but I'm at a loss as to how to do that with fitted distribution curves.
What I ultimately want is two-fold:
new.data is essentially a flattened-out version of calibrated.data. I can see in the figure above that I need to increase the amplitude of the new.data$counts while condensing the new.data$bins to overlay the curves.
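For concreteness, one way the bin-to-x mapping might be approached directly on the counts is quantile matching on the cumulative fractions. This is only a sketch under strong assumptions that I have not verified: that both instruments saw the same population of events, that the listed x values can stand in for the position of each calibrated bin, and that linear interpolation between bins is acceptable. The column name x.est is made up for illustration.

```r
# Data from the question.
calibrated.data <- data.frame(
  bins = 1:30,
  counts = c(39317, 127633, 168713, 169734, 136713, 92202, 66831,
             52198, 57662, 25492, 13085, 6824, 3174, 1789, 247,
             36, 2, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0),
  x = c(0.312, 0.437, 0.562, 0.687, 0.812, 0.937, 1.062, 1.187,
        1.375, 1.625, 1.875, 2.125, 2.375, 2.75, 3.25, 3.75, 4.25,
        4.75, 5.5, 6.5, 7.5, 8.5, 9.5, 11, 13, 15, 17, 19, 21.5,
        24.5))
new.data <- data.frame(
  bins = 1:32,
  counts = c(13877, 12874, 12217, 11555, 10834, 10220, 9274, 8381, 7414,
             6777, 6069, 5272, 4762, 4077, 3607, 3059, 2672, 2173, 1856,
             1475, 1139, 815, 596, 417, 264, 186, 103, 62, 40, 20, 8, 15))

# Cumulative fraction of events at or below each bin, for each data set.
cal.cdf <- cumsum(calibrated.data$counts) / sum(calibrated.data$counts)
new.cdf <- cumsum(new.data$counts) / sum(new.data$counts)

# For each new bin, interpolate the x at which the calibrated data reach
# the same cumulative fraction. Empty calibrated bins produce tied
# cumulative fractions; ties = min keeps the smallest x for each tie.
# rule = 2 clamps to the end values instead of returning NA.
new.data$x.est <- approx(x = cal.cdf, y = calibrated.data$x,
                         xout = new.cdf, ties = min, rule = 2)$y

head(new.data)
```

This sidesteps the fitted curves entirely, so it does not by itself tell me how to rescale the amplitude, and I am not sure it is valid given how differently shaped the two count profiles are.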