Question

I am quite new to C++ and Rcpp integration. I need to write a program in C++ with R integration that finds the MLE/root of a Cauchy distribution.

Below is my code so far.

#include <Rcpp.h> 
#include <math.h>
#include <iostream>
#include <cstdlib>
using namespace std;
using namespace Rcpp;


// [[Rcpp::export]]
double Cauchy(double x, double y);    //Declare Function
double Cauchy(double x,double y)    //Define Function
{

return    1/(M_PI*(1+(pow(x-y,2))));    //write the equation whose roots are to be determined x=chosen y=theta

}


using namespace std;

// [[Rcpp::export]]


int Secant (NumericVector x){
NumericVector xvector(x) ; //Input of x vector
double eplison= 0.001; //Threshold
double a= xvector[3]; //Select starting point
double b= xvector[4];//Select end point
double c= 0.0; //initial value for c
double Theta= 10.6; //median value for theta estimate
int noofIter= 0; //Iterations
double error = 0.0; 

if (std::abs(Cauchy(a, Theta)<(std::abs(Cauchy(a, Theta))))){  


do{

a=b;

b=c; 


error= (b-(Cauchy(b, Theta)))*((a-b)/(Cauchy(a, Theta)-Cauchy(b, Theta)));

error= Cauchy(c,Theta); 

//return number of iterations
noofIter++;


for (int i = 0; i < noofIter; i += 1) {
cout << "The Value is " << c << endl;
cout << "The Value is " << a << endl;
cout << "The Value is " << b << endl;
cout << "The Value is " << Theta << endl;
}


}while (std::abs(error)>eplison);

}


cout<<"\nThe root of the equation is occurs at "<<c<<endl;    //print the root
cout << "The number of iterations is " << noofIter;
return 0;
}

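With some modifications, the program either goes into a never-ending loop or returns an infinitesimally small value.

My understanding of this math is limited, so any help or corrections would be greatly appreciated.

The x vector we were given as output is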

x <- c( 11.262307 , 10.281078 , 10.287090 , 12.734039 ,
         11.731881 , 8.861998 , 12.246509 , 11.244818 ,
         9.696278 , 11.557572 , 11.112531 , 10.550190 ,
         9.018438 , 10.704774 , 9.515617 , 10.003247 ,
         10.278352 , 9.709630 , 10.963905 , 17.314814)

From previous R code, we know that the MLE/root for this distribution is approximately 10.5935.

The code used to obtain this MLE was

           optimize(function(theta)-sum(dcauchy(x, location=theta, 
           log=TRUE)),  c(-100,100))

Thank you!

Answer 1

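The optimize() function searches directly for an extremum of the likelihood. An alternative is to apply a root-finding algorithm (such as the secant method) to the derivative of the (log-)likelihood. From Wikipedia we get the formula that has to be solved. In R it could look like this: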

x <- c( 11.262307 , 10.281078 , 10.287090 , 12.734039 ,
        11.731881 , 8.861998 , 12.246509 , 11.244818 ,
        9.696278 , 11.557572 , 11.112531 , 10.550190 ,
        9.018438 , 10.704774 , 9.515617 , 10.003247 ,
        10.278352 , 9.709630 , 10.963905 , 17.314814)

ld <- function(sample, theta){
  xp <- outer(sample, theta, FUN = "-")
  colSums(xp/(1+xp^2))
}
uniroot(ld, sample = x, lower = 0, upper = 20)$root
#> [1] 10.59724
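
For reference, the equation that ld() sets to zero can be written out as follows. This is a sketch that assumes the scale parameter is fixed at 1, which is what the dcauchy() default and the code above imply. The constant factor 2 does not move the root, which is presumably why ld() leaves it out.

```latex
\ell(\theta) = -n\log\pi - \sum_{i=1}^{n} \log\bigl(1 + (x_i - \theta)^2\bigr),
\qquad
\frac{\partial \ell}{\partial \theta}
  = \sum_{i=1}^{n} \frac{2\,(x_i - \theta)}{1 + (x_i - \theta)^2} = 0 .
```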

Note that the derivative of the log-likelihood is vectorized over both arguments. This makes it easy to plot:

theta <- seq(0, 20, length=500)
plot(theta, ld(x, theta), type="l",
     xlab=expression(theta), ylab=expression(ld(x, theta)))

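From this plot we can see that it can be tricky to find starting values for which the secant method works.

Let's move this to C++ (C++11, to be precise):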

#include <Rcpp.h>
#include <functional>  // std::function
// [[Rcpp::plugins(cpp11)]]

Rcpp::List secant(const std::function<double(double)>& f, 
              double a, double b, int maxIterations, double epsilon) {
  double c(0.0);
  do {
    c = b * (1 - (1 - a/b) / (1 - f(a)/f(b)));
    a = b;
    b = c;
  } while (maxIterations-- > 0 && std::abs(a - b) > epsilon);
  return Rcpp::List::create(Rcpp::Named("root") = c,
                            Rcpp::Named("f.root") = f(c),
                            Rcpp::Named("converged") = (maxIterations > 0));
}

// [[Rcpp::export]]
Rcpp::List mleCauchy(const Rcpp::NumericVector& sample, double a, double b,
                     int maxIterations = 100, double epsilon = 0.0001) {
  auto f = [&sample](double theta) {
    Rcpp::NumericVector xp = sample - theta;
    xp = xp / (1 + xp * xp);
    return Rcpp::sum(xp);
  };
  return secant(f, a, b, maxIterations, epsilon);
}


/*** R
x <- c( 11.262307 , 10.281078 , 10.287090 , 12.734039 ,
        11.731881 , 8.861998 , 12.246509 , 11.244818 ,
        9.696278 , 11.557572 , 11.112531 , 10.550190 ,
        9.018438 , 10.704774 , 9.515617 , 10.003247 ,
        10.278352 , 9.709630 , 10.963905 , 17.314814)

mleCauchy(x, 11, 15)
#-> does not converge
mleCauchy(x, 11, 14)
#-> 10.59721
mleCauchy(x, mean(x), median(x))
#-> 10.59721
*/

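The secant() function works with any std::function that takes a double as its argument and returns a double. Such a function is then defined as a lambda function that depends on the supplied sample values. As expected, the correct root is obtained only when starting from values close to it.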
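
To illustrate that the secant routine does not depend on the Cauchy problem, here is a minimal sketch in plain C++ without Rcpp. The names SecantResult, secantPlain and squareMinusTwo are hypothetical and not part of the answer's code. The update uses the textbook form c = b - f(b)(b - a)/(f(b) - f(a)), which is algebraically equivalent to the expression used in secant() above.

```cpp
#include <cmath>
#include <functional>

// Hypothetical result type for this sketch (not part of the answer's code).
struct SecantResult {
  double root;       // last estimate of the root
  int iterations;    // number of secant steps performed
  bool converged;    // true if successive estimates differ by <= epsilon
};

// Secant iteration for any callable convertible to std::function<double(double)>.
SecantResult secantPlain(const std::function<double(double)>& f,
                         double a, double b,
                         int maxIterations = 100, double epsilon = 1e-8) {
  for (int i = 1; i <= maxIterations; ++i) {
    const double fa = f(a);
    const double fb = f(b);
    if (fa == fb) {
      // The secant line is horizontal, so no new estimate can be computed.
      return {b, i, false};
    }
    const double c = b - fb * (b - a) / (fb - fa);
    a = b;
    b = c;
    if (std::abs(a - b) <= epsilon) {
      return {b, i, true};
    }
  }
  return {b, maxIterations, false};
}

// Simple test function with a known root at sqrt(2).
double squareMinusTwo(double x) { return x * x - 2.0; }
```

Calling secantPlain(squareMinusTwo, 1.0, 2.0) should report convergence to a value close to sqrt(2). A lambda such as [](double x) { return x * x - 2.0; } can be passed in the same way.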

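Lambda functions might look confusing at first glance, but they are very close to the functions we are used to in R. Here is the same algorithm written in R: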

secant <- function(f, a, b, maxIterations, epsilon) {
  for (i in seq.int(maxIterations)) {
    c <- b * (1 - (1 - a/b) / (1 - f(a)/f(b)))
    a <- b
    b <- c
    if (abs(a - b) <= epsilon)
      break
  }
  list(root = c, f.root = f(c), converged = (i < maxIterations))
}

mleCauchy <- function(sample, a, b, maxIterations = 100L, epsilon = 0.001) {
  f <- function(theta) {
    xp <- sample - theta
    sum(xp/(1 + xp^2))
  }
  secant(f, a, b, maxIterations, epsilon)
}

x <- c( 11.262307 , 10.281078 , 10.287090 , 12.734039 ,
        11.731881 , 8.861998 , 12.246509 , 11.244818 ,
        9.696278 , 11.557572 , 11.112531 , 10.550190 ,
        9.018438 , 10.704774 , 9.515617 , 10.003247 ,
        10.278352 , 9.709630 , 10.963905 , 17.314814)
mleCauchy(x, 11, 12)
#-> 10.59721

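Both the R function f and the lambda function f take the vector sample from the environment in which they are defined. In R this happens implicitly, while in C++ we have to state explicitly that this value should be captured. The number theta is the argument supplied when the function is called, i.e. the successive estimates of the root, starting with a and b.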
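
The capture mechanism can be seen in isolation in the following sketch, which uses std::vector instead of Rcpp::NumericVector. The helper name makeScore is hypothetical. Unlike the answer's [&sample], which captures by reference and is fine there because the lambda is only used inside mleCauchy(), this version captures by value ([sample]) so that the returned function stays valid after makeScore() has returned.

```cpp
#include <functional>
#include <vector>

// Hypothetical helper: builds the function theta -> sum((x_i - theta) / (1 + (x_i - theta)^2))
// for a fixed sample, i.e. the derivative of the log-likelihood up to a factor of 2.
std::function<double(double)> makeScore(const std::vector<double>& sample) {
  // [sample] copies the vector into the lambda; theta is supplied at call time.
  return [sample](double theta) {
    double total = 0.0;
    for (double xi : sample) {
      const double d = xi - theta;
      total += d / (1.0 + d * d);
    }
    return total;
  };
}
```

With the sample from the question, the returned function should be positive at theta = 10 and negative at theta = 11, consistent with a root near 10.597 between those two values.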

Cauchy distribution, secant method, C++, Rcpp, MLE/root