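I am working on an R program that computes the Gabriel graph for up to 1000 data points. I started from a program I found online (GabrielGraph, based on Bhattacharya et al. 1981, lines 781-830).
Unfortunately, it takes quite a long time to produce a result, so I tried to reimplement it with Rcpp. To do that, I wrote several small functions and one large function called edges, which computes the edges of the Gabriel graph. I am also new to Rcpp, so I may have made things more complicated than necessary, but I don't know how to do it better.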
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double vecnorm(NumericVector x){
    //to calculate the vectornorm sqrt(sum of (vector entries)^2)
    double out;
    out = sqrt(sum(pow(x,2.0)));
    return out;
}

// [[Rcpp::export]]
NumericVector vektorzugriff(NumericMatrix xy,int i){
    //to return a row of the Matrix xy
    int col = xy.ncol();
    NumericVector out(col);
    for(int j=0; j<=col; j++){
        out[j] = xy(i-1,j);
    }
    return out;
}

// [[Rcpp::export]]
IntegerVector vergl(NumericVector eins, NumericVector zwei){
    //to see if two Vectors have any identical entries
    IntegerVector out = match(eins, zwei);
    return out;
}

// [[Rcpp::export]]
IntegerVector verglInt(int eins, NumericVector zwei){
    NumericVector dummy = NumericVector::create( eins ) ;
    IntegerVector out = match(dummy, zwei);
    return out;
}

// [[Rcpp::export]]
NumericVector toVec(NumericVector excluded, int k){
    //to append int k to a Vector excluded
    NumericVector dummy = NumericVector::create( k ) ;
    int len = excluded.size();
    int len2 = dummy.size();
    int i=0;
    NumericVector out(len+len2);
    while(i<len+len2){
        if(i<len){
            out[i]=excluded[i];
            i++;
        }
        else{
            out[i]=dummy[i-len];
            i++;
        }
    }
    return out;
}

// [[Rcpp::export]]
LogicalVector isNA(IntegerVector x) {
    //to see which Vector Entries are NAs
    int n = x.size();
    LogicalVector out(n);
    for (int i = 0; i < n; ++i) {
        out[i] = IntegerVector::is_na(x[i]);
    }
    return out;
}

// [[Rcpp::export]]
NumericMatrix Gab(NumericMatrix Gabriel, NumericVector edges1, NumericVector edges2, int anz){
    //to fill a Matrix with the Gabrieledges
    for(int i=0; i<anz; i++) {
        Gabriel(edges1[i]-1, edges2[i]-1) = 1 ;
        Gabriel(edges2[i]-1, edges1[i]-1) = 1 ;
    }
    return Gabriel;
}

// [[Rcpp::export]]
NumericVector edges(NumericMatrix xy,NumericVector vertices,NumericVector excluded, int i){
    //actual function to calculate the edges of the GabrielGraph
    int npts = xy.nrow()+1;
    double d1;
    double d2;
    double d3;
    for(int r=i+1; r<npts; r++) {
        // Skip vertices in excluded
        if(!is_true(any(isNA(verglInt(r,excluded))))){
            continue;
        }
        d1 = vecnorm(vektorzugriff(xy,i) - vektorzugriff(xy,r));
        for(int k=1; k<npts; k++) {
            if((k!=r) && (k!=i)){
                d2 = vecnorm(vektorzugriff(xy,i) - vektorzugriff(xy,k));
                d3 = vecnorm(vektorzugriff(xy,r) - vektorzugriff(xy,k));
                //Consider vertices that are not yet excluded
                if(!is_true(any(isNA(verglInt(k,vertices[isNA(vergl(vertices,excluded))]))))){
                    //If d(x,z)^2 > d(x,y)^2+d(y,z)^2 -> the edge does not belong to the GG
                    if( pow(d2,2.0) > pow(d1,2.0) + pow(d3,2.0) ) {
                        excluded = toVec(excluded,k);
                    }
                }
                if( pow(d1,2.0) > pow(d2,2.0) + pow(d3,2.0) ){
                    excluded = toVec(excluded,r);
                    break;
                }
            }
        }
    }
    return excluded;
}
I use these Rcpp functions in this R function:
GabrielGraphMatrix <- function(X,Y,PlotIt=FALSE){
    # Heuristic rejection Algorithm for Gabriel Graph Construction (Bhattacharya et al. 1981)
    # Algorithm is ~ O(d n^2)
    #loading Rcpp functions
    library(Rcpp)
    sourceCpp("... .cpp")
    XY <- cbind(X,Y)
    ndim <- ncol(XY)
    npts <- nrow(XY)
    edges1<- c()
    edges2<- c()
    for( i in 1:(npts-1) ) {
        # Candidate set of Gabriel neighbors
        vertices <- (i+1):npts
        # Initialize list of vertices to be excluded from Ni
        excluded <- edges(XY,vertices,vector(),i);
        adj <- vertices[which(!match(vertices,excluded,nomatch=F)>0)]
        if(length(adj) > 0) {
            edges1=c(edges1,rep(i,length(adj)))
            edges2=c(edges2,adj)
        }
    }
    anz <- length(edges1)
    Gabriel <- Gab(matrix(0, npts, npts),edges1,edges2,anz)
    return(list(Gabriel=Gabriel,edges=cbind(edges1,edges2)))
}
With sample data of ten data points it works fine, for example:
z <- 10
X <- runif(z)*100
Y <- runif(z)*100
GabrielGraphMatrix(X,Y)
returns
> GabrielGraphMatrix(X,Y)
$Gabriel
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    1    0    0    0    0    0    0    0     0
 [2,]    1    0    0    1    0    0    1    0    0     0
 [3,]    0    0    0    1    1    0    0    0    0     1
 [4,]    0    1    1    0    0    0    0    0    0     0
 [5,]    0    0    1    0    0    0    0    0    0     0
 [6,]    0    0    0    0    0    0    0    1    0     0
 [7,]    0    1    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    1    0    0    1     1
 [9,]    0    0    0    0    0    0    0    1    0     1
[10,]    0    0    1    0    0    0    0    1    1     0
$edges
      edges1 edges2
 [1,]      1      2
 [2,]      2      4
 [3,]      2      7
 [4,]      3      4
 [5,]      3      5
 [6,]      3     10
 [7,]      6      8
 [8,]      8      9
 [9,]      8     10
[10,]      9     10
But if I try a larger data set, I get the following error message:
Error: Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'builtin'
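I would be very grateful if anyone has at least an idea of what I did wrong.
Answer 0 (score: 3)
In case anyone runs into the same problem: in the end the fix was quite simple. The error was in the function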
// [[Rcpp::export]]
NumericVector vektorzugriff(NumericMatrix xy,int i){
    //to return a row of the Matrix xy
    int col = xy.ncol();
    NumericVector out(col);
    for(int j=0; j<=col; j++){
        out[j] = xy(i-1,j);
    }
    return out;
}
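The for loop ran one iteration too many. It should be for(int j=0; j<col; j++) instead of for(int j=0; j<=col; j++). Valid column indices run from 0 to col-1, so the last iteration with j == col read past the last column of xy and wrote past the end of out.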
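To make the off-by-one concrete, here is a minimal standalone sketch in plain C++ (no Rcpp, so it compiles on its own). The `Matrix` struct and `row_of` function are hypothetical stand-ins for `NumericMatrix` and `vektorzugriff`, not part of Rcpp; the checked accessor throws where an unchecked read would silently run past the last column.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a numeric matrix, stored column-major like R:
// element (r, c) lives at data[r + c * nrow].
struct Matrix {
    std::size_t nrow;
    std::size_t ncol;
    std::vector<double> data;

    // Bounds-checked element access (0-based row and column).
    double at(std::size_t r, std::size_t c) const {
        if (r >= nrow || c >= ncol) {
            throw std::out_of_range("matrix index out of range");
        }
        return data[r + c * nrow];
    }
};

// Copy row i (1-based, as in the question) into a new vector.
std::vector<double> row_of(const Matrix& m, std::size_t i) {
    std::vector<double> out(m.ncol);
    // Valid column indices are 0 .. ncol-1, hence j < ncol and not j <= ncol.
    for (std::size_t j = 0; j < m.ncol; ++j) {
        out[j] = m.at(i - 1, j);
    }
    return out;
}
```

With `j <= m.ncol` the final iteration would call `m.at(i - 1, m.ncol)` and throw. In Rcpp itself, a row can probably be taken more directly with `xy(i - 1, _)`, which avoids the hand-written loop altogether.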
Answer 1 (score: 1)
I couldn't reproduce your error, but your code threw various similar errors and frequently crashed R. Here are some obvious problems.
In your C++ function Gab, you have at least two problems:
You don't define anz before using it.
The indexing of Gabriel. This
Gabriel(edges1[i]-1, edges2[i]-1)
should be
Gabriel[edges1[i]-1, edges2[i]-1]
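In your R function GabrielGraphMatrix, you grow edges1 and edges2 inside the loop. This means they have to be reallocated on every iteration of the for loop. Once you get past trivial loop lengths, this will cause problems.
Instead, preallocate them as lists and call unlist afterwards to get the vectors you need.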
# before the loop
edges1 <- vector("list", npts - 1)
edges2 <- vector("list", npts - 1)
# in the loop
if(length(adj) > 0) {
    edges1[[i]] <- rep(i,length(adj))
    edges2[[i]] <- adj
}
# after the loop
edges1 <- unlist(edges1)
edges2 <- unlist(edges2)
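The same growth concern appears on the C++ side of the question: toVec allocates a new vector and copies every element each time one vertex is appended, and each membership test goes through match. One possible alternative, sketched here in plain C++ under the assumption that vertex ids are 1-based integers up to npts, is a flag per vertex that is allocated once. The class name `ExcludedSet` and its methods are hypothetical and are not part of Rcpp or of the original program.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical replacement for the `excluded` vector together with the
// toVec/verglInt helpers: one flag per vertex, allocated a single time,
// so marking a vertex and testing membership are both constant time.
class ExcludedSet {
public:
    // Slot 0 is unused so that 1-based vertex ids index directly.
    explicit ExcludedSet(std::size_t npts) : flags_(npts + 1, 0) {}

    void add(std::size_t vertex) { flags_[vertex] = 1; }

    bool contains(std::size_t vertex) const { return flags_[vertex] != 0; }

    // Vertices in [first, last] that were never excluded, in increasing order.
    std::vector<int> remaining(std::size_t first, std::size_t last) const {
        std::vector<int> out;
        if (first > last) {
            return out;  // empty range
        }
        out.reserve(last - first + 1);  // one allocation up front
        for (std::size_t v = first; v <= last; ++v) {
            if (!contains(v)) {
                out.push_back(static_cast<int>(v));
            }
        }
        return out;
    }

private:
    std::vector<char> flags_;
};
```

In this sketch, `remaining(i + 1, npts)` plays the role of the adj computation in the R loop, so the surviving neighbours of vertex i come back as one vector instead of being filtered with match afterwards.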