How to find out what R is doing behind the scenes

Time: 2013-10-04 23:48:34

Tags: r

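As a new R user, I am very curious about what R is actually doing when we call a function. For example, I am using the knn function from the class package. All I need to do is type knn and supply the training and test data sets, and what I get back is the predicted classes for my test data. However, I would like to know whether there is a way to see the actual equations/formulas being used. I have looked at some knn references, but I am still curious about exactly what R is doing. Is it possible to find this kind of information?

Any help is greatly appreciated!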

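3 answers:

Answer 0 (score: 11)

Well, the first thing you can do is type the name of the function; in many cases that will give you the source code right there. For example: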

> knn
function (train, test, cl, k = 1, l = 0, prob = FALSE, use.all = TRUE) 
{
    train <- as.matrix(train)
    if (is.null(dim(test))) 
        dim(test) <- c(1, length(test))
    test <- as.matrix(test)
    if (any(is.na(train)) || any(is.na(test)) || any(is.na(cl))) 
        stop("no missing values are allowed")
    p <- ncol(train)
    ntr <- nrow(train)
    if (length(cl) != ntr) 
        stop("'train' and 'class' have different lengths")
    if (ntr < k) {
        warning(gettextf("k = %d exceeds number %d of patterns", 
            k, ntr), domain = NA)
        k <- ntr
    }
    if (k < 1) 
        stop(gettextf("k = %d must be at least 1", k), domain = NA)
    nte <- nrow(test)
    if (ncol(test) != p) 
        stop("dims of 'test' and 'train' differ")
    clf <- as.factor(cl)
    nc <- max(unclass(clf))
    Z <- .C(VR_knn, as.integer(k), as.integer(l), as.integer(ntr), 
        as.integer(nte), as.integer(p), as.double(train), as.integer(unclass(clf)), 
        as.double(test), res = integer(nte), pr = double(nte), 
        integer(nc + 1), as.integer(nc), as.integer(FALSE), as.integer(use.all))
    res <- factor(Z$res, levels = seq_along(levels(clf)), labels = levels(clf))
    if (prob) 
        attr(res, "prob") <- Z$pr
    res
}
<bytecode: 0x393c650>
<environment: namespace:class>
> 

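In this case you can see that the R function only validates its arguments, and the actual work is done by an external call to the compiled C routine VR_knn. If you want to dig deeper, you can go to http://cran.r-project.org/web/packages/class/index.html and download the source for this package. Once you download and unpack the source, you will find a folder called "src" that contains the C code; look through that folder and you will find the source of the function: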

void
VR_knn(Sint *kin, Sint *lin, Sint *pntr, Sint *pnte, Sint *p,
       double *train, Sint *class, double *test, Sint *res, double *pr,
       Sint *votes, Sint *nc, Sint *cv, Sint *use_all)
{
    int   i, index, j, k, k1, kinit = *kin, kn, l = *lin, mm, npat, ntie,
          ntr = *pntr, nte = *pnte, extras;
    int   pos[MAX_TIES], nclass[MAX_TIES];
    int   j1, j2, needed, t;
    double dist, tmp, nndist[MAX_TIES];

    RANDIN;
    /*
       Use a 'fence' in the (k+1)st position to avoid special cases.
       Simple insertion sort will suffice since k will be small.
     */

    for (npat = 0; npat < nte; npat++) {
        kn = kinit;
        for (k = 0; k < kn; k++)
            nndist[k] = 0.99 * DOUBLE_XMAX;
        for (j = 0; j < ntr; j++) {
            if ((*cv > 0) && (j == npat))
                continue;
            dist = 0.0;
            for (k = 0; k < *p; k++) {
                tmp = test[npat + k * nte] - train[j + k * ntr];
                dist += tmp * tmp;
            }
            /* Use 'fuzz' since distance computed could depend on order of coordinates */
            if (dist <= nndist[kinit - 1] * (1 + EPS))
                for (k = 0; k <= kn; k++)
                    if (dist < nndist[k]) {
                        for (k1 = kn; k1 > k; k1--) {
                            nndist[k1] = nndist[k1 - 1];
                            pos[k1] = pos[k1 - 1];
                        }
                        nndist[k] = dist;
                        pos[k] = j;
                        /* Keep an extra distance if the largest current one ties with current kth */
                        if (nndist[kn] <= nndist[kinit - 1])
                            if (++kn == MAX_TIES - 1)
                                error("too many ties in knn");
                        break;
                    }
            nndist[kn] = 0.99 * DOUBLE_XMAX;
        }

        for (j = 0; j <= *nc; j++)
            votes[j] = 0;
        if (*use_all) {
            for (j = 0; j < kinit; j++)
                votes[class[pos[j]]]++;
            extras = 0;
            for (j = kinit; j < kn; j++) {
                if (nndist[j] > nndist[kinit - 1] * (1 + EPS))
                    break;
                extras++;
                votes[class[pos[j]]]++;
            }
        } else { /* break ties at random */
            extras = 0;
            for (j = 0; j < kinit; j++) {
                if (nndist[j] >= nndist[kinit - 1] * (1 - EPS))
                    break;
                votes[class[pos[j]]]++;
            }
            j1 = j;
            if (j1 == kinit - 1) { /* no ties for largest */
                votes[class[pos[j1]]]++;
            } else {
                /* Use reservoir sampling to choose amongst the tied distances */
                j1 = j;
                needed = kinit - j1;
                for (j = 0; j < needed; j++)
                    nclass[j] = class[pos[j1 + j]];
                t = needed;
                for (j = j1 + needed; j < kn; j++) {
                    if (nndist[j] > nndist[kinit - 1] * (1 + EPS))
                        break;
                    if (++t * UNIF < needed) {
                        j2 = j1 + (int) (UNIF * needed);
                        nclass[j2] = class[pos[j]];
                    }
                }
                for (j = 0; j < needed; j++)
                    votes[nclass[j]]++;
            }
        }

        /* Use reservoir sampling to choose amongst the tied votes */
        ntie = 1;
        if (l > 0)
            mm = l - 1 + extras;
        else
            mm = 0;
        index = 0;
        for (i = 1; i <= *nc; i++)
            if (votes[i] > mm) {
                ntie = 1;
                index = i;
                mm = votes[i];
            } else if (votes[i] == mm && votes[i] >= l) {
                if (++ntie * UNIF < 1.0)
                    index = i;
            }
        res[npat] = index;
        pr[npat] = (double) mm / (kinit + extras);
    }
    RANDOUT;
}
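
To make the "formula" concrete: for each test point, the routine above computes the squared Euclidean distance to every training row, keeps the k smallest distances, and takes a majority vote among the classes of those rows. Here is a minimal sketch of just that core for a single test point. It is not the package's code: the name knn_sketch, the SKETCH_* limits, and the tie rule (lowest class code wins) are my own assumptions for illustration, whereas the real VR_knn breaks ties at random and also handles the l and use.all arguments.

```c
#include <float.h>

#define SKETCH_MAX_K 16        /* hypothetical limit, for illustration only */
#define SKETCH_MAX_CLASSES 16  /* hypothetical limit, for illustration only */

/*
 * Classify one test point x (p values) by k-nearest-neighbour vote.
 * train is an ntr x p matrix stored column-major, the way R hands a
 * matrix to .C(), so element (row j, column c) is train[j + c * ntr].
 * cl[j] is the class code (1..nc) of training row j.
 * Returns the winning class code, or 0 if the arguments are unusable.
 */
int knn_sketch(const double *train, const int *cl, int ntr, int p,
               const double *x, int k, int nc)
{
    double best_dist[SKETCH_MAX_K];
    int best_row[SKETCH_MAX_K];
    int votes[SKETCH_MAX_CLASSES + 1] = {0};
    int i, j, c, winner;

    if (k < 1 || k > SKETCH_MAX_K || k > ntr ||
        nc < 1 || nc > SKETCH_MAX_CLASSES)
        return 0;

    for (i = 0; i < k; i++) {
        best_dist[i] = DBL_MAX;
        best_row[i] = -1;
    }

    for (j = 0; j < ntr; j++) {
        /* squared Euclidean distance: sum over columns of (x_c - train_jc)^2 */
        double dist = 0.0;
        for (c = 0; c < p; c++) {
            double tmp = x[c] - train[j + c * ntr];
            dist += tmp * tmp;
        }
        /* insertion sort into the list of the k smallest distances */
        if (dist < best_dist[k - 1]) {
            int pos = k - 1;
            while (pos > 0 && best_dist[pos - 1] > dist) {
                best_dist[pos] = best_dist[pos - 1];
                best_row[pos] = best_row[pos - 1];
                pos--;
            }
            best_dist[pos] = dist;
            best_row[pos] = j;
        }
    }

    /* one vote per neighbour, for that neighbour's class */
    for (i = 0; i < k; i++)
        if (best_row[i] >= 0)
            votes[cl[best_row[i]]]++;

    /* majority vote; ties go to the lowest class code (an assumption) */
    winner = 1;
    for (c = 2; c <= nc; c++)
        if (votes[c] > votes[winner])
            winner = c;
    return winner;
}
```

Note that no square root is taken: comparing squared distances gives the same ordering of neighbours, which is also why the package code only accumulates tmp * tmp.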

Answer 1 (score: 3)

Type the function name in your editor (e.g., RStudio) and execute that line. This shows the function's source code, i.e. type

knn

In RStudio you can also click on the function name and press F2. A new tab with the function's source code will open.

Or you can use

debug(knn)
knn(your function arguments)

and step through the function with the debugger. When you are done, use

undebug(knn)

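Answer 2 (score: 0)

The Help Desk article in the October 2006 issue of R News (the newsletter that later evolved into The R Journal) shows how to access the source of R functions. It covers many of the situations you might run into, from typing the function's name, to looking in namespaces, to finding the source files for compiled code.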