Question

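I have been doing some topic modelling (LDA), and I have created a matrix of posterior probabilities for each document (in this case, a document is one day's tweets). I want to measure how focused the discussion was on each day, so I would like to see how many topics are needed to "explain" a given proportion of that day's discussion. I am able to do this for a small number of topics: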

thresh<-.98
distribution98 <- function(x){
  if (x[k]>thresh){x<-1}
  else if(x[k]+x[k-1]>thresh){x<-2}
  else if(x[k]+x[k-1]+x[k-2]>thresh){x<-3}
  else {x<-4}}
apply(ndx, 2, distribution98)

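where ndx is my matrix of posteriors (each column is a day, each row is a topic, and I have sorted each column from lowest to highest). This particular function finds how many topics it takes to explain 98% of the discussion.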

I am trying to write a function that does this for an arbitrary number of topics, and I am getting an error message that I don't understand:

k<-100
results<-vector(mode="numeric", length=324)
short<- function(x){ for (j in 1:ncol(ndx)) {
  i<-0
  total<-0
  while(total < thresh){
    total<-(total+x[k-i])
    i<-(1+i)
    results[j]<-i
  }
}
}
apply(ndx, 2, short)
Error in while (total < thresh) { : argument is of length zero

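My thinking was that this would leave me with a vector (results) that simply records how large i had to get to push total above the threshold. But I don't understand the error: total and thresh are both numeric, so shouldn't total < thresh be either TRUE or FALSE?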

Answer 1

I guess you are looking for something like this:

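## given a vector x and a threshold .thresh, returns how many of the
## largest elements of x are needed for their cumulative sum to reach .thresh
## (NA if the sum of x never reaches .thresh)
get_min_threshold <- 
function(x,.thresh)
  which(cumsum(sort(x, decreasing = TRUE)) >= .thresh)[1]

## apply the function to each column of the data.frame
## (if ndx is a matrix, use apply(ndx, 2, get_min_threshold, .thresh=.98) instead)
lapply(ndx,get_min_threshold,.thresh=.98)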
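
As for the error itself, here is a sketch of one likely cause. It has not been checked against the original data. In R, indexing a vector with 0 returns a zero-length vector. If the loop in the question keeps running until i equals k, then x[k-i] becomes x[0], total becomes numeric(0), and while (total < thresh) has no value to test. That can happen when the k entries being summed never add up to thresh. The function name count_topics and the toy matrix ndx_demo below are hypothetical and exist only for illustration. The function returns its count directly, because assigning to results inside a function called by apply only changes a local copy.

```r
# Indexing with 0 yields a zero-length vector; this is what breaks the loop.
x <- c(0.1, 0.2, 0.3)
length(x[0])            # 0
length(0.6 + x[0])      # 0, so `while (total < thresh)` has no condition to test

# Hypothetical repaired loop: counts how many of the largest entries of x
# are needed for their running total to reach thresh.
count_topics <- function(x, thresh = 0.98) {
  x <- sort(x, decreasing = TRUE)   # largest first; no reliance on a global k
  total <- 0
  i <- 0L
  # stop at length(x) so the index never runs off the vector
  while (total < thresh && i < length(x)) {
    i <- i + 1L
    total <- total + x[i]
  }
  if (total < thresh) NA_integer_ else i   # NA when thresh is never reached
}

# Toy matrix (made-up numbers): rows are topics, columns are days.
ndx_demo <- cbind(day1 = c(0.01, 0.01, 0.98),
                  day2 = c(0.25, 0.25, 0.50),
                  day3 = c(0.20, 0.30, 0.40))
apply(ndx_demo, 2, count_topics, thresh = 0.98)
```

Because apply already hands the function one column at a time, the inner for loop over ncol(ndx) from the question is not needed, and the value returned for each column is collected into the result vector automatically.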

While loop in R gives "argument is of length zero" error
