Question

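I have a 64-bit Windows 7 machine with 8 GB of RAM. memory.limit() shows 8135. I am running into memory problems even though what I am trying to do does not seem that demanding (compared with other memory-related questions).

Basically, I am matching firms' IDs to their industries. ref.table is the data frame where I store the IDs and industries for reference.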

matchid <- function(id) {
  firm.industry <- ref.table$industry[ref.table$id==id]
  firm.industry <- as.character(firm.industry[1]) # Sometimes same ID has multiple industries. I just pick one.
  resid <<- c(resid, firm.industry)
}
resid <- c()
invisible( lapply(unmatched.id, matchid) ) # unmatched.id is the vector of firms' ID to be matched

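The unmatched.id vector is about 60,000 elements long. I still get the error "cannot allocate vector of size 41.8 Kb" (only 41.8 Kb!), and Windows Task Manager constantly shows full RAM usage.

Is it because my function is too clumsy? I can't imagine that the vector size is what causes the problem.

(P.S. I call gc() and rm() frequently.)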

Answer 1

Try the following to see whether it stops giving you memory complaints:

 lapply(unmatched.id, function(id) as.character(ref.table$industry[ref.table$id==id]))

If the above works, wrap it in unlist( .., use.names=FALSE).
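As a rough sketch of that suggestion, here is the wrapped version on a tiny made-up ref.table (the IDs and industry names are hypothetical, for illustration only). It assumes you want exactly one industry per ID, as in the question, so it keeps the `[1]` from the original function. The likely benefit is that lapply collects the results itself, so nothing is grown with `c()` and `<<-`, which copies the whole result vector on every iteration.

```r
# Hypothetical toy data; the real ref.table and unmatched.id come from the question.
ref.table <- data.frame(
  id = c(1, 2, 2, 3),
  industry = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)
unmatched.id <- c(2, 3, 4)

# One result per ID: take the first matching industry (NA when the ID is absent).
res.list <- lapply(unmatched.id, function(id) {
  as.character(ref.table$industry[ref.table$id == id][1])
})

# Flatten the list into a plain character vector.
resid <- unlist(res.list, use.names = FALSE)
print(resid)
```

Without the `[1]`, an ID with several industries or with no match would return more or fewer than one element, and the flattened vector would no longer line up with unmatched.id.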

Or try using data.table:

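library(data.table)
ref.table <- data.table(ref.table, key="id")  # keyed on id for fast lookups
# mult="first" keeps one industry per ID when an ID appears more than once
ref.table[.(unmatched.id), as.character(industry), mult="first"]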

Answer 2

I think you are looking up the vector unmatched.id in ref.table$id and finding the corresponding indices:

## first match, one for each unmatched.id, NA if no match
idx <- match(unmatched.id, ref.table$id)
## matching industries
resid <- ref.table$industry[idx]

This is "vectorized", and so much more efficient than lapply.
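To make the behaviour of match() concrete, here is a small sketch on made-up data (the IDs and industry names are hypothetical). It shows that a duplicated ID resolves to its first row and that an ID missing from ref.table yields NA, which matches the "just pick one" rule in the question.

```r
# Hypothetical toy data, for illustration only.
ref.table <- data.frame(
  id = c(101, 102, 102, 103),
  industry = c("Mining", "Retail", "Banking", "Software"),
  stringsAsFactors = FALSE
)
unmatched.id <- c(102, 103, 999, 101)

# Position of the first match for each ID; NA when the ID is not in ref.table.
idx <- match(unmatched.id, ref.table$id)

# Index the industry column once, for all IDs together.
resid <- as.character(ref.table$industry[idx])
print(idx)
print(resid)
```

Because the whole lookup is a single call, no result vector is rebuilt per ID, which is where the approach in the question most likely spends its memory.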

Memory error despite sufficient RAM
