Question

I have seven very large vectors, c1 through c7. My task is simply to create a data frame from them. However, when I call data.frame(), it returns an error:

> newdaily <- data.frame(c1,c2,c3,c4,c5,c6,c7)
Error in if (mirn && nrows[i] > 0L) { : 
  missing value where TRUE/FALSE needed
Calls: data.frame
In addition: Warning message:
In attributes(.Data) <- c(attributes(.Data), attrib) :
  NAs introduced by coercion to integer range
Execution halted

They all have the same length (2,626,067,374 elements), and I have checked that they contain no NAs.

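I tried subsetting each vector to 1/5 of its length, and data.frame() then works fine. So I guess this has something to do with the length/size of the data? Any ideas on how to fix this? Many thanks!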

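Update: both data.frame and data.table only allow vectors shorter than 2^31 - 1. I still can't find a way to create one super-large data.frame, so I am subsetting my data instead. Hopefully longer vectors will be supported in the future.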

Answer 1

R's data.frames don't support such long vectors yet.

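Your vectors are longer than 2^31 - 1 = 2147483647, which is the largest value that R's 32-bit integer type can represent. Since the data.frame function/class assumes that the number of rows can be represented as an integer, you get an error: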

x <- rep(1, 2626067374)
DF <- data.frame(x)
#Error in if (mirn && nrows[i] > 0L) { : 
#  missing value where TRUE/FALSE needed
#In addition: Warning message:
#In attributes(.Data) <- c(attributes(.Data), attrib) :
#  NAs introduced by coercion to integer range

Basically, this is what happens internally:

as.integer(length(x))
#[1] NA
#Warning message:
#  NAs introduced by coercion to integer range

As a result, the if condition evaluates to NA and you get the error.
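The same failure can be reproduced without allocating a huge vector. This is only a minimal sketch of the mechanism, not the actual code inside data.frame(); the names n, int_max, n_int and msg are made up for illustration.

```r
# Length from the question; R stores a literal this large as a double.
n <- 2626067374

# Largest value R's 32-bit integer type can hold.
int_max <- .Machine$integer.max   # 2147483647

# Coercing a larger number to integer yields NA (normally with a warning).
n_int <- suppressWarnings(as.integer(n))

# An NA condition inside if() raises the error seen in the question.
msg <- tryCatch(
  if (n_int > 0L) "ok",
  error = function(e) conditionMessage(e)
)

print(n > int_max)   # TRUE
print(is.na(n_int))  # TRUE
print(msg)
```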

Possibly you could use the data.table package instead. Unfortunately, I don't have enough RAM to test it:

library(data.table)
DT <- data.table(x = rep(1, 2626067374))
#Error: cannot allocate vector of size 19.6 Gb
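The size in that allocation error follows directly from the vector length. A rough back-of-the-envelope check, assuming the vectors are doubles (8 bytes per element, as rep(1, n) is); the question does not state the actual type of c1 to c7:

```r
n <- 2626067374          # elements per vector (from the question)
bytes_per_double <- 8    # assumption: the vectors are of type double

gib_per_vector <- n * bytes_per_double / 1024^3
gib_total <- 7 * gib_per_vector   # all seven vectors together

print(round(gib_per_vector, 1))  # 19.6, matching the allocation error
print(round(gib_total))          # 137
```

So under that assumption, holding all seven vectors in memory at once would take well over 100 GB before any copy is made.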

Answer 2

With data of this size you have to optimize memory use, but how?

You need to write these values to a file.

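   output_name <- "output.csv"
   # sep = ";" joins the vectors element-wise, giving one string per row
   # (collapse = ";" would merge everything into one single string)
   lines <- paste(c1, c2, c3, c4, c5, c6, c7, sep = ";")
   # write one row per line
   cat(lines, file = output_name, sep = "\n")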
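Pasting every row at once creates a character vector as long as the data itself, which may not fit in memory either. One way around that is to write the file in chunks. The helper below is a hypothetical sketch (write_in_chunks and chunk_size are names invented here), demonstrated on two tiny vectors rather than the real c1 to c7:

```r
# Hypothetical helper: write equally long vectors to a ';'-separated file
# in chunks, so only chunk_size rows are converted to text at a time.
write_in_chunks <- function(vectors, path, chunk_size = 1e6) {
  n <- length(vectors[[1]])
  con <- file(path, open = "w")     # keep the connection open across chunks
  on.exit(close(con))
  start <- 1
  while (start <= n) {
    end <- min(start + chunk_size - 1, n)
    idx <- start:end
    cols <- lapply(vectors, function(v) v[idx])
    # paste the columns element-wise: one string per row
    rows <- do.call(paste, c(unname(cols), sep = ";"))
    writeLines(rows, con)
    start <- end + 1
  }
  invisible(n)
}

# Tiny demonstration with stand-in data.
c1 <- 1:5
c2 <- c(0.5, 1.5, 2.5, 3.5, 4.5)
out <- tempfile(fileext = ".csv")
write_in_chunks(list(c1, c2), out, chunk_size = 2)
written <- readLines(out)
print(written)
```

The chunk size trades memory for the number of write calls; a suitable value for the real data depends on the available RAM.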

But you probably need to analyze the data as well, and (as mentioned above) that requires a lot of memory.

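So you have to read the file iteratively, one batch of lines at a time (for example 20k lines), to keep RAM usage down: analyze those values, save the result, and repeat.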

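    # open the connection once so each readLines() call continues
    # where the previous one stopped
    con <- file(output_name, open = "r")

    repeat {
        lines_in_this_round <- readLines(con, n = 20000)
        if (length(lines_in_this_round) == 0) break  # end of file
        # create data.frame
        # analyse data
        # save result
    }
    close(con)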
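To make the loop concrete, here is a small self-contained sketch of that pattern. It builds its own 10-row stand-in file, and the "analysis" is just a running sum of the second column; path, chunk_size, total and rows_seen are illustrative names, and the real analysis would replace the sum.

```r
# Small stand-in file for the demonstration (10 rows, 2 columns).
path <- tempfile(fileext = ".csv")
writeLines(paste(1:10, (1:10) * 2, sep = ";"), path)

chunk_size <- 4          # tiny for the demo; the answer suggests 20000
con <- file(path, open = "r")
total <- 0               # running result kept across chunks
rows_seen <- 0

repeat {
  lines <- readLines(con, n = chunk_size)
  if (length(lines) == 0) break            # end of file reached
  # create a data.frame from this chunk only
  chunk <- read.table(text = lines, sep = ";", header = FALSE)
  # analyse the chunk and fold it into the running result
  total <- total + sum(chunk$V2)
  rows_seen <- rows_seen + nrow(chunk)
}
close(con)

print(rows_seen)  # 10
print(total)      # 110
```

Only one chunk is held as a data.frame at any time, so memory use is bounded by the chunk size rather than by the file size. This works for results that can be combined across chunks (sums, counts, minima and the like); other analyses may need a different strategy.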

Hope this helps.

How do I create a data frame from very large vectors?
