Question

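I am using read.big.matrix to read a data set of dimension 3131875 x 5 into R. My data contains both character and numeric columns, including a date variable. The command I am using is: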

as1 <- read.big.matrix("C:/Documents and Settings/Arundhati.Mukherjee/My Documents/Arundhati/big data/MB07_Arundhati/sample2.txt",
                       header=TRUE, 
                       backingfile="session.bin",
                       descriptorfile="session.desc",
                       type = NA)

But R does not accept type = NA in this case, and I get the following error:

Error in filebacked.big.matrix(nrow = nrow, ncol = ncol, type = type,  : 
  Problem creating filebacked matrix.
In addition: Warning messages:
1: In na.omit(as.integer(firstLineVals)) : NAs introduced by coercion
2: In na.omit(as.double(firstLineVals)) : NAs introduced by coercion
3: In read.big.matrix("C:/Documents and Settings/Arundhati.Mukherjee/My Documents/Arundhati/big data/MB07_Arundhati/sample2.txt",  :
  Because type was not specified, we chose double based on the first line of data.

I need to know what type should be here. I have tried options such as double, but that gives me the same error.

Please help.

Answer 1

From ?read.big.matrix:

Files must contain only one atomic type (all integer, for example).

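So you will not be able to read in data that mixes character, numeric, integer, date, and other types. You could preprocess the file, for example by using another program to convert the character variables to an integer representation (much like converting them to a factor in R).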

Edit

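There is an example on the bigmemory website of using a Python script to preprocess data, changing character information into integers. That script was written for a specific data set, but perhaps you can use it as a guide for your own data.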
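To give a rough idea of what such preprocessing could look like, here is a minimal sketch. It is not the script from the bigmemory website. The function name encode_to_numeric, the comma delimiter, the date format, and the column names in the usage note are all assumptions made for illustration. It also assumes each column is consistently either numeric, a date, or character. Character values are replaced by integer codes (similar to factor levels in R), and dates by days since 1970-01-01, which is how R stores Date values internally.

```python
import csv
from datetime import date, datetime

# Day number of 1970-01-01, so dates can be written as days since that origin.
EPOCH = date(1970, 1, 1).toordinal()


def encode_to_numeric(src, dst, date_columns=(), date_format="%Y-%m-%d",
                      delimiter=","):
    """Copy a delimited file from src to dst, making every field numeric.

    src and dst are open text file objects. Columns named in date_columns
    are parsed with date_format (an assumption about the data) and written
    as days since 1970-01-01. Any other value that does not parse as a
    number gets an integer code, assigned per column in order of first
    appearance, starting at 1.

    Returns {column name: {original string: integer code}} for the columns
    that needed coding, so the codes can be mapped back later.
    """
    reader = csv.reader(src, delimiter=delimiter)
    writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")

    header = next(reader)
    writer.writerow(header)
    codes = {name: {} for name in header}

    for row in reader:
        out = []
        for name, value in zip(header, row):
            if name in date_columns:
                parsed = datetime.strptime(value, date_format).date()
                out.append(parsed.toordinal() - EPOCH)
                continue
            try:
                float(value)          # already numeric: keep the text as is
                out.append(value)
            except ValueError:        # character data: look up or assign a code
                mapping = codes[name]
                out.append(mapping.setdefault(value, len(mapping) + 1))
        writer.writerow(out)

    return {name: mapping for name, mapping in codes.items() if mapping}
```

You would call it with two open files, e.g. encode_to_numeric(open("sample2.txt", newline=""), open("sample2_numeric.txt", "w", newline=""), date_columns=("day",)), where "day" stands in for whatever your date column is called. The output file then contains only numbers, so it should be readable by read.big.matrix with an explicit type such as type = "double" and a matching sep. Keep the returned dictionary if you need to translate the integer codes back to the original strings.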
