How to tell big.matrix to treat certain columns as factors

Asked: 2013-05-07 22:13:39

Tags: r r-factor

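I have a large dataset with character variables that take many distinct values. I am trying to read the data in as a big.matrix and then use biglm.big.matrix to fit a linear model. However, because big.matrix converts all character vectors to factors, the character labels are lost. I decided to build a lookup table for my character columns outside of R and use numbers to represent the different levels in R. What I don't know is how to tell big.matrix that these columns should be treated as factors rather than as numbers. Please help.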
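For illustration, the lookup-table idea described in the question could look roughly like this in base R. This is only a sketch: the column name `city` and the sample values are hypothetical, and the asker built the table outside R rather than this way.

```r
# Hypothetical character column with repeated labels
city <- c("Oslo", "Lima", "Oslo", "Kyiv", "Lima")

# Lookup table: one row per distinct label, paired with an integer code
lookup <- data.frame(code  = seq_along(unique(city)),
                     label = unique(city),
                     stringsAsFactors = FALSE)

# Integer codes that can be stored in a purely numeric matrix
codes <- match(city, lookup$label)

# The labels can be recovered from the codes at any time
restored <- lookup$label[codes]
```

The codes alone carry no label information, which is why the model-fitting step still has to be told that the column is categorical.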

1 answer:

Answer 0: (score: 0)

I'm not very familiar with read.table.ffdf, but could you use its x argument? From ?read.table.ffdf:

x
   NULL or an optional ffdf object to which the read records are appended. If this is provided,
   it defines crucial features that are otherwise determined during the 'first' chunk of
   reading: vmodes, colnames, colClasses, sequence of predefined levels. In order to also read
   the first chunk into such predefined ffdf, an x with 1 row is treated special: instead of
   appending the first row will be overwritten. This is necessary because we cannot provide x 
   with zero rows (we cannot create ff vectors with zero elements).

You could define the relevant columns in x as factors with the given levels, and then use x as a template.
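A minimal base R sketch of what such a one-row template with predefined levels might look like. The names `labels`, `template`, `city`, and `y` are hypothetical, and the step of converting the data frame to an ffdf and passing it as x is left as a comment because it has not been verified here.

```r
# Hypothetical labels taken from the external lookup table;
# codes 1..3 map onto them in this order
labels <- c("Oslo", "Lima", "Kyiv")

# One-row template: the factor column already declares every level
template <- data.frame(
  city = factor(labels[1], levels = labels),
  y    = 0
)

# Assumption: a data frame like this could be converted to an ffdf
# (for example with ff::as.ffdf) and supplied as the x argument.

# Numeric codes read later can be mapped onto the same predefined levels
codes <- c(2L, 3L, 1L, 2L)
city  <- factor(codes, levels = seq_along(labels), labels = labels)
```

Because the levels are fixed up front, every chunk read afterwards shares the same code-to-label mapping, regardless of which labels happen to appear in that chunk.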