Segfault in fread using 613-column survey data on a sparse 132MB file

Asked: 2014-01-15 13:29:05

Tags: r data.table fread

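I have been learning data.table recently. However, when I use fread to read the data at "http://dl.dropbox.com/u/20498362/GSS.csv", R segfaults. How can I investigate this further? To reproduce, just download the file and type: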

fread("GSS.csv")

The file has many NA values, and the first column has no column name. However, adding "rownames = TRUE" does not help; it still fails.

Thanks!

1 answer:

Answer 0 (score: 4)

Update: now fixed in v1.9.4 on CRAN.


Previous answer...

Thanks very much for the reproducible example! I see the crash too. Fantastic!!

Let's turn on verbose=TRUE to get more clues...

$ R
R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> require(data.table)
Loading required package: data.table
data.table 1.8.10  For help type: help("data.table")

> fread("GSS.csv", verbose=TRUE)
Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
Using line 30 to detect sep (the last non blank line in the first 'autostart') ... sep=','
Found 613 columns
First row with 613 fields occurs on line 1 (either column names or first row of data)
All the fields on line 1 are character fields. Treating as the column names.
Count of eol after first data row: 55088
Subtracted 1 for last eol and any trailing empty lines, leaving 55087 data rows
Type codes: 3002000030033030000033003000000033000300330000000030000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000003330003330000000000000000000000000000000000000000000000000003330000000000000003000303000000000000000000000000000000000033000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000030000000000000000000000303 (first 5 rows)
Type codes: 3002000030033030000033003330000033032300333300000033000033330000000000000000000000000000000000000000000000000003300003333333330000000000000000000000300030000000000000000000000000000000000000000000000000000000000000000003333300003330000000033000000000000000000000000000000000000000000000000000000000000000000000000333000000000000300000003333333330000000000000000000000000000000000000000000000000003332000000000000003303333000000000000000003330000003000000333333333333333333333333300000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000030033333330300000000000333 (+middle 5 rows)
Type codes: 3002000033033033000033003333000033032300333300000033000033333330000000000000000000000000000000000000000000000003300003333333330000000000000000000000300030000000000000300000000000000000000000000000000000000000000000000003333300003330000000033000000000000000000000000000000000000000000000000000000000000000000000000333000000000000300000003333333330000000000000000000000000000000000000000000000000003332200000300033003303333000000000000000003330003333000000333333333333333333333333300000000000000000000000030030000000000000000000000000000000000000000000000000000000000000000000000000000000030033333330300000000000333 (+last 5 rows)
Bumping column 39 from INT to INT64 on data row 1614, field contains '"working class"'
Bumping column 39 from INT64 to REAL on data row 1614, field contains '"working class"'
Bumping column 39 from REAL to STR on data row 1614, field contains '"working class"'
Bumping column 225 from INT to INT64 on data row 1614, field contains '"disagree"'
Bumping column 225 from INT64 to REAL on data row 1614, field contains '"disagree"'
Bumping column 225 from REAL to STR on data row 1614, field contains '"disagree"'
Bumping column 226 from INT to INT64 on data row 1614, field contains '"disagree"'
Bumping column 226 from INT64 to REAL on data row 1614, field contains '"disagree"'
Bumping column 226 from REAL to STR on data row 1614, field contains '"disagree"'
Bumping column 227 from INT to INT64 on data row 1614, field contains '"disagree"'
Bumping column 227 from INT64 to REAL on data row 1614, field contains '"disagree"'
Bumping column 227 from REAL to STR on data row 1614, field contains '"disagree"'
Bumping column 228 from INT to INT64 on data row 1614, field contains '"disagree"'
Bumping column 228 from INT64 to REAL on data row 1614, field contains '"disagree"'
Bumping column 228 from REAL to STR on data row 1614, field contains '"disagree"'
Bumping column 232 from INT to INT64 on data row 1614, field contains '"agree"'
Bumping column 232 from INT64 to REAL on data row 1614, field contains '"agree"'
Bumping column 232 from REAL to STR on data row 1614, field contains '"agree"'
Bumping column 233 from INT to INT64 on data row 1614, field contains '"agree"'
Bumping column 233 from INT64 to REAL on data row 1614, field contains '"agree"'
Bumping column 233 from REAL to STR on data row 1614, field contains '"agree"'
Bumping column 307 from INT to INT64 on data row 1614, field contains '"no"'
Bumping column 307 from INT64 to REAL on data row 1614, field contains '"no"'
Bumping column 307 from REAL to STR on data row 1614, field contains '"no"'
Bumping column 308 from INT to INT64 on data row 1614, field contains '"no"'
Bumping column 308 from INT64 to REAL on data row 1614, field contains '"no"'
Bumping column 308 from REAL to STR on data row 1614, field contains '"no"'
Bumping column 309 from INT to INT64 on data row 1614, field contains '"no"'
Bumping column 309 from INT64 to REAL on data row 1614, field contains '"no"'
Bumping column 309 from REAL to STR on data row 1614, field contains '"no"'
Bumping column 310 from INT to INT64 on data row 1614, field contains '"no"'
Bumping column 310 from INT64 to REAL on data row 1614, field contains '"no"'
Bumping column 310 from REAL to STR on data row 1614, field contains '"no"'
Bumping column 311 from INT to INT64 on data row 1614, field contains '"no"'
Bumping column 311 from INT64 to REAL on data row 1614, field contains '"no"'
Bumping column 311 from REAL to STR on data row 1614, field contains '"no"'
Bumping column 3 from INT to INT64 on data row 9121, field contains '2.54999995231628'
Bumping column 3 from INT64 to REAL on data row 9121, field contains '2.54999995231628'
Bumping column 234 from INT to INT64 on data row 9121, field contains '"not feel"'
Bumping column 234 from INT64 to REAL on data row 9121, field contains '"not feel"'
Bumping column 234 from REAL to STR on data row 9121, field contains '"not feel"'
Bumping column 235 from INT to INT64 on data row 9121, field contains '"feel"'
Bumping column 235 from INT64 to REAL on data row 9121, field contains '"feel"'
Bumping column 235 from REAL to STR on data row 9121, field contains '"feel"'
Bumping column 236 from INT to INT64 on data row 9121, field contains '"feel"'
Bumping column 236 from INT64 to REAL on data row 9121, field contains '"feel"'
Bumping column 236 from REAL to STR on data row 9121, field contains '"feel"'
Bumping column 237 from INT to INT64 on data row 9121, field contains '"not feel"'
Bumping column 237 from INT64 to REAL on data row 9121, field contains '"not feel"'
Bumping column 237 from REAL to STR on data row 9121, field contains '"not feel"'
Bumping column 238 from INT to INT64 on data row 9121, field contains '"feel"'
Bumping column 238 from INT64 to REAL on data row 9121, field contains '"feel"'
Bumping column 238 from REAL to STR on data row 9121, field contains '"feel"'
Bumping column 239 from INT to INT64 on data row 9121, field contains '"feel"'
Bumping column 239 from INT64 to REAL on data row 9121, field contains '"feel"'
Bumping column 239 from REAL to STR on data row 9121, field contains '"feel"'
Bumping column 2 from INT to INT64 on data row 12121, field contains '1.23500001430511'
Bumping column 2 from INT64 to REAL on data row 12121, field contains '1.23500001430511'
Bumping column 49 from INT to INT64 on data row 12121, field contains '"now and then"'
Bumping column 49 from INT64 to REAL on data row 12121, field contains '"now and then"'
Bumping column 49 from REAL to STR on data row 12121, field contains '"now and then"'
Bumping column 330 from INT to INT64 on data row 12121, field contains '"worst kind"'
Bumping column 330 from INT64 to REAL on data row 12121, field contains '"worst kind"'
Bumping column 330 from REAL to STR on data row 12121, field contains '"worst kind"'
Bumping column 609 from INT to INT64 on data row 12121, field contains '"good purpose"'
Bumping column 609 from INT64 to REAL on data row 12121, field contains '"good purpose"'
Bumping column 609 from REAL to STR on data row 12121, field contains '"good purpose"'
Bumping column 610 from INT to INT64 on data row 12121, field contains '"most of the time"'
Bumping column 610 from INT64 to REAL on data row 12121, field contains '"most of the time"'
Bumping column 610 from REAL to STR on data row 12121, field contains '"most of the time"'
Bumping column 98 from INT to INT64 on data row 15580, field contains '"somewhat agree"'
Bumping column 98 from INT64 to REAL on data row 15580, field contains '"somewhat agree"'
Bumping column 98 from REAL to STR on data row 15580, field contains '"somewhat agree"'
Bumping column 99 from INT to INT64 on data row 15580, field contains '"somewhat agree"'
Bumping column 99 from INT64 to REAL on data row 15580, field contains '"somewhat agree"'
Bumping column 99 from REAL to STR on data row 15580, field contains '"somewhat agree"'
Bumping column 100 from INT to INT64 on data row 15580, field contains '"strongly agree"'
Bumping column 100 from INT64 to REAL on data row 15580, field contains '"strongly agree"'
Bumping column 100 from REAL to STR on data row 15580, field contains '"strongly agree"'
Bumping column 101 from INT to INT64 on data row 15580, field contains '"somewht disagree"'
Bumping column 101 from INT64 to REAL on data row 15580, field contains '"somewht disagree"'
Bumping column 101 from REAL to STR on data row 15580, field contains '"somewht disagree"'
Bumping column 102 from INT to INT64 on data row 15580, field contains '"strongly agree"'
Bumping column 102 from INT64 to REAL on data row 15580, field contains '"strongly agree"'
Bumping column 102 from REAL to STR on data row 15580, field contains '"strongly agree"'
Bumping column 103 from INT to INT64 on data row 15580, field contains '"strongly agree"'
Bumping column 103 from INT64 to REAL on data row 15580, field contains '"strongly agree"'
Bumping column 103 from REAL to STR on data row 15580, field contains '"strongly agree"'
Bumping column 104 from INT to INT64 on data row 15580, field contains '"somewhat agree"'
Bumping column 104 from INT64 to REAL on data row 15580, field contains '"somewhat agree"'
Bumping column 104 from REAL to STR on data row 15580, field contains '"somewhat agree"'
Bumping column 250 from INT to INT64 on data row 15580, field contains '"somewht disagree"'
Bumping column 250 from INT64 to REAL on data row 15580, field contains '"somewht disagree"'
Bumping column 250 from REAL to STR on data row 15580, field contains '"somewht disagree"'
Bumping column 251 from INT to INT64 on data row 15580, field contains '"somewhat agree"'
Bumping column 251 from INT64 to REAL on data row 15580, field contains '"somewhat agree"'
Bumping column 251 from REAL to STR on data row 15580, field contains '"somewhat agree"'
Bumping column 252 from INT to INT64 on data row 15580, field contains '"somewht disagree"'
Bumping column 252 from INT64 to REAL on data row 15580, field contains '"somewht disagree"'
Bumping column 252 from REAL to STR on data row 15580, field contains '"somewht disagree"'
Bumping column 254 from INT to INT64 on data row 15580, field contains '"somewht disagree"'
Bumping column 254 from INT64 to REAL on data row 15580, field contains '"somewht disagree"'
Bumping column 254 from REAL to STR on data row 15580, field contains '"somewht disagree"'
Bumping column 256 from INT to INT64 on data row 15580, field contains '"somewhat agree"'
Bumping column 256 from INT64 to REAL on data row 15580, field contains '"somewhat agree"'
Bumping column 256 from REAL to STR on data row 15580, field contains '"somewhat agree"'
Bumping column 257 from INT to INT64 on data row 15580, field contains '"somewhat agree"'
Bumping column 257 from INT64 to REAL on data row 15580, field contains '"somewhat agree"'
Bumping column 257 from REAL to STR on data row 15580, field contains '"somewhat agree"'
Bumping column 105 from INT to INT64 on data row 15581, field contains '"somewhat agree"'
Bumping column 105 from INT64 to REAL on data row 15581, field contains '"somewhat agree"'
Bumping column 105 from REAL to STR on data row 15581, field contains '"somewhat agree"'
Bumping column 253 from INT to INT64 on data row 15581, field contains '"strngly disagree"'
Bumping column 253 from INT64 to REAL on data row 15581, field contains '"strngly disagree"'
Bumping column 253 from REAL to STR on data row 15581, field contains '"strngly disagree"'
Bumping column 255 from INT to INT64 on data row 15581, field contains '"strngly disagree"'
Bumping column 255 from INT64 to REAL on data row 15581, field contains '"strngly disagree"'
Bumping column 255 from REAL to STR on data row 15581, field contains '"strngly disagree"'
Bumping column 64 from INT to INT64 on data row 15584, field contains '"too little"'
Bumping column 64 from INT64 to REAL on data row 15584, field contains '"too little"'
Bumping column 64 from REAL to STR on data row 15584, field contains '"too little"'
Bumping column 65 from INT to INT64 on data row 15584, field contains '"too little"'
Bumping column 65 from INT64 to REAL on data row 15584, field contains '"too little"'
Bumping column 65 from REAL to STR on data row 15584, field contains '"too little"'
Bumping column 66 from INT to INT64 on data row 15584, field contains '"too little"'
Bumping column 66 from INT64 to REAL on data row 15584, field contains '"too little"'
Bumping column 66 from REAL to STR on data row 15584, field contains '"too little"'
Bumping column 67 from INT to INT64 on data row 15584, field contains '"too little"'
Bumping column 67 from INT64 to REAL on data row 15584, field contains '"too little"'
Bumping column 67 from REAL to STR on data row 15584, field contains '"too little"'
Bumping column 71 from INT to INT64 on data row 17053, field contains '"pay more"'
Bumping column 71 from INT64 to REAL on data row 17053, field contains '"pay more"'
Bumping column 71 from REAL to STR on data row 17053, field contains '"pay more"'
Bumping column 72 from INT to INT64 on data row 17053, field contains '"neither"'
Bumping column 72 from INT64 to REAL on data row 17053, field contains '"neither"'
Bumping column 72 from REAL to STR on data row 17053, field contains '"neither"'
Bumping column 73 from INT to INT64 on data row 17053, field contains '"neither"'
Bumping column 73 from INT64 to REAL on data row 17053, field contains '"neither"'
Bumping column 73 from REAL to STR on data row 17053, field contains '"neither"'
Bumping column 74 from INT to INT64 on data row 17053, field contains '"neither"'
Bumping column 74 from INT64 to REAL on data row 17053, field contains '"neither"'
Bumping column 74 from REAL to STR on data row 17053, field contains '"neither"'
Bumping column 75 from INT to INT64 on data row 17053, field contains '"neither"'
Bumping column 75 from INT64 to REAL on data row 17053, field contains '"neither"'
Bumping column 75 from REAL to STR on data row 17053, field contains '"neither"'
Bumping column 76 from INT to INT64 on data row 17053, field contains '"in favor"'
Bumping column 76 from INT64 to REAL on data row 17053, field contains '"in favor"'
Bumping column 76 from REAL to STR on data row 17053, field contains '"in favor"'
Bumping column 77 from INT to INT64 on data row 17053, field contains '"neither"'
Bumping column 77 from INT64 to REAL on data row 17053, field contains '"neither"'
Bumping column 77 from REAL to STR on data row 17053, field contains '"neither"'
Bumping column 78 from INT to INT64 on data row 17053, field contains '"neither"'
Bumping column 78 from INT64 to REAL on data row 17053, field contains '"neither"'
Bumping column 78 from REAL to STR on data row 17053, field contains '"neither"'
Bumping column 79 from INT to INT64 on data row 17053, field contains '"spend same"'
Bumping column 79 from INT64 to REAL on data row 17053, field contains '"spend same"'
Bumping column 79 from REAL to STR on data row 17053, field contains '"spend same"'
Bumping column 80 from INT to INT64 on data row 17053, field contains '"spend more"'
Bumping column 80 from INT64 to REAL on data row 17053, field contains '"spend more"'
Bumping column 80 from REAL to STR on data row 17053, field contains '"spend more"'
Bumping column 81 from INT to INT64 on data row 17053, field contains '"spend same"'
Bumping column 81 from INT64 to REAL on data row 17053, field contains '"spend same"'
Bumping column 81 from REAL to STR on data row 17053, field contains '"spend same"'
Bumping column 82 from INT to INT64 on data row 17053, field contains '"spend more"'
Bumping column 82 from INT64 to REAL on data row 17053, field contains '"spend more"'
Bumping column 82 from REAL to STR on data row 17053, field contains '"spend more"'
Bumping column 83 from INT to INT64 on data row 17053, field contains '"spend less"'
Bumping column 83 from INT64 to REAL on data row 17053, field contains '"spend less"'
Bumping column 83 from REAL to STR on data row 17053, field contains '"spend less"'
Bumping column 84 from INT to INT64 on data row 17053, field contains '"spend same"'
Bumping column 84 from INT64 to REAL on data row 17053, field contains '"spend same"'
Bumping column 84 from REAL to STR on data row 17053, field contains '"spend same"'
Bumping column 85 from INT to INT64 on data row 17053, field contains '"spend same"'

 *** caught segfault ***
address 0x56a24, cause 'memory not mapped'

Traceback:
 1: fread("GSS.csv", verbose = TRUE)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: 

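What seems to be happening is that the 132MB file is very sparse (many blank fields). It has 613 columns and 55087 rows. Because of the sparsity, the first 5, middle 5 and last 5 rows are not enough to detect that these columns are character. When fread reaches the first populated field in those columns, it correctly bumps the column type for many columns, which normally works fine. Then it crashes.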
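Until you can upgrade, one thing that may be worth trying is to declare the sparse columns as character up front with fread's colClasses argument, so that no mid-read type bump is needed for them. This is only a sketch: it assumes the crash is tied to the bumping step, which has not been confirmed, and that your data.table version supports colClasses. The tiny file and the column names a and b are made up for illustration and are not taken from GSS.csv.

```r
library(data.table)

# Build a small sparse CSV: column "b" is blank until the very last row.
# (Illustrative data only; not taken from GSS.csv.)
n <- 2000
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b",
             paste0(seq_len(n - 1), ","),   # rows with an empty "b" field
             paste0(n, ",working class")),  # first populated "b" field
           tmp)

# Declare "b" as character up front instead of relying on sampled rows.
res <- fread(tmp, colClasses = list(character = "b"))

str(res)
unlink(tmp)
```

For the real file you would list the survey columns that hold text. The "Bumping column ..." lines in the verbose output show which column numbers those are.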

Many thanks! I have filed a bug report here:

#493: Reproducible crash in fread