我有一个看起来像这样的大文件
region type coeff p-value distance count
82365593523656436 A -0.9494 0.050 -16479472.5 8
82365593523656436 B 0.47303 0.526 57815363.0 8
82365593523656436 C -0.8938 0.106 42848210.5 8
当我使用fread读取它时,突然找不到82365593523656436
correlations <- data.frame(fread('all_to_all_correlations.txt'))
> "82365593523656436" %in% correlations$region
[1] FALSE
我可以找到一个略有不同的数字
> "82365593523656432" %in% correlations$region
[1] TRUE
但是这个数字不在实际文件中
grep 82365593523656432 all_to_all_correlations.txt
没有结果,而
grep 82365593523656436 all_to_all_correlations.txt
确实
当我尝试阅读上面显示的小样本文件而不是我得到的完整文件
Warning message:
In fread("test.txt") :
Some columns have been read as type 'integer64' but package bit64 isn't loaded.
Those columns will display as strange looking floating point data.
There is no need to reload the data.
Just require(bit64) toobtain the integer64 print method and print the data again.
,数据看起来像
region type coeff p.value distance count
1 3.758823e-303 A -0.94940 0.050 -16479472 8
2 3.758823e-303 B 0.47303 0.526 57815363 8
3 3.758823e-303 C -0.89380 0.106 42848210 8
所以我认为在阅读期间,82365593523656436已更改为82365593523656432。如何防止这种情况发生?
答案 0 :(得分:1)
ID(这显然是第一列的内容)通常应该被理解为字符:
correlations <- setDF(fread('region type coeff p-value distance count
82365593523656436 A -0.9494 0.050 -16479472.5 8
82365593523656436 B 0.47303 0.526 57815363.0 8
82365593523656436 C -0.8938 0.106 42848210.5 8',
colClasses = c(region = "character")))
str(correlations)
#'data.frame': 3 obs. of 6 variables:
# $ region : chr "82365593523656436" "82365593523656436" "82365593523656436"
# $ type : chr "A" "B" "C"
# $ coeff : num -0.949 0.473 -0.894
# $ p-value : num 0.05 0.526 0.106
# $ distance: num -16479473 57815363 42848211
# $ count : int 8 8 8