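I have a large data frame with 3 million rows x 4 columns of char and int values. When I save it with base R's save() command, the file takes up 16 MB.
I then bind a small file with the same structure (1500 rows x 4 columns of char and int values) to the end of it and save again.
Everything works, but the file now takes 24 MB. Does anyone know why this happens? I'm working with millions of observations, so keeping the file size (and processing time) down matters to me.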
The str() output of both data frames:
> str(bluetooth_oct)
'data.frame': 3069577 obs. of 4 variables:
$ timestamp : int 1380574809 1380574842 1380574852 1380574852 1380574864 1380574873 1380574890 1380574901 1380574901 1380574901 ...
$ scanned_user: chr "729d6181f70676b50921b11d2b0009" "792b94ad80885c219a53366de477d8" "e2f169c1af5636f137fa5cc8565bff" "02fbc27420b2c30e451b2457f22141" ...
$ user : chr "30383e7d47ff768d56639c31ac2664" "c7db19a439bd43bf467912f56453d7" "7eab3d34a4f9cc42e6c3e3d2de0b92" "7eab3d34a4f9cc42e6c3e3d2de0b92" ...
$ rssi : int -92 -76 -95 -70 -90 -97 -82 -63 -91 -90 ...
> str(bluetooth_oct2)
'data.frame': 3068039 obs. of 4 variables:
$ timestamp : int 1380574809 1380574842 1380574852 1380574852 1380574864 1380574873 1380574890 1380574901 1380574901 1380574901 ...
$ scanned_user: chr "729d6181f70676b50921b11d2b0009" "792b94ad80885c219a53366de477d8" "e2f169c1af5636f137fa5cc8565bff" "02fbc27420b2c30e451b2457f22141" ...
$ user : chr "30383e7d47ff768d56639c31ac2664" "c7db19a439bd43bf467912f56453d7" "7eab3d34a4f9cc42e6c3e3d2de0b92" "7eab3d34a4f9cc42e6c3e3d2de0b92" ...
$ rssi : int -92 -76 -95 -70 -90 -97 -82 -63 -91 -90 ...
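A minimal sketch for reproducing the comparison on synthetic data might help others test it. It assumes the bind was done with rbind(). The helpers make_ids() and make_df(), the object names big, small and combined, and the temp files are hypothetical. The data is random with the same column types as above, and it uses far fewer rows than the real 3 million so it runs quickly:

```r
# Hypothetical reproduction: make_ids(), make_df(), the object names and the
# temp files are illustrative assumptions, not taken from the original data.
set.seed(42)

hex <- c(0:9, letters[1:6])

# Build n random 30-character hex IDs, shaped like scanned_user / user
make_ids <- function(n) {
  m <- matrix(sample(hex, n * 30, replace = TRUE), nrow = n)
  apply(m, 1, paste, collapse = "")
}

# Data frame with the same four columns and types as bluetooth_oct
make_df <- function(n) {
  data.frame(
    timestamp    = 1380574809L + sort(sample.int(2600000L, n, replace = TRUE)),
    scanned_user = make_ids(n),
    user         = make_ids(n),
    rssi         = -sample(50:100, n, replace = TRUE),
    stringsAsFactors = FALSE
  )
}

n        <- 100000L          # much smaller than 3 million, to run quickly
big      <- make_df(n)
small    <- make_df(1500L)
combined <- rbind(big, small) # append the small frame to the end

f_big      <- tempfile(fileext = ".RData")
f_combined <- tempfile(fileext = ".RData")
save(big, file = f_big)
save(combined, file = f_combined)

# On-disk sizes in bytes, before and after appending the rows
sizes <- c(big = file.size(f_big), combined = file.size(f_combined))
print(sizes)
```

Raising n toward the real row count, or replacing make_df() with the actual data, shows whether the jump in file size reproduces outside the original session.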