I wrote code that creates a SAS data set with the COMPRESS=YES option. However, the resulting data set actually got larger after compression, as the log shows:
proc sql;
  create table seg.KRG_EO_PVS_CUST_PROD_&op_cyc.
  (
    COMPRESS = YES
  ) as
  select
    W6DFFTE1.DIB_CUST_ID length = 8
      format = 15.
      informat = 15.
      label = 'The logical customer id',
    W6DFFTE1.DIB_PROD_ID length = 8
      format = 15.
      informat = 15.
      label = 'The product id',
    case when W5TM24S0.OFFER_FLAG = "1" then "1" else "0" end as OFFER_FLAG length = 1,
    sum(W6DFFTE1.TOT_QUANTITY ) as TOT_QUANTITY length = 8
      format = 10.
      informat = 5.
      label = 'Number of items purchased'
  from
    work.W6DFFTE1 left join
    work.W5TM24S0
    on
    (
      W5TM24S0.DIB_STORE_ID = W6DFFTE1.DIB_STORE_ID
      and W5TM24S0.DIB_SCAN_ID = W6DFFTE1.DIB_SCAN_ID
    )
  group by
    W6DFFTE1.DIB_CUST_ID,
    W6DFFTE1.DIB_PROD_ID,
    W5TM24S0.OFFER_FLAG
  ;
NOTE: Compressing data set SEG.KRG_EO_PVS_CUST_PROD_20150701 increased size by 43.27 percent.
Compressed is 1961732 pages; un-compressed would require 1369265 pages.
NOTE: Table SEG.KRG_EO_PVS_CUST_PROD_20150701 created, with 346423801 rows and 4 columns.
I would just like to know what the possible causes of this are.
Answer 0 (score: 2)
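SAS compression is quite primitive. compress=yes (the same as compress=char, which uses run-length encoding) only lets SAS save disk space by not writing out the actual bytes for the unused length in character variables. It looks like your data is three numeric variables plus a single character variable of length 1. That leaves very little to compress, and on top of that SAS has to add whatever formatting overhead is involved in a compressed file, so the net result is a larger file.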
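If you want to see the effect on a small scale before rerunning the full job, something like the following rough sketch may help. It writes the same rows to three data sets, one per COMPRESS= setting, so the log notes can be compared side by side. The data set names (work.t_none, work.t_char, work.t_bin), the variable names, the generated values, and the row count are all made up for illustration; only the record layout (three 8-byte numerics plus one 1-byte character) is taken from the question.

```sas
/* Hypothetical test data: same layout as the table in the question      */
/* (three 8-byte numeric variables and one 1-byte character variable).   */
/* An OUTPUT statement with no data set name writes each row to all      */
/* three data sets listed on the DATA statement.                         */
data work.t_none (compress=no)
     work.t_char (compress=char)    /* CHAR is the same as YES           */
     work.t_bin  (compress=binary);
  length cust_id prod_id tot_quantity 8 offer_flag $1;
  do cust_id = 1 to 1000000;        /* row count is an arbitrary choice  */
    prod_id      = mod(cust_id, 500);
    tot_quantity = mod(cust_id, 7);
    offer_flag   = put(mod(cust_id, 2), 1.);
    output;
  end;
run;

/* PROC CONTENTS reports the page size, page count and compression       */
/* setting of a data set, which can be compared across the three copies. */
proc contents data=work.t_none; run;
proc contents data=work.t_char; run;
proc contents data=work.t_bin;  run;
```

For each compressed data set the log should carry a note about how compression changed the size, similar to the one in the question. How the numbers come out depends on your SAS version and host, so treat this as a way to measure rather than a prediction.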
If you need to compress the file for medium- or long-term storage, you would be better off using a separate zip or tar utility.
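Edit: I did not mean to disparage SAS compression. I believe the designers were more concerned with keeping disk access relatively fast than with providing a true zip-style compression scheme.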