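Incrementally change the dates and data cells in a .csv file

Time: 2014-09-18 04:22:48

Tags: bash csv for-loop awk sed

I have a file that I'm trying to get ready for my boss in time for his managers' meeting tomorrow morning at 8 a.m. (GMT-8). I want to retroactively change the dates in non-consecutive rows of this .csv file (truncated):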

,,,,,
,,,,,sideshow
,,,
date_bob,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14
bob_available,531383,531383,531383,531383,531383,531383,531383,531383,531383,531383,531383,531383,531383,531383
bob_used,448312,448312,448312,448312,448312,448312,448312,448312,448312,448312,448312,448312,448312,448312
,,,
date_mel,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14
mel_available,343537,343537,343537,343537,343537,343537,343537,343537,343537,343537,343537,343537,343537,343537
mel_used,636159,636159,636159,636159,636159,636159,636159,636159,636159,636159,636159,636159,636159,636159
,,,
date_sideshow-ws2,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14
sideshow-ws2_available,936239,936239,936239,936239,936239,936239,936239,936239,936239,936239,936239,936239,936239,936239
sideshow-ws2_used,43441,43441,43441,43441,43441,43441,43441,43441,43441,43441,43441,43441,43441,43441
,,,
,,,,,simpsons
,,,
date_bart,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14
bart_available,62559,62559,62559,62559,62559,62559,62559,62559,62559,62559,62559,62559,62559,62559
bart_used,1135117,1135117,1135117,1135117,1135117,1135117,1135117,1135117,1135117,1135117,1135117,1135117,1135117,1135117
,,,
date_homer,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14
homer_available,17799,17799,17799,17799,17799,17799,17799,17799,17799,17799,17799,17799,17799,17799
homer_used,1179877,1179877,1179877,1179877,1179877,1179877,1179877,1179877,1179877,1179877,1179877,1179877,1179877,1179877
,,,
date_lisa,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14
lisa_available,3899,3899,3899,3899,3899,3899,3899,3899,3899,3899,3899,3899,3899,3899
lisa_used,1193777,1193777,1193777,1193777,1193777,1193777,1193777,1193777,1193777,1193777,1193777,1193777,1193777,1193777

In other words, a row that currently reads:

date_lisa,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14,09-17-14

should read:

date_lisa,09-04-14,09-05-14,09-06-14,09-07-14,09-08-14,09-09-14,09-10-14,09-11-14,09-12-14,09-13-14,09-14-14,09-15-14,09-16-14,09-17-14

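I'd like the daily available numbers to be smaller at the beginning and then grow gradually. That means the used rows have to be scaled down proportionally at the beginning and then grow gradually, in lock-step, as the available rows get smaller.

The change should not be huge, but it should not look like only a few GB here and there either. I plan to build pivot tables and charts from this, so the values have to vary. By the way, all of the numbers are in MB, because I generated them with df -m.

Thanks in advance to anyone who can help.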

1 answer:

Answer 0: (score: 2)

The following awk does what you need:

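awk -F, -v OFS=, '
# Date rows: rebuild each column by counting back from the date in column 2.
/^date/ {
    split ($2, date, /-/)
    for (i=2; i<=NF; i++) {
        $i = date[1] "-" sprintf ("%02d", date[2] - NF + i) "-" date[3]
    }
}
# Data rows: scale column i by i/NF so the last column keeps its original value.
/available|used/ {
    for (i=2; i<=NF; i++) {
        $i = int (($i*i)/NF)
    }
}
# Print every row, modified or not.
1' csv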
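
As a quick sanity check, the same program can be run on a shortened sample before touching the real file. This is only a sketch: the `ramp` function name and the three-column rows are made up for illustration (the real file has 14 date columns per row), and the program reads standard input instead of the file `csv`.

```shell
# Hypothetical wrapper around the awk program above; reads CSV on stdin.
ramp() {
    awk -F, -v OFS=, '
    /^date/ {
        split($2, date, /-/)
        for (i = 2; i <= NF; i++)
            $i = date[1] "-" sprintf("%02d", date[2] - NF + i) "-" date[3]
    }
    /available|used/ {
        for (i = 2; i <= NF; i++)
            $i = int(($i * i) / NF)
    }
    1'
}

# Made-up sample: NF is 4, so the data columns are scaled by 2/4, 3/4, 4/4.
printf '%s\n' 'date_lisa,09-17-14,09-17-14,09-17-14' \
              'lisa_available,4000,4000,4000' | ramp
# prints:
# date_lisa,09-15-14,09-16-14,09-17-14
# lisa_available,2000,3000,4000
```

Because NF counts the label column too, the first data column is scaled by 2/NF rather than 1/NF, so the ramp never starts at zero.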
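  • Set both the input and the output field separator to ,.
  • For every row that starts with date, split the second column to get the month, day, and year parts.
  • Loop from the second column to the end of the row and set each column to a newly computed date: the day from column 2, minus the total number of fields, plus the column index. The last column therefore keeps the original date.
  • All other rows are left unchanged and are printed together with the modified rows.
  • One caveat: this does not roll over correctly across month boundaries.
  • For the data fields, loop from the second column to the end of the row and scale each value so that it is progressively larger than the previous one, with the last field matching its original value.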
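
The month-boundary caveat in the list above can be worked around by stepping back one calendar day at a time instead of subtracting from the day number. The following is a minimal sketch in portable awk, not part of the original answer. The `backdate` and `mdays` names are hypothetical, two-digit years are assumed to mean 20yy, and the last column is taken as the reference date that stays fixed.

```shell
# Hypothetical helper: rewrite date rows so each column is one day earlier
# than the column to its right, rolling over months and years as needed.
backdate() {
    awk -F, -v OFS=, '
    # Days in month m of four-digit year y (Gregorian leap-year rule).
    function mdays(m, y) {
        if (m == 2)
            return (y % 4 == 0 && (y % 100 != 0 || y % 400 == 0)) ? 29 : 28
        return (m == 4 || m == 6 || m == 9 || m == 11) ? 30 : 31
    }
    /^date/ {
        split($NF, p, /-/)
        m = p[1] + 0; d = p[2] + 0
        y = 2000 + p[3]            # assumption: yy means 20yy
        # Walk from the last column to column 2, one day earlier each step.
        for (i = NF; i >= 2; i--) {
            $i = sprintf("%02d-%02d-%02d", m, d, y % 100)
            if (--d < 1) {
                if (--m < 1) { m = 12; y-- }
                d = mdays(m, y)
            }
        }
    }
    1'
}

# Made-up row that crosses from March back into February 2014.
printf '%s\n' 'date_lisa,03-02-14,03-02-14,03-02-14,03-02-14' | backdate
# prints: date_lisa,02-27-14,02-28-14,03-01-14,03-02-14
```

Rows that do not start with date pass through untouched, so the data-scaling rule from the answer would still need to be applied separately.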