I have copied two columns of data to a file. Because of the clustering key on my_date, the rows come back in descending order.
echo "copy home.admin (id,my_date) to 'myOutputFile';" > copyInputs.cql
Contents of myOutputFile:
TEST1,2015-01-01 15:00:00+0000
TEST1,2014-09-04 14:00:00+0000
4.VOD,2015-08-18 04:00:00+0000
4.VOD,2015-06-26 04:00:00+0000
4.VOD,2015-05-13 04:00:00+0000
000TEST8,2015-11-19 05:00:00+0000
The first column is id and the second is my_date. I want to read the data for each id in reverse order, so the output should look like this:
TEST1,2014-09-04 14:00:00+0000
TEST1,2015-01-01 15:00:00+0000
4.VOD,2015-05-13 04:00:00+0000
4.VOD,2015-06-26 04:00:00+0000
4.VOD,2015-08-18 04:00:00+0000
000TEST8,2015-11-19 05:00:00+0000
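Once I have this output, I want to generate update statements that populate a new column, my_rev. my_rev starts at 100 for each id and increments by one per row until a new id is found.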
update home.admin my_rev =100 where id = 'TEST1' and my_date = '2014-09-04 14:00:00+0000';
update home.admin my_rev =101 where id = 'TEST1' and my_date = '2015-01-01 15:00:00+0000';
update home.admin my_rev =100 where id = '4.VOD' and my_date = '2015-05-13 04:00:00+0000';
update home.admin my_rev =101 where id = '4.VOD' and my_date = '2015-06-26 04:00:00+0000';
update home.admin my_rev =102 where id = '4.VOD' and my_date = '2015-08-18 04:00:00+0000';
Any suggestions?
Answer 0 (score: 2)
"I want to read the data for each id in reverse order"
To print the lines for each id in reverse order:
$ awk -F, '$1==prev {s=$0 "\n" s; next} { printf "%s",s; s=$0 "\n"; prev=$1} END{printf "%s",s}' infile
TEST1,2014-09-04 14:00:00+0000
TEST1,2015-01-01 15:00:00+0000
4.VOD,2015-05-13 04:00:00+0000
4.VOD,2015-06-26 04:00:00+0000
4.VOD,2015-08-18 04:00:00+0000
000TEST8,2015-11-19 05:00:00+0000
How it works:
This script uses two variables: prev holds the id of the previous line, and s holds the lines of the current id in reverse order.
-F,
This tells awk to use a comma as the field separator.
$1==prev {s=$0 "\n" s; next}
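For a line with the same id as the previous one (the id is field 1, written $1), this prepends the line to the variable s. The remaining commands are skipped and awk jumps to the next line.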
printf "%s",s; s=$0 "\n"; prev=$1
If we get here, a new id has started. We print the lines saved in s for the previous id, reset s to the current line, and set prev to the current id.
END{printf "%s",s}
Once we reach the end of the file, we print s, which holds the lines of the last id.
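The walkthrough above may be easier to follow with the one-liner spread over several lines. This is a sketch of the same program with one comment per rule; the wrapper name reverse_groups is hypothetical and only there so the script can be reused.

```shell
# reverse_groups: hypothetical wrapper around the awk program explained above.
# Reads comma-separated lines from the given files (or stdin) and prints the
# lines of each run of equal ids in reverse order.
reverse_groups() {
    awk -F, '
        # Same id as the previous line: put this line in front of the buffer.
        $1 == prev { s = $0 "\n" s; next }

        # A new id: print the finished buffer, then restart it with this line.
        {
            printf "%s", s
            s = $0 "\n"
            prev = $1
        }

        # End of input: print the buffer of the last id.
        END { printf "%s", s }
    ' "$@"
}
```

It could be called as reverse_groups myOutputFile > outfile. Only adjacent lines with the same id are treated as one group, which matches the clustered layout of the exported file.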
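If you want a more complex reordering, the following pipes the lines of each id through a separate sort invocation. That gives you all of sort's flexibility while keeping the ids in their original order: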
$ awk -F, -v s=sort '$1==prev {print | s; next} {close(s); print | s; prev=$1}' infile
TEST1,2014-09-04 14:00:00+0000
TEST1,2015-01-01 15:00:00+0000
4.VOD,2015-05-13 04:00:00+0000
4.VOD,2015-06-26 04:00:00+0000
4.VOD,2015-08-18 04:00:00+0000
000TEST8,2015-11-19 05:00:00+0000
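Because the sort command is passed in through the variable s, it can carry its own options. The sketch below illustrates that with a hypothetical wrapper, sort_groups, whose first argument is the full sort command to run for each id; the wrapper and its argument convention are assumptions made for this example.

```shell
# sort_groups: hypothetical wrapper. The first argument is the sort command
# (with options) to run once per id; remaining arguments are input files.
sort_groups() {
    sort_cmd=$1
    shift
    awk -F, -v s="$sort_cmd" '
        # Same id as before: feed the line to the sort that is already open.
        $1 == prev { print | s; next }

        # New id: close the previous sort so it writes its output,
        # then start a fresh one for this id.
        { close(s); print | s; prev = $1 }
    ' "$@"
}
```

For example, sort_groups 'sort -t, -k2,2' infile orders the rows of each id by the second field only.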
To generate the update statements, assuming outfile contains the output of the sort command above, run:
$ awk -F, '{if ($1==prev)n++; else n=100; prev=$1; printf "update home.admin my_rev =%i where id = '\''%s'\'' and my_date = '\''%s'\'';\n",n,$1,$2}' outfile
update home.admin my_rev =100 where id = 'TEST1' and my_date = '2014-09-04 14:00:00+0000';
update home.admin my_rev =101 where id = 'TEST1' and my_date = '2015-01-01 15:00:00+0000';
update home.admin my_rev =100 where id = '4.VOD' and my_date = '2015-05-13 04:00:00+0000';
update home.admin my_rev =101 where id = '4.VOD' and my_date = '2015-06-26 04:00:00+0000';
update home.admin my_rev =102 where id = '4.VOD' and my_date = '2015-08-18 04:00:00+0000';
update home.admin my_rev =100 where id = '000TEST8' and my_date = '2015-11-19 05:00:00+0000';
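If the intermediate outfile is not needed, the two steps can probably be folded into one pass. The following is a minimal sketch, not part of the original answer: gen_updates is a hypothetical name, and the statement text is copied from the output format shown above, so adjust the printf template if your statements need a different shape (for example a SET clause).

```shell
# gen_updates: hypothetical helper that reads the descending export directly
# and prints one update statement per row, oldest row of each id first,
# numbering from 100 within each id.
gen_updates() {
    awk -F, -v q="'" '
        # Print the buffered rows of the current id, last row first.
        function emit(   i, rev) {
            rev = 100
            for (i = n; i >= 1; i--)
                printf "update home.admin my_rev =%d where id = %s%s%s and my_date = %s%s%s;\n", rev++, q, id, q, q, dates[i], q
            n = 0
        }

        # A new id starts: emit the finished group. Appending "" forces a
        # string comparison, so ids such as 007 and 7 stay distinct.
        NR == 1 || ($1 "") != (id "") { emit(); id = $1 }

        # Remember the date of this row.
        { dates[++n] = $2 }

        # End of input: emit the final group.
        END { emit() }
    ' "$@"
}
```

A possible invocation is gen_updates myOutputFile > updates.cql, where updates.cql is just an example file name.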
Answer 1 (score: 1)
sort should do the trick:
sort -r -t, -k1,2 infile
Usually, the only option you need is -r.
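Note that -r combined with -k1,2 reverses the comparison on the whole key, so the dates within each id also come out in descending order. A sketch of a per-key variant follows, assuming it is acceptable for the ids to appear in descending byte order (which happens to match the sample data) while the dates within each id ascend. The function name is hypothetical, and LC_ALL=C is set only to make the ordering independent of the locale.

```shell
# sort_id_desc_date_asc: hypothetical helper.
# -k1,1r sorts the id (field 1) in descending order;
# -k2,2 sorts my_date (field 2) in ascending order within each id.
sort_id_desc_date_asc() {
    LC_ALL=C sort -t, -k1,1r -k2,2 "$@"
}
```

Unlike the awk approach, this reorders the ids themselves, so it only reproduces the requested output when the ids are already in descending order.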