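I copied some column data into a file and then tried to write it out to another file as update statements, but the output is wrong.
This is my input file: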
,E2Bn9,2015-04-29 00:00:00-0500
['2C173'],E2BA8,2015-04-29 00:00:00-0500
['5A475','2C174'],E2BA8,2015-06-29 00:00:00-0400
I used the awk and sed commands shown below:
sed -i 's/",/|/g' tempFile
awk -F '[|,]' '{ print "update table set cola = " $1 " where colb = " $2 " and colc = " $3 }' tempFile > updatestmt.cql
My output is:
update table set cola = where colb = E2Bn9 and colc = 2015-04-29 00:00:00-0500
update table set cola = ['2C173'] where colb = E2BA8 and colc = 2015-04-29 00:00:00-0500
update table set cola = "['5A475' where colb = '2C174'] and colc = E2BA8
The first two lines look fine, but the last line prints the wrong values.
I want the last line to be:
update table set cola = "['5A475','2C174'] where colb =E2BA8 and colc = 2015-06-29 00:00:00-0400
Answer 0 (score: 4)
With GNU awk 4.* for FPAT:
$ awk -v FPAT='([^,]*)|([[][^]]+[]])' '{print "update table set cola =", $1, "where colb =", $2, "and colc =", $3}' file
update table set cola = where colb = E2Bn9 and colc = 2015-04-29 00:00:00-0500
update table set cola = ['2C173'] where colb = E2BA8 and colc = 2015-04-29 00:00:00-0500
update table set cola = ['5A475','2C174'] where colb = E2BA8 and colc = 2015-06-29 00:00:00-0400
See http://www.gnu.org/software/gawk/manual/gawk.html#Splitting-By-Content.
With non-gawk awks or gawk versions before 4.0 (get a modern gawk!), you can use:
$ cat tst.awk
{
    delete f
    nf = 0
    tail = $0
    while ( (tail!="") && match(tail,/([^,]*)|([[][^]]+[]])/) ) {
        f[++nf] = substr(tail,RSTART,RLENGTH)
        tail = substr(tail,RSTART+RLENGTH+1)
    }
    print "update table set cola =", f[1], "where colb =", f[2], "and colc =", f[3]
}
$ awk -f tst.awk file
update table set cola = where colb = E2Bn9 and colc = 2015-04-29 00:00:00-0500
update table set cola = ['2C173'] where colb = E2BA8 and colc = 2015-04-29 00:00:00-0500
update table set cola = ['5A475','2C174'] where colb = E2BA8 and colc = 2015-06-29 00:00:00-0400
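You could populate $0 instead of f[], but then there is a performance overhead because the record gets re-split every time you assign to $(++nf), and there may be cases where you want to use the original $0 afterwards.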
Answer 1 (score: 1)
I chose a different approach so that I can avoid an overly complex regexp. It works with any old awk.
# cat tst.awk
{s="";}
$1!="" {for(i=1;i<NF-1;i++)s=s (i==1?"":",") $i;}
{printf("update table set cola = %s where colb = %s and colc = %s\n",s,$(NF-1),$NF);}
# awk -F, -f tst.awk yourinpfile
update table set cola = where colb = E2Bn9 and colc = 2015-04-29 00:00:00-0500
update table set cola = ['2C173'] where colb = E2BA8 and colc = 2015-04-29 00:00:00-0500
update table set cola = ['5A475','2C174'] where colb = E2BA8 and colc = 2015-06-29 00:00:00-0400
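I agree with Ed that a solution without the loop is better, but I can reuse the original assumption that $(NF-1) and $NF are always the last two fields, which keeps the regexp simpler.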
{s="";}
$1!="" {s=$0;sub("," $(NF-1) "," $NF, "", s);}
{printf("update table set cola = %s where colb = %s and colc = %s\n",s,$(NF-1),$NF);}
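One caveat with the sub() version is that the values of $(NF-1) and $NF are used as a dynamic regexp, so any regexp metacharacters in them would be interpreted. The following is only a sketch of an alternative, under the same assumption that the last two comma-separated fields are colb and colc: it cuts the tail off by length with substr() instead of matching it. The variable names input, stmts and taillen are made up for this illustration.

```shell
# Sample input from the question, kept in a variable so the sketch is self-contained.
input=",E2Bn9,2015-04-29 00:00:00-0500
['2C173'],E2BA8,2015-04-29 00:00:00-0500
['5A475','2C174'],E2BA8,2015-06-29 00:00:00-0400"

stmts=$(printf '%s\n' "$input" | awk -F, '
{
    # Length of the trailing ",colb,colc" part: both fields plus two commas.
    taillen = length($(NF-1)) + length($NF) + 2
    # Everything before that tail is cola (empty when the first field is empty).
    s = substr($0, 1, length($0) - taillen)
    printf("update table set cola = %s where colb = %s and colc = %s\n", s, $(NF-1), $NF)
}')

printf '%s\n' "$stmts"
```

Because no regexp is built from the data, this needs no special case for an empty first field either.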
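Answer 2 (score: 0)
The field separator appearing inside the data causes the problem, to be precise the comma inside the brackets on the third line. A workaround could be a different sed that converts , to | only outside the first pair of brackets, combined with FS='|':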
sed -r 's/^(.*\])?,([^,]*),/\1|\2|/' yourfile | awk -F '|' ....
where .... stands for the rest of your awk script.
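To see the cause for yourself, a quick check (just a sketch using the sample input from the question, with made-up variable names input and counts) is to print the number of comma-separated fields awk finds on each line:

```shell
# Sample input from the question.
input=",E2Bn9,2015-04-29 00:00:00-0500
['2C173'],E2BA8,2015-04-29 00:00:00-0500
['5A475','2C174'],E2BA8,2015-06-29 00:00:00-0400"

# NF is the number of fields awk found after splitting on commas.
counts=$(printf '%s\n' "$input" | awk -F, '{ print NF }')
printf '%s\n' "$counts"
```

The first two lines give 3 fields and the third gives 4, because the comma inside the brackets is treated as a separator too. That extra field is what shifts $2 and $3 in the original command.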
Answer 3 (score: 0)
If only the list values are quoted, as in your sample data, you can try this sed:
sed "s/' *, *'/' '/g;s/\([^,]*\),\([^,]*\),\(.*\)/update table set cola = \1 where colb = \2 and colc = \3/;s/' '/','/g" file
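The command works in three steps: it temporarily replaces the comma between quoted list values with a space, splits the line on the remaining commas, and then puts the commas back. A small sketch on the third sample line (the variable names line, step1 and full are only for illustration) shows the intermediate and final results:

```shell
line="['5A475','2C174'],E2BA8,2015-06-29 00:00:00-0400"

# Step 1 only: hide the comma between quoted list values.
step1=$(printf '%s\n' "$line" | sed "s/' *, *'/' '/g")
printf '%s\n' "$step1"

# All three steps: hide, build the statement, restore the hidden commas.
full=$(printf '%s\n' "$line" | sed "s/' *, *'/' '/g;s/\([^,]*\),\([^,]*\),\(.*\)/update table set cola = \1 where colb = \2 and colc = \3/;s/' '/','/g")
printf '%s\n' "$full"
```

Note that this relies on the data never containing the sequence ' ' (quote, space, quote) on its own, since the last step would turn it into ','.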