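I want to remove a column name/value pair from one INSERT statement and move it into another INSERT statement. I have about a hundred separate files in this format (though the formatting may vary a little from file to file; for example, some users may have put an entire INSERT statement on one line).
Input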
INSERT INTO table1 (
col1,
col2
)
VALUES (
foo,
bar
);
INSERT INTO table2 (
col3,
col4_move_this_one,
col5
)
VALUES (
john,
doe_move_this_value,
doe
);
Output
INSERT INTO table1 (
col1,
col4_move_this_one,
col2
)
VALUES (
foo,
doe_move_this_value,
bar
);
INSERT INTO table2 (
col3,
col5
)
VALUES (
john,
doe
);
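For the format above, I think I could use sed and cat in a script to find the line number of each line that needs to move, and then move it, something like this:
for file in *; do
    line_number=$(cat -n "${file}" | sed some_statement | awk to_get_line_number)
    # etc...
done
...but perhaps you can recommend a smarter approach, one that also works when an INSERT statement is on a single line.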
Answer 0 (score: 1)
Using GNU awk for true multi-dimensional arrays, the 3rd arg to match(), multi-char RS, and the \s/\S shorthand:
$ cat tst.awk
BEGIN { RS="\\s*[)];\\s*" }    # one record per INSERT statement, minus its closing ");"
match($0,/(\S+\s+){2}([^(]+)[(]([^)]+)[)][^(]+[(]([^)]+)/,a) {
    # a[2] = table name, a[3] = column list, a[4] = value list
    for (i in a) {
        gsub(/^\s*|\s*$/,"",a[i])
        gsub(/\s*\n\s*/,"",a[i])
    }
    tables[NR] = a[2]
    # split on commas plus any surrounding blanks so single-line statements work too
    names[NR][1];  split(a[3],names[NR],/\s*,\s*/)
    values[NR][1]; split(a[4],values[NR],/\s*,\s*/)
}
END {
    # move the 2nd column name of statement 2 into position 2 of statement 1
    names[1][3] = names[1][2]
    names[1][2] = names[2][2]
    names[2][2] = names[2][3]
    delete names[2][3]
    # do the same for the matching value
    values[1][3] = values[1][2]
    values[1][2] = values[2][2]
    values[2][2] = values[2][3]
    delete values[2][3]
    for (tableNr=1; tableNr<=NR; tableNr++) {
        printf "INSERT INTO %s (\n", tables[tableNr]
        cnt = length(names[tableNr])
        for (nr=1; nr<=cnt; nr++) {
            print " " names[tableNr][nr] (nr<cnt ? "," : "")
        }
        print ")"
        print "VALUES ("
        cnt = length(values[tableNr])
        for (nr=1; nr<=cnt; nr++) {
            print " " values[tableNr][nr] (nr<cnt ? "," : "")
        }
        print ");\n"
    }
}
$ awk -f tst.awk file
INSERT INTO table1 (
col1,
col4_move_this_one,
col2
)
VALUES (
foo,
doe_move_this_value,
bar
);
INSERT INTO table2 (
col3,
col5
)
VALUES (
john,
doe
);
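The script above handles one file at a time, while the question mentions about a hundred files. A minimal sketch of a wrapper loop follows. The function name rewrite_with_awk, the AWK variable, and the .sql extension in the usage line are assumptions made for illustration and are not part of the original answer. It writes to a temporary file first, so a file is only overwritten when awk exits successfully.

```shell
#!/bin/sh
# Run one awk program over many files, rewriting each file only if awk succeeds.
# AWK defaults to gawk because tst.awk relies on GNU extensions (assumption:
# gawk is on PATH); set AWK to another interpreter for portable programs.
AWK=${AWK:-gawk}

rewrite_with_awk() {
    prog=$1
    shift
    for file in "$@"; do
        [ -f "$file" ] || continue        # skip directories and unmatched globs
        tmp=$(mktemp) || return 1
        if "$AWK" -f "$prog" "$file" > "$tmp"; then
            cat "$tmp" > "$file"          # overwrite contents, keep the file's permissions
        else
            echo "skipped $file: awk failed" >&2
        fi
        rm -f "$tmp"
    done
}
```

It might be called as rewrite_with_awk tst.awk ./*.sql. Because the END block of tst.awk hardcodes which positions are moved, this only suits files that all list their statements and columns in the same order.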
Answer 1 (score: 1)
A GAWK version that relies on gensub's backreference feature and leans heavily on regular expressions.
$ cat > test.awk
BEGIN {
    RS=" *[)] *; *"     # split records on ");", allowing spaces around each character
    ORS=");\n"
}
{
    sub(/^[ \n]*/,"")   # strip the blank space left before the second INSERT
}
$0 ~ /^INSERT/ && NR==1 {
    a=$0                # store the first INSERT
}
$0 ~ /^INSERT/ && NR==2 {
    b=$0                # store the second INSERT, then use gensub to pull the
                        # second item of its INSERT and VALUES lists into c[1], c[2]
    split(gensub(/(INSERT|VALUES)[^\(]*\(([ \n]*[^,]*,){1}[ \n]*([^,]*)[^\)]*\)*[ \n]*/,"\\3 ","g"),c," ")
}
END {
    # print the first INSERT with the moved column and value in second place;
    # each list gets its own call because a replacement string is evaluated only once
    a=gensub(/(INSERT[^\(]*\((([ \n]*)[^,]*,){1})/,"\\1\\3" c[1] ",",1,a)
    a=gensub(/(VALUES[^\(]*\((([ \n]*)[^,]*,){1})/,"\\1\\3" c[2] ",",1,a)
    print a
    # print the second INSERT with the moved column and value removed
    print gensub(/((INSERT|VALUES)[^\(]*\(([ \n]*[^,]*,){1})[ \n]*[^,]*,/,"\\1 ","g",b)
}
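This solution assumes that the variables to be moved are the second ones after the keywords INSERT and VALUES in the second INSERT, and that they are added at the same position in the first INSERT. The solution is space- and \n-friendly but does not support \t, which I think would be easy to fix.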
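One way the \t limitation might be lifted is to add \t to every [ \n] bracket expression in the script, and to widen the " *" runs in RS to "[ \t]*" in the same way. The sketch below only demonstrates the widened class on the leading-whitespace trim. The helper name trim_leading is hypothetical, and RS is set to ";" purely so that the whole argument is read as a single record.

```shell
#!/bin/sh
# Strip leading spaces, tabs and newlines from a statement, mirroring the
# sub() call in test.awk but with \t added to the bracket expression.
trim_leading() {
    printf '%s' "$1" |
        awk 'BEGIN { RS = ";" } { sub(/^[ \t\n]*/, ""); print }'
}

# Example: a statement that starts with a tab, a space and a newline
trim_leading "$(printf '\t \n\tINSERT INTO table2 (col3)')"
```

The sketch uses only POSIX awk features, so it does not need gawk, whereas the gensub calls in the answer itself do.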