bash: cut and paste SQL INSERT statements

Date: 2016-08-06 19:04:57

Tags: sql bash awk sed grep

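I want to remove a column-name/value pair from one INSERT statement and move it into another INSERT statement. I have about a hundred separate files in this format (although the format may vary from file to file; for example, some users may have written an entire INSERT statement on a single line).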

INPUT

INSERT INTO table1 (
    col1,
    col2
)
VALUES (
    foo,
    bar
);

INSERT INTO table2 (
    col3,
    col4_move_this_one,
    col5
)
VALUES (
    john,
    doe_move_this_value,
    doe
);

OUTPUT

INSERT INTO table1 (
    col1,
    col4_move_this_one,
    col2
)
VALUES (
    foo,
    doe_move_this_value,
    bar
);

INSERT INTO table2 (
    col3,
    col5
)
VALUES (
    john,
    doe
);

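For the general format shown above, I think I could use sed and cat in a script to find the line number of each line to be moved and then move it, something like this: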

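for file in *; do
    # e.g. the line number of the column to move ("col4_move_this_one" is just the example pattern)
    line_number=$(cat -n "$file" | sed -n '/col4_move_this_one/p' | awk '{print $1}')
    # etc...
done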

...but maybe you can recommend a smarter approach, one that also works when an INSERT statement is on a single line.
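The line-number idea above can be sketched for the multi-line layout. This is a minimal sketch under stated assumptions, not a finished tool: `move_line` is a hypothetical helper name, it assumes exactly one column or value per line, and so it does not cover the single-line case. It uses `grep -n` to find line numbers and a two-pass `awk` to move the line.

```shell
#!/usr/bin/env bash
# move_line FILE PATTERN ANCHOR  (hypothetical helper, not from the answers)
# Prints FILE with the first line matching PATTERN moved to just after the
# first line matching ANCHOR. Assumes one column/value per line.
move_line() {
    local file=$1 pattern=$2 anchor=$3 from to
    # grep -n prefixes each match with "LINENO:"; keep the first match's number
    from=$(grep -n -e "$pattern" "$file" | head -n 1 | cut -d: -f1)
    to=$(grep -n -e "$anchor" "$file" | head -n 1 | cut -d: -f1)
    # Both lines must exist and be different lines
    [ -n "$from" ] && [ -n "$to" ] && [ "$from" != "$to" ] || return 1
    # Pass 1 (NR == FNR) remembers the line to move; pass 2 prints every
    # other line and re-emits the remembered line right after the anchor.
    awk -v from="$from" -v to="$to" '
        NR == FNR { if (FNR == from) moved = $0; next }
        FNR == from { next }
        { print }
        FNR == to { print moved }
    ' "$file" "$file"
}
```

For the sample INPUT, running `move_line input.sql 'col4_move_this_one' 'col1,' > step1.sql` and then `move_line step1.sql 'doe_move_this_value' 'foo,'` (file names hypothetical) should print the expected OUTPUT. The commas only come out right because neither the moved entry nor its anchor is the last item in its list, which is one reason a parsing approach is more robust than moving lines.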

2 Answers:

Answer 0 (score: 1)

Using GNU awk for true multidimensional arrays, the 3rd argument to match(), a multi-character RS, and the \s/\S syntactic sugar:

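$ cat tst.awk
BEGIN { RS="\\s*[)];\\s*" }    # one record per statement; RS eats ");" and surrounding whitespace
# a[2] = table name, a[3] = column list, a[4] = value list
match($0,/(\S+\s+){2}([^(]+)[(]([^)]+)[)][^(]+[(]([^)]+)/,a) {
    for (i in a) {
        gsub(/^\s*|\s*$/,"",a[i])      # trim leading/trailing whitespace
        gsub(/\s*\n\s*/,"",a[i])       # join multi-line lists onto one line
    }
    tables[NR] = a[2]
    # names[NR][1] creates names[NR] as a subarray so split() can fill it;
    # splitting on /\s*,\s*/ also copes with "col1, col2" written on one line
    names[NR][1]; split(a[3],names[NR],/\s*,\s*/)
    values[NR][1]; split(a[4],values[NR],/\s*,\s*/)
}
END {
    # Hardcoded for this input: entry 2 of statement 2 moves into
    # position 2 of statement 1, and the old entry 2 shifts to position 3
    names[1][3] = names[1][2]
    names[1][2] = names[2][2]
    names[2][2] = names[2][3]
    delete names[2][3]

    values[1][3] = values[1][2]
    values[1][2] = values[2][2]
    values[2][2] = values[2][3]
    delete values[2][3]

    # Print every statement back out in the multi-line layout
    for (tableNr=1; tableNr<=NR; tableNr++) {
        printf "INSERT INTO %s (\n", tables[tableNr]
        cnt = length(names[tableNr])
        for (nr=1; nr<=cnt; nr++) {
            print "    " names[tableNr][nr] (nr<cnt ? "," : "")
        }
        print ")"

        print "VALUES ("
        cnt = length(values[tableNr])
        for (nr=1; nr<=cnt; nr++) {
            print "    " values[tableNr][nr] (nr<cnt ? "," : "")
        }
        print ");\n"
    }
}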

$ awk -f tst.awk file
INSERT INTO table1 (
    col1,
    col4_move_this_one,
    col2
)
VALUES (
    foo,
    doe_move_this_value,
    bar
);

INSERT INTO table2 (
    col3,
    col5
)
VALUES (
    john,
    doe
);
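The END block above hardcodes which entries move (`names[2][2]` into position 2 of statement 1). A hedged, portable variant, not taken from either answer, is to look the column up by name. POSIX awk has no arrays of arrays, so this sketch uses SUBSEP subscripts such as `C[stmt, i]` instead and is intended to avoid gawk-only features. `move_column` and its arguments are hypothetical, and the sketch assumes plain `INSERT INTO t (...) VALUES (...);` statements whose values contain no `;`, `(`, `)`, or `,`.

```shell
#!/usr/bin/env bash
# move_column COLUMN DEST_TABLE POS  (hypothetical helper, reads stdin)
# Moves COLUMN and its value out of whichever INSERT lists it and into the
# INSERT for DEST_TABLE at 1-based position POS, then prints all statements
# in the multi-line layout. Assumes no ; ( ) or , inside any value.
move_column() {
    awk -v col="$1" -v dest="$2" -v pos="$3" '
    function trim(s) { gsub(/^[ \t\n]+|[ \t\n]+$/, "", s); return s }
    { text = text $0 "\n" }                    # slurp the whole input
    END {
        n = split(text, stmts, ";")
        for (s = 1; s <= n; s++) {
            st = stmts[s]
            if (st !~ /INSERT[ \t\n]+INTO/) continue
            cnt++
            # Table name: text between "INTO" and the first "("
            p = index(st, "(")
            head = substr(st, 1, p - 1)
            sub(/^[ \t\n]*INSERT[ \t\n]+INTO[ \t\n]+/, "", head)
            tbl[cnt] = trim(head)
            # Column list: up to the first ")"
            rest = substr(st, p + 1)
            q = index(rest, ")")
            nc[cnt] = split(substr(rest, 1, q - 1), tmp, ",")
            for (i = 1; i <= nc[cnt]; i++) C[cnt, i] = trim(tmp[i])
            # Value list: between the next "(" and ")"
            rest = substr(rest, q + 1)
            p = index(rest, "("); q = index(rest, ")")
            split(substr(rest, p + 1, q - p - 1), tmp, ",")
            for (i = 1; i <= nc[cnt]; i++) V[cnt, i] = trim(tmp[i])
        }
        # Locate the column to move and the destination statement
        for (s = 1; s <= cnt && !src; s++)
            for (i = 1; i <= nc[s]; i++)
                if (C[s, i] == col) { src = s; si = i; break }
        for (d = 1; d <= cnt; d++) if (tbl[d] == dest) break
        if (!src || d > cnt) { print "move_column: not found" > "/dev/stderr"; exit 1 }
        # Remove the pair from its source statement...
        c = C[src, si]; v = V[src, si]
        for (i = si; i < nc[src]; i++) { C[src, i] = C[src, i + 1]; V[src, i] = V[src, i + 1] }
        nc[src]--
        # ...then shift entries right to open a slot at POS in the destination
        for (i = nc[d]; i >= pos; i--) { C[d, i + 1] = C[d, i]; V[d, i + 1] = V[d, i] }
        C[d, pos] = c; V[d, pos] = v
        nc[d]++
        for (s = 1; s <= cnt; s++) {
            if (s > 1) print ""
            printf "INSERT INTO %s (\n", tbl[s]
            for (i = 1; i <= nc[s]; i++) printf "    %s%s\n", C[s, i], (i < nc[s] ? "," : "")
            print ")"
            print "VALUES ("
            for (i = 1; i <= nc[s]; i++) printf "    %s%s\n", V[s, i], (i < nc[s] ? "," : "")
            print ");"
        }
    }'
}
```

With either the multi-line sample INPUT or the same two statements written one per line, `move_column col4_move_this_one table1 2 < file` should print the expected OUTPUT. Parsing into arrays first means commas are regenerated on output, so it does not matter whether the moved entry was last in its list.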

Answer 1 (score: 1)

A GAWK version that relies on gensub's backreference feature and leans heavily on regular expressions:

$ cat > test.awk
BEGIN {
    RS=" *[)] *; *"       # split records on ");", allowing spaces around it
    ORS=");\n"            # put the ");" back when printing
}

{
    sub(/^[ \n]*/,"")     # strip blanks/newlines before the second INSERT
}

$0 ~ /^INSERT/ && NR==1 {
    a=$0                  # store the first INSERT
}

$0 ~ /^INSERT/ && NR==2 { # store the second INSERT and use gensub to
    b=$0                  # extract the second entry after INSERT and after VALUES
    split(gensub(/(INSERT|VALUES)[^\(]*\(([ \n]*[^,]*,){1}[ \n]*([^,]*)[^\)]*\)*[ \n]*/,"\\3 ","g"),c," ")
}

END {
    # First INSERT: add c[1] after its first column and c[2] after its first
    # value. Two calls (match 1, then match 2) are needed because the
    # replacement string is built once, so a single "g" call would put c[1]
    # into VALUES as well.
    a = gensub(/((INSERT|VALUES)[^\(]*\((([ \n]*)[^,]*,){1})/,"\\1\\4"c[1]",",1,a)
    print gensub(/((INSERT|VALUES)[^\(]*\((([ \n]*)[^,]*,){1})/,"\\1\\4"c[2]",",2,a)
    # Second INSERT: drop its second column and second value
    print gensub(/((INSERT|VALUES)[^\(]*\(([ \n]*[^,]*,){1})[ \n]*[^,]*,/,"\\1 ","g",b)
}

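This solution assumes that the values to copy are the second entries after the keywords INSERT and VALUES in the second INSERT, and that they are added at the same positions in the first INSERT. The solution is space- and \n-friendly but does not support \t, which I think is easy to fix.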
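Answer 1 notes that tabs are not handled, and the question mentions about a hundred files. One hedged way to address both without editing test.awk is to turn tabs into spaces before the script runs and rewrite each file in place. `rewrite_in_place` is a hypothetical helper; FILTER is any single command or shell function that reads stdin and writes stdout.

```shell
#!/usr/bin/env bash
# rewrite_in_place FILTER FILE...  (hypothetical helper)
# Runs each FILE through "tr '\t' ' ' | FILTER" and overwrites the file with
# the result, but only when FILTER exits successfully.
rewrite_in_place() {
    local filter=$1 f tmp
    shift
    for f in "$@"; do
        [ -f "$f" ] || { echo "rewrite_in_place: no such file: $f" >&2; return 1; }
        tmp=$(mktemp) || return 1
        # Tabs become spaces so patterns written as [ \n] still match
        if tr '\t' ' ' < "$f" | "$filter" > "$tmp"; then
            # Copy back with cat rather than mv so the file keeps its mode/owner
            cat -- "$tmp" > "$f"
            rm -f -- "$tmp"
        else
            rm -f -- "$tmp"
            echo "rewrite_in_place: $filter failed on $f" >&2
            return 1
        fi
    done
}
```

For example, `run_test_awk() { gawk -f test.awk; }` followed by `rewrite_in_place run_test_awk *.sql` (both names hypothetical). Note that `tr` also converts tabs inside values, which may or may not be acceptable for a given data set.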