How to extract unique string patterns from the first and second lines

Date: 2019-11-19 02:58:40

Tags: shell grep

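I need to do the following to extract the unique error codes, each together with the line below it, using grep or any other command. For some reason I can't get the complete result. Could you please help? Thanks!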

grep -e '.*[A-Z]-[0-9]*:' -o  -e '.*row.[.0-9]*' test.log
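  1. Capture each ORA/KUP error code line together with only the one line below it (the line starting with "error processing column...").
  2. Note that for ORA-01843 there are two different second lines, so we need only the first one with "error processing column CON_START_DATE in row 7" and the other one with "error processing column CON_END_DATE in row 66". No further lines are needed for other rows with the same column name. The same goes for the other error codes (e.g. ORA-01722). Basically, each pair of error code and column in the error processing line must be unique.
  3. Any text after the row number must be cut. For example, "error processing column CON_START_DATE in row 6 for datafile test_data1.csv" becomes "error processing column CON_START_DATE in row 6".
  4. If an error code has no second line (starting with "error processing column..."), it must be dropped.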
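Requirement 3 on its own is a plain substitution on each message line. A minimal sketch, assuming every message line has the shape `error processing column NAME in row N ...` seen in test.log; `trim_after_row` is a hypothetical helper name, not part of any tool:

```shell
#!/bin/sh
# Requirement 3 in isolation: keep "error processing column NAME in row N"
# and drop whatever follows the row number.
# trim_after_row is a hypothetical helper name used only for illustration.
trim_after_row() {
    # -E: extended regex; \1 is the captured part up to the row number.
    # Lines that do not match (e.g. ORA-/KUP- lines) pass through unchanged.
    sed -E 's/^(error processing column [^ ]+ in row [0-9]+).*/\1/'
}

printf '%s\n' 'error processing column CON_START_DATE in row 6 for datafile test_data1.csv' |
    trim_after_row
# prints: error processing column CON_START_DATE in row 6
```

Anchoring the cut on `in row [0-9]+` ties it to the row number itself, as the requirement states, rather than to whatever text happens to follow it.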

test.log:

LOG file opened at 01/01/18 10:10:10

KUP-05004:   Warning: parallel select was not requested.

Field Definitions for table DATA_1_STG
  Record format DELIMITED BY NEWLINE
  Data in file has same endianness as the platform
  Reject rows with all null fields

error processing column CON_START_DATE in row 1 for datafile test_data1.csv
ORA-01858: a non-numeric character was found where a numeric was expected
error processing column SUPPLIER_ID in row 3 for datafile test_data1.csv
ORA-01722: invalid number
error processing column CON_START_DATE in row 6 for datafile test_data1.csv
ORA-01843: not a valid month
error processing column CON_START_DATE in row 7 for datafile test_data1.csv
ORA-01843: not a valid month
error processing column CON_START_DATE in row 8 for datafile test_data1.csv
ORA-01722: invalid number
error processing column CON_START_DATE in row 6 for datafile test_data1.csv
ORA-01843: not a valid month
KUP-04073: record ignored because all referenced fields are null for a record
error processing column CON_END_DATE in row 65 for datafile test_data1.csv
ORA-01843: not a valid month
error processing column CON_END_DATE in row 66 for datafile test_data1.csv
ORA-01843: not a valid month
error processing column CON_END_DATE in row 67 for datafile test_data1.csv
ORA-01843: not a valid month
error processing column CON_START_DATE in row 102 for datafile test_data1.csv
ORA-01843: not a valid month

Desired result:

KUP-05004:   Warning: Intra source concurrency disabled because parallel select was not requested.
error processing column CON_START_DATE in row 1
ORA-01858: a non-numeric character was found where a numeric was expected
error processing column SUPPLIER_ID in row 3
ORA-01722: invalid number
error processing column CON_START_DATE in row 6
ORA-01843: not a valid month
error processing column CON_START_DATE in row 7
KUP-04073: record ignored because all referenced fields are null for a record
error processing column CON_END_DATE in row 65
ORA-01843: not a valid month
error processing column CON_END_DATE in row 66

1 answer:

Answer 0 (score: 0)

A fairly simple solution is possible with awk:

awk -f script.awk test.log

script.awk

/^(ORA|KUP)-[0-9]+:/ {
    # found an error code

    # Store it for later use.
    # Note: If no error processing message has yet been read,
    #       the previously stored code is simply ignored. (4)
    c = $0

    next # don't bother doing anything else with this line
}

# c doubles as the state indicator
# if c is unset, state is: looking for an error code
# if c is set, state is: looking for an error processing message

c && /^error/ {
    # found an error processing message while looking for one

    # store it for later use
    e = $0

    # strip unwanted part
    sub(/ in row .*/,"",e)

    # Indices of array s are unique combinations of code / message.
    # The first time a combination is seen, the value is not yet set
    # (and so s[...] tests false and !s[...] tests true). (2)

    # `++` is an efficient way to do "test and then increment"
    # so that the initially false-ish value tests true next time.

    # s[c, e] joins code and message with SUBSEP, the POSIX two-part key.
    if ( !s[c, e]++ ) {
        # output the stored error code line (1)
        print c

        # strip unwanted part of message then output (3)(1)
        # by default, sub and print act on $0
        sub(/ for .*/,"")
        print
    }

    # reset state to looking for an error code
    c = ""
}

# ignore any other lines
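To try the answer end to end without separate files, the same pairing logic can be run inline from a shell script on the question's test.log. A sketch: `dedupe_errors` is a hypothetical wrapper name, and the array key is written as `s[c, e]` (joined with awk's SUBSEP), the POSIX way to build a two-part key.

```shell
#!/bin/sh
# End-to-end sketch: the logic of script.awk run inline on the question's
# test.log. dedupe_errors is a hypothetical wrapper name.
dedupe_errors() {
    awk '
    /^(ORA|KUP)-[0-9]+:/ { c = $0; next }   # remember the latest code line
    c != "" && /^error/ {                    # first "error ..." line after it
        e = $0
        sub(/ in row .*/, "", e)             # key part: drop row and file
        if (!s[c, e]++) {                    # first time this pair is seen
            print c
            sub(/ for .*/, "")               # cut text after the row number
            print
        }
        c = ""                               # look for the next code line
    }'
}

sample='LOG file opened at 01/01/18 10:10:10

KUP-05004:   Warning: parallel select was not requested.

Field Definitions for table DATA_1_STG
  Record format DELIMITED BY NEWLINE
  Data in file has same endianness as the platform
  Reject rows with all null fields

error processing column CON_START_DATE in row 1 for datafile test_data1.csv
ORA-01858: a non-numeric character was found where a numeric was expected
error processing column SUPPLIER_ID in row 3 for datafile test_data1.csv
ORA-01722: invalid number
error processing column CON_START_DATE in row 6 for datafile test_data1.csv
ORA-01843: not a valid month
error processing column CON_START_DATE in row 7 for datafile test_data1.csv
ORA-01843: not a valid month
error processing column CON_START_DATE in row 8 for datafile test_data1.csv
ORA-01722: invalid number
error processing column CON_START_DATE in row 6 for datafile test_data1.csv
ORA-01843: not a valid month
KUP-04073: record ignored because all referenced fields are null for a record
error processing column CON_END_DATE in row 65 for datafile test_data1.csv
ORA-01843: not a valid month
error processing column CON_END_DATE in row 66 for datafile test_data1.csv
ORA-01843: not a valid month
error processing column CON_END_DATE in row 67 for datafile test_data1.csv
ORA-01843: not a valid month
error processing column CON_START_DATE in row 102 for datafile test_data1.csv
ORA-01843: not a valid month'

printf '%s\n' "$sample" | dedupe_errors
```

The output should match the desired result from the question, except that the KUP-05004 line is printed exactly as it appears in test.log.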

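In the worst case, one array element is added for every code/message pair in the input file, which could become a problem if the file is larger than the available memory. Since the row number is stripped from the key, repeated code/column pairs do not add new elements, so for typical logs the array stays small.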
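The memory behaviour can be checked directly: a small variant of the same pairing logic counts the pairs it reads and the array elements it keeps, instead of printing them. A sketch; `count_pairs` is a hypothetical helper, not part of the answer:

```shell
#!/bin/sh
# count_pairs is a hypothetical helper: same pairing logic as script.awk,
# but it reports how many pairs were read and how many distinct keys
# (array elements held in memory) the array s ends up with.
# usage: count_pairs < test.log
count_pairs() {
    awk '
    /^(ORA|KUP)-[0-9]+:/ { c = $0; next }
    c != "" && /^error/ {
        e = $0
        sub(/ in row .*/, "", e)
        s[c, e]++                  # one element per distinct code/message key
        pairs++                    # every code/message pair read
        c = ""
    }
    END {
        n = 0
        for (k in s) n++           # count the elements actually stored
        printf "%d pairs read, %d array elements\n", pairs, n
    }'
}
```

For the test.log in the question, `count_pairs < test.log` should print `10 pairs read, 6 array elements`; the six elements are exactly the six code/message pairs of the desired result.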