Question

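The title is a bit confusing, but let me explain. I have a tab-delimited file containing lines in the following format (columns are separated by tabs):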

Extra   573102|000473
Extra   ZRY|BC95624
Missing ABC|BC99000
Missing 123456|001122

I want to split the file into 4 different files based on the following logic:

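If the line contains "Extra" and has only digits up to the "|", put that line in file #1 (in the example above, file #1 would contain "Extra 573102|000473").
If the line contains "Extra" and has only letters up to the "|", put that line in file #2 (in the example above, file #2 would contain "Extra ZRY|BC95624").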
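If the line contains "Missing" and has only digits up to the "|", put that line in file #3 (in the example above, file #3 would contain "Missing 123456|001122").
If the line contains "Missing" and has only letters up to the "|", put that line in file #4 (in the example above, file #4 would contain "Missing ABC|BC99000").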

I don't know how to combine the text, the tabs, and a regular expression to accomplish the above.

Answer 1

Some pseudocode:

regex1 = "^Extra\h+\d+\|"
# "Extra" at the beginning of the string (or of each line, in multiline mode),
# followed by horizontal whitespace (\h is PCRE syntax for spaces and tabs)
# and digits up to the | character
regex2 = "^Extra\h+[a-zA-Z]+\|"
# the same, but with letters instead of digits
regex3 = "^Missing\h+\d+\|"
regex4 = "^Missing\h+[a-zA-Z]+\|"

if line matches regex1:
    append to file1
else if line matches regex2:
    append to file2
else if line matches regex3:
    append to file3
else if line matches regex4:
    append to file4

See a demo on regex101.com.
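The pseudocode above could be turned into a small Python script along these lines. This is only a sketch under a few assumptions: Python's re module does not understand \h, so [ \t]+ is used in its place; the output names file1.txt to file4.txt and the helper names classify and split_file are made up for illustration; lines that match none of the patterns are silently skipped.

```python
import re

# Python's re module has no \h, so [ \t]+ stands in for "horizontal whitespace".
# The output file names are hypothetical placeholders.
RULES = [
    (re.compile(r"^Extra[ \t]+\d+\|"), "file1.txt"),
    (re.compile(r"^Extra[ \t]+[a-zA-Z]+\|"), "file2.txt"),
    (re.compile(r"^Missing[ \t]+\d+\|"), "file3.txt"),
    (re.compile(r"^Missing[ \t]+[a-zA-Z]+\|"), "file4.txt"),
]


def classify(line):
    """Return the output file name for a line, or None if no rule matches."""
    for pattern, target in RULES:
        if pattern.match(line):
            return target
    return None


def split_file(path):
    """Append each line of the input file to the file chosen by classify()."""
    with open(path, encoding="utf-8") as source:
        for line in source:
            target = classify(line)
            if target is not None:
                # Append mode, so repeated runs keep adding to the same files.
                with open(target, "a", encoding="utf-8") as out:
                    out.write(line)
```

Calling split_file("yourfile") would then append each matching line to one of the four files. Reopening the target for every line keeps the sketch short; for a large input it would probably be better to open the four output files once up front.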

Answer 2

You can use awk:

awk -F'[\t |]+' '
# $1 is the keyword; $2 is the text before the "|"
$1=="Extra" {
    if ($2~/^[0-9]+$/) print >> "file1"           # digits only
    else if ($2~/^[A-Za-z]+$/) print >> "file2"   # letters only
    next
}

$1=="Missing" {
    if ($2~/^[0-9]+$/) print >> "file3"
    else if ($2~/^[A-Za-z]+$/) print >> "file4"
}' yourfile

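The field separator [\t |]+ splits each line on runs of tabs, spaces, or pipes, so $1 holds the keyword and $2 holds the text before the |.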
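To see what the -F'[\t |]+' separator in the awk command does to a line, here is a minimal Python sketch that applies the same pattern with re.split. The helper name split_fields is made up for illustration, and the two sample lines are taken from the question, with the column gap assumed to be a single tab.

```python
import re

# Same separator as the awk -F option: one or more tabs, spaces, or pipes.
FIELD_SEPARATOR = re.compile(r"[\t |]+")


def split_fields(line):
    """Split a line into fields on runs of tabs, spaces, or pipes."""
    return FIELD_SEPARATOR.split(line.rstrip("\n"))


for sample in ["Extra\t573102|000473", "Missing\tABC|BC99000"]:
    print(split_fields(sample))
# prints ['Extra', '573102', '000473']
# prints ['Missing', 'ABC', 'BC99000']
```

The first element corresponds to awk's $1 and the second to $2, which is the field the digit and letter tests are applied to.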
