Question

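In my day-to-day work I need to extract text from logs and other text data in a variety of mixed formats. Is there a utility (like awk, grep, etc.) that I can use to do this quickly, without writing long bash/perl/python scripts?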

Example 1: for the input text below

mylog user=UserName;password=Password;other=information

I want to extract the username and password values. Ideally the pseudo-utility would look like this (à la awk):

cat input-text.txt | magic --delimit-by=";" --then-by="=" \
  '{print "The username is $values[0][1] and password is $values[1][1]"}'

The input string is split on ; and placed in the $values array, and each value in that array is split again on = to form a nested array.

Better still, it would be nice to have something like this:

cat input-text.txt | magic --map-entry-sep=";" --map-key-val-sep="=" \
  '{print "The username is $[user] and password is $[password]"}'

Here the parsed result is turned into a map so that values can be looked up by key.

Example 2: it would also be nice to parse triply nested elements. Consider input text such as

mylog mylist=one,two,three;other=information

I now want to extract the second element of the list mylist with something like:

cat input-text.txt | magic --delimit-by=";" --then-by="=" --and-then-by="," \
  '{print "The second element of mylist is: $values[0][1][1]"}'

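Of course, I would rather use some kind of JSON parser and convert the input data into a corresponding object/map/list structure for easy extraction, but that is not possible because I am dealing with data in different formats.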

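I usually use a combination of awk, grep, cut and sed chained with several pipes, extracting each value (column) of interest one at a time, but this is tedious and the separate columns then have to be merged back together. Typically I need all the extracted columns in CSV format for further processing in Excel.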

Any suggestions or comments are appreciated.

Answer 1

$ echo 'mylog user=UserName;password=Password;other=information' |
    awk -F '[ ;]' -v keysep="=" \
        '{
              # fields are split on space or ";"; split each one again on keysep
              for (i=1; i<=NF; i++) {
                  split($i, t, keysep)
                  a[t[1]] = t[2]    # map: key -> value
              }
              print "The username is " a["user"] " and password is " a["password"]
         }'
The username is UserName and password is Password
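
Since the question mentions needing the extracted columns as CSV, the same key/value loop can probably be extended to emit one CSV row per input line. What follows is only a sketch: the function name extract_csv and the second sample line are made up for illustration, and values are assumed to contain no commas or quotes (no CSV escaping is attempted). The map is cleared for every line so that a key missing from one line does not inherit the value from the previous one.

```shell
# Hypothetical helper: print user,password as one CSV row per input line.
# Reads from the files given as arguments, or from stdin if there are none.
extract_csv() {
    awk -F '[ ;]' -v keysep="=" '
        BEGIN { print "user,password" }          # CSV header row
        {
            split("", a)                          # clear the map left over from the previous line
            for (i = 1; i <= NF; i++) {
                n = split($i, t, keysep)          # t[1] = key, t[2] = value
                if (n >= 2) a[t[1]] = t[2]        # skip fields without a separator, e.g. mylog
            }
            print a["user"] "," a["password"]
        }' "$@"
}

# Sample input; the second line is invented for illustration.
printf '%s\n' \
    'mylog user=UserName;password=Password;other=information' \
    'mylog user=Alice;password=Secret;other=more' | extract_csv
```

To get other columns, change the header and the final print; the rest of the loop stays the same.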

$ echo 'mylog mylist=one,two,three;other=information' | awk -F "[ =,;]" '{print $4}'
two
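
The one-liner above selects the value by position ($4), so it stops working if the fields are reordered or another key is added in front. As a sketch of the nested splitting the question describes, split() can be applied level by level: first on ;, then on =, then on ,. The function name nth_of_list and its arguments are hypothetical, and the first space-separated word (mylog here) is assumed to be a tag that can be dropped.

```shell
# Hypothetical helper: nth_of_list KEY N prints the Nth comma-separated
# element of the value stored under KEY, for each line of input.
nth_of_list() {
    key=$1 n=$2
    shift 2
    awk -v key="$key" -v n="$n" '
        {
            sub(/^[^ ]* /, "")                    # drop the leading tag, e.g. "mylog "
            pairs = split($0, entry, ";")         # level 1: key=value entries
            for (i = 1; i <= pairs; i++) {
                # level 2: separate key from value, keep only the requested key
                if (split(entry[i], kv, "=") >= 2 && kv[1] == key) {
                    count = split(kv[2], item, ",")   # level 3: list elements
                    if (n <= count) print item[n]
                }
            }
        }' "$@"
}

echo 'mylog mylist=one,two,three;other=information' | nth_of_list mylist 2
```

Looking the list up by key costs a few more lines than the positional version, but it does not depend on where mylist appears in the line.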

How to extract values from text using multiple (nested) delimiters
