Extracting key-value pairs from a JSON file with a custom output format

Date: 2017-02-22 11:08:52

Tags: json awk sed grep jq

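I want to look for a combination of two words in a huge log file. The words are scattered through each record and do not appear in any particular order.

Sample log: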

    {"1a":"2017-01-28 00:00:00","2a":"sample","a":"12345","b":"2017-02-06","c":"2017-02-06T17:51:02.454-08:00","d":"Mozilla/5.0
    ; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1","e":"2017-02-06 
    ","f":"03","g":"example","h":"logA","i":"IFX","j":"a85","k":"12345678"},
    {"1a":"2017-01-28 00:00:11","2a":"sample","a":"12345","b":"2017-02-06","c":"2017-02-06T17:51:02.454-08:00","d":"Mozilla/5.0
    ; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1","e":"2017-02-06 
    ","f":"03","g":"example","h":"logB","i":"IFX","j":"a85","k":"12345678"}

From this file I want to grep "1a":"<value>" and "h":"<value of logA or logB>", and the output should not contain any duplicates.

Expected output:

"1a":"2017-01-28 00:00:00" "h":"logA"
"1a":"2017-01-28 00:00:11" "h":"logB"

I tried egrep this way, but it gives the entire line:

egrep -oE '1a\|"h"' but this does not give the required output.

awk /pattern1/ && /pattern2/ filename #no use

Thanks for your help.

4 answers:

Answer 0 (score: 2)

Consider using the very flexible jq JSON CLI instead of the standard utilities. Doing so:

  • simplifies the solution
  • makes it robust
  • allows it to be generalized.
echo '
[
  { "1a":"2017-01-28 00:00:00", "2a":"sample", "h":"logA", "i":"IFX" }, 
  { "1a":"2017-01-28 00:00:11", "2a":"sample", "h":"logB", "i":"IFX" }
]' |
  jq -r --argjson keys '[ "1a", "h" ]' '
    .[] | "\"\($keys[0])\": \"\(.[$keys[0]])\" \"\($keys[1])\": \"\(.[$keys[1]])\""
  ' 

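To be self-contained, the command feeds literal input through a pipe, formatted for readability. To pass a file to jq instead, simply specify its path after the script's closing ' (jq -r ... '...' file.json).

Yields: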

"1a": "2017-01-28 00:00:00" "h": "logA"
"1a": "2017-01-28 00:00:11" "h": "logB"
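  • --argjson keys '[ "1a", "h" ]' defines the variable $keys as a JSON array of the key (property) names to extract.

  • .[] enumerates all elements of the input array (the individual objects). $keys[<n>] and .[$keys[<n>]] expand to the property name at index <n> and to that property's value, respectively (note the .[...] accessor).

  • Most of the effort goes into formatting the output: embedded " characters must be escaped as \", and embedded variable references must be enclosed in \(...), although building the string from separate tokens with + is also an option.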

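Generalizing the solution

The command above does not readily generalize to an arbitrary number of key-value pairs per output line, because the array indices (0 and 1) are specified explicitly.

Inspired by peak's helpful answer, which shows a simple example of defining a helper function in jq, the following variant combines built-in and custom functions to accept any number of keys to extract: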

echo '
[
  { "1a":"2017-01-28 00:00:00", "2a":"sample", "h":"logA", "i":"IFX" },
  { "1a":"2017-01-28 00:00:11", "2a":"sample", "h":"logB", "i":"IFX" }
]
' |
  jq -r --argjson keys '[ "1a", "h", "i"  ]' '
    def printKv($k; $v): "\"\($k)\": \"\($v)\"";
    .[] | . as $o | 
      reduce $keys[] as $k (""; . + if .=="" then "" else " " end + printKv($k; $o[$k]))
  '

Yields (3 key-value pairs per line, because 3 keys were passed):

"1a": "2017-01-28 00:00:00" "h": "logA" "i": "IFX"
"1a": "2017-01-28 00:00:11" "h": "logB" "i": "IFX"

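The built-in reduce construct builds the target string by iterating over the keys, with the custom function printKv creating the string representation of each key-value pair.

Following another suggestion from peak, here is a simpler, more jq-idiomatic alternative that yields the same output: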

echo '
[
  { "1a":"2017-01-28 00:00:00", "2a":"sample", "h":"logA", "i":"IFX" },
  { "1a":"2017-01-28 00:00:11", "2a":"sample", "h":"logB", "i":"IFX" }
]
' |
  jq -r --argjson keys '[ "1a", "h", "i"  ]' '
    def printKv($k): "\"\($k)\": \"\(.[$k])\"";
    .[] | [ $keys[] as $k | printKv($k) ] | join(" ")
  '
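  • printKv() now takes only one argument, $k, and relies on the pipeline input, which still contains the input object, to extract the associated value via .[$k].

  • Enclosing $keys[] as $k | printKv($k) in [ ... ] collects the output of the multiple printKv calls into a single array that is passed on through the pipeline.

  • This in turn allows the built-in join function to join the array elements with spaces to form a single output line.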
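
One caveat of hand-escaping with \" is that a value which itself contains a double quote is printed unescaped. A minimal sketch of a variant that lets jq's built-in tojson do the quoting is shown below. The function name jq_pairs and the comma-separated key argument are assumptions made for illustration and are not part of the answer above. Note also that tojson leaves numbers, booleans and null unquoted, unlike printKv.

```shell
# Hypothetical helper (not from the original answer).
# $1 = comma-separated key names, $2 = file containing a JSON array of objects.
jq_pairs() {
  jq -r --arg keys "$1" '
    ($keys | split(",")) as $ks          # turn "1a,h" into ["1a","h"]
    | .[]                                # one object at a time
    | [ $ks[] as $k | "\($k | tojson): \(.[$k] | tojson)" ]
    | join(" ")                          # one output line per object
  ' "$2"
}
```

It would be called as jq_pairs '1a,h' file.json, where file.json holds a JSON array like the one piped in above.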

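Answer 1 (score: 1)

This is a tweak of @mklement0's excellent answer. By defining a "print me" function, the tweak minimizes the nuisance of having to escape double quotes: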

def q: "\"\(tostring)\"";

.[] | "\($keys[0]|q): \(.[$keys[0]]|q) \($keys[1]|q): \(.[$keys[1]]|q)"

Or, if you prefer:

def printKV($k): "\"\($k)\": \"\(.[$k])\""; 

.[] | printKV($keys[0]) + " " + printKV($keys[1])

Generalized solution

Using printKV/1 as defined above, and assuming $keys is defined on the command line (or by some other means) as an array of strings:

def printKeyValues(keys):
  [keys[] as $key | printKV($key)] | join(" ");

.[] | printKeyValues($keys)
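
For completeness, here is a sketch of how the fragments above might be assembled into one invocation. The wrapper name print_key_values and its two positional arguments are assumptions for illustration. The jq program itself is just the two definitions from this answer, with $keys supplied through --argjson as in the other answer.

```shell
# Hypothetical wrapper around the definitions above.
# $1 = JSON array of key names, $2 = file containing a JSON array of objects.
print_key_values() {
  jq -r --argjson keys "$1" '
    def printKV($k): "\"\($k)\": \"\(.[$k])\"";
    def printKeyValues(keys):
      [keys[] as $key | printKV($key)] | join(" ");
    .[] | printKeyValues($keys)
  ' "$2"
}
```

A call such as print_key_values '["1a","h"]' file.json should then print one line per object.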

Answer 2 (score: 0)

awk to the rescue!

$ awk -F, -v RS={ 'NR>1 {for(i=1;i<=NF;i++)
                         {if($i~/"1a":/) printf "%s", $i OFS
                          if($i~/"h":"log(A|B)"/) printf "%s\n", $i}}' file


"1a":"2017-01-28 00:00:00" "h":"logA"
"1a":"2017-01-28 00:00:11" "h":"logB"

Of course, it is better to use a JSON-aware tool.
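
If awk is all that is available, the same idea can be wrapped so that the key list is a parameter and repeated output lines are suppressed, which the question asks for. This is only a sketch under stated assumptions: the function name extract_pairs is made up, no value may contain a comma or a brace, and the text-based splitting is exactly the fragility that a JSON-aware tool avoids.

```shell
# Hypothetical helper (not from the original answer).
# $1 = keys as an ERE alternation such as '1a|h', $2 = log file.
extract_pairs() {
  awk -F, -v RS='}' -v keys="$1" '
    {
      sub(/^[^{]*[{]/, "")                # drop everything up to the opening brace
      s = ""
      for (i = 1; i <= NF; i++)
        if ($i ~ ("^\"(" keys ")\":"))    # field starts with "key":
          s = (s == "" ? "" : s OFS) $i
      if (s != "" && !seen[s]++)          # skip empty records and repeated lines
        print s
    }' "$2"
}
```

For the sample log, extract_pairs '1a|h' file is meant to print the same two lines as the command above, and a record that occurs twice is printed only once.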

Answer 3 (score: 0)

Input

$ cat log
    {"1a":"2017-01-28 00:00:00","2a":"sample","a":"12345","b":"2017-02-06","c":"2017-02-06T17:51:02.454-08:00","d":"Mozilla/5.0
    ; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1","e":"2017-02-06 
    ","f":"03","g":"example","h":"logA","i":"IFX","j":"a85","k":"12345678"},
    {"1a":"2017-01-28 00:00:11","2a":"sample","a":"12345","b":"2017-02-06","c":"2017-02-06T17:51:02.454-08:00","d":"Mozilla/5.0
    ; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1","e":"2017-02-06 
    ","f":"03","g":"example","h":"logB","i":"IFX","j":"a85","k":"12345678"}

Output

$ awk -F, -v RS='[{}]' '{s=""; for(i=1;i<=NF;i++)if($i~/^"(1a|h)":/)s=(s?s OFS:"") $i; if(s)print s}'  log 
"1a":"2017-01-28 00:00:00" "h":"logA"
"1a":"2017-01-28 00:00:11" "h":"logB"