Convert arbitrary output to JSON by column in the terminal?

Date: 2016-10-25 17:29:38

Tags: json linux bash command-line-interface

I'd like to be able to pipe the output of any command-line program into a command that converts it to JSON.

For example, my hypothetical program could accept target columns, a delimiter, and output field names:

# select columns 1 and 3 from the output and convert it to simple json
netstat -a | grep CLOSE_WAIT | convert_to_json 1,3 name,other

and would produce something like:

[ 
  {"name": "tcp4", "other": "31"},
  {"name": "tcp4", "other": "0"} 
...
]

I'm looking for something that works with any program, not just netstat.

I'm open to installing any third-party tool / open source project, and it should preferably run on linux / osx - it doesn't have to be a bash script solution; it can be written in node, perl, python, etc.

Edit: I'm of course willing to pass in whatever information is needed to make it work, such as a delimiter or multiple delimiters - I just want to avoid doing explicit parsing on the command line, and let the tool do it.
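For concreteness, the behavior being asked for can be sketched in a few lines of Python (a hypothetical convert_to_json, not an existing tool; no error handling):

```python
import json

def convert_to_json(lines, columns, names, delimiter=None):
    """Pick the 1-based `columns` out of each line (split on `delimiter`,
    or any run of whitespace by default) and emit a JSON array of objects
    keyed by `names`."""
    out = [{name: line.split(delimiter)[col - 1]
            for col, name in zip(columns, names)}
           for line in lines]
    return json.dumps(out, indent=2)

# Mirrors: netstat -a | grep CLOSE_WAIT | convert_to_json 1,3 name,other
sample = ["tcp4 31 0 10.0.2.15:51074 example.host:https CLOSE_WAIT"]
print(convert_to_json(sample, [1, 3], ["name", "other"]))
```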

6 Answers:

Answer 0 (score: 5)

Filtering STDIN to build a JSON variable

Introduction

Since the terminal is a very special kind of interface, using a monospaced font, with tools built to be monitored on that terminal, a lot of output can be hard to parse:

netstat output is a good example:

Active UNIX domain sockets (servers and established)
Proto RefCnt Flags       Type       State         I-Node   Path
unix  2      [ ACC ]     STREAM     LISTENING     13947569 @/tmp/.X11-unix/X1
unix  2      [ ]         DGRAM                    8760     /run/systemd/notify
unix  2      [ ACC ]     SEQPACKET  LISTENING     8790     /run/udev/control

If some lines contain empty fields, a simple split on whitespace is not possible.
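To see the problem concretely, here's a quick Python check on the two unix-socket lines above: the DGRAM line has an empty State field, so a naive whitespace split yields a different number of tokens per line, and positional indexing stops meaning the same column:

```python
# The two netstat lines above, copied verbatim.
line_listening = "unix  2      [ ACC ]     STREAM     LISTENING     13947569 @/tmp/.X11-unix/X1"
line_dgram     = "unix  2      [ ]         DGRAM                    8760     /run/systemd/notify"

# A naive whitespace split also explodes "[ ACC ]" into three tokens,
# and the empty State field simply disappears from the DGRAM line.
print(len(line_listening.split()))  # 9 tokens
print(len(line_dgram.split()))      # 7 tokens
```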

So the requested convert_to_json script will be posted at the bottom of this answer.

Simple whitespace-based splitting, using awk

With awk, you could use a nice syntax:

netstat -an |
    awk '/CLOSE_WAIT/{
        printf "  { \42%s\42:\42%s\42,\42%s\42:\42%s\42},\n","name",$1,"other",$3
    }' |
    sed '1s/^/[\n/;$s/,$/\n]/'

Simple whitespace-based splitting with perl, but using a JSON library

But this way is more flexible:

netstat -an | perl -MJSON::XS -ne 'push @out,{"name"=>$1,"other"=>$2} if /^(\S+)\s+\d+\s+(\d+)\s.*CLOSE_WAIT/;END{print encode_json(\@out)."\n";}'

Or the same, but split over multiple lines:

netstat -an |
    perl -MJSON::XS -ne '
        push @out,{"name"=>$1,"other"=>$2} if
                /^(\S+)\s+\d+\s+(\d+)\s.*CLOSE_WAIT/;
        END{print encode_json(\@out)."\n";
}'

Pretty-printed:

netstat -an | perl -MJSON::XS -ne '
    push @out,{"name"=>$1,"other"=>$2} if /^(\S+)\s+\d+\s+(\d+)\s.*CLOSE_WAIT/;
    END{$coder = JSON::XS->new->ascii->pretty->allow_nonref;
        print $coder->encode(\@out);}'

And finally, the one I like: this version is based not on a regex, but on split:

netstat -an | perl -MJSON::XS -ne '
    do {
        my @line=split(/\s+/);
        push @out,{"name"=>$line[0],"other"=>$line[2]}
    } if /CLOSE_WAIT/;
    END{
        $coder = JSON::XS->new->ascii->pretty->allow_nonref;
        print $coder->encode(\@out);
    }'

But you could run the command from within the perl script:

perl -MJSON::XS -e '
    open STDIN,"netstat -an|";
    my @out;
    while (<>){
        push @out,{"name"=>$1,"other"=>$2} if /^(\S+)\s+\d+\s+(\d+)\s.*CLOSE_WAIT/;
    };
    print encode_json \@out;'

This could become a basic prototype:

#!/usr/bin/perl -w

use strict;
use JSON::XS;
my $coder = JSON::XS->new->ascii->pretty->allow_nonref;

$ENV{'LANG'}='C';
open STDIN,"netstat -naut|";
my @out;
my @fields;

my $searchre=":";
$searchre = shift @ARGV if @ARGV;

while (<>){
    map { s/_/ /g;push @fields,$_; } split(/\s+/) if
        /^Proto.*State/ && s/\sAddr/_Addr/g;
    do {
        my @line=split(/\s+/);
        my %entry;
        for my $i (0..$#fields) {
            $entry{$fields[$i]}=$line[$i];
        };
        push @out,\%entry;
    } if /$searchre/;
}

print $coder->encode(\@out);

(With no argument this dumps the whole of netstat -uta, but you can pass any search string as an argument, such as CLOSE or an IP.)

Positional fields: netstat2json.pl

This approach can be used with many tools other than netstat. Corrected:

#!/usr/bin/perl -w
use strict;
use JSON::XS;
my $coder = JSON::XS->new->ascii->pretty->allow_nonref;
$ENV{'LANG'}='C';
open STDIN,"netstat -nap|";
my ( $searchre ,@out,%fields)=( "[/:]" );
$searchre = shift @ARGV if @ARGV;
while (<>){
    next if /^Active\s.*\)$/;
    /^Proto.*State/ && do {
        s/\s(name|Addr)/_$1/g;
        my @head;
        map { s/_/ /g;push @head,$_; } split(/\s+/);
        s/_/ /g;
        %fields=();
        for my $i (0..$#head) {
            my $crt=index($_,$head[$i]);
            my $next=-1;
            $next=index($_,$head[$i+1])-$crt-1 if $i < $#head;
            $fields{$head[$i]}=[$crt,$next];
        }
        next;
    };
    do {
        my $line=$_;
        my %entry;
        for my $i (keys %fields) {
            my $crt=substr($line,$fields{$i}[0],$fields{$i}[1]);
            $crt=~s/^\s*(\S(|.*\S))\s*$/$1/;
            $entry{$i}=$crt;
        };
        push @out,\%entry;
    } if /$searchre/;
}
print $coder->encode(\@out);
  • find the header line Proto.*State (specific to netstat)
  • store the field names with their positions and lengths
  • split by field length, then trim the whitespace
  • convert the variable to a JSON string.
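The header-offset technique described above can be sketched in Python as well (a hypothetical parse_fixed_width helper with made-up sample data, assuming the header is aligned to the same character columns as the data lines):

```python
import json

def parse_fixed_width(header, lines):
    """Find where each field name starts in the header line, then slice
    every data line at those character offsets and strip the padding."""
    names = header.split()
    starts = [header.index(n) for n in names]
    ends = starts[1:] + [None]  # the last field runs to end of line
    return [{n: line[s:e].strip() for n, s, e in zip(names, starts, ends)}
            for line in lines]

# Build aligned sample lines the way netstat would print them.
fmt = "{:<6}{:<7}{:<11}{:<14}{:<9}{}"
header = fmt.format("Proto", "RefCnt", "Type", "State", "I-Node", "Path")
rows = parse_fixed_width(header, [
    fmt.format("unix", 2, "STREAM", "LISTENING", 13947569, "@/tmp/.X11-unix/X1"),
    fmt.format("unix", 2, "DGRAM", "", 8760, "/run/systemd/notify"),
])
print(json.dumps(rows, indent=1))
```

Note that the empty State field of the DGRAM line comes out as an empty string instead of shifting the later columns.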

This can be run with an argument, as previously:

./netstat2json.pl CLOS
[
   {
      "Local Address" : "127.0.0.1:31001",
      "State" : "CLOSE_WAIT",
      "Recv-Q" : "18",
      "Proto" : "tcp",
      "Send-Q" : "0",
      "Foreign Address" : "127.0.0.1:55938",
      "PID/Program name" : "-"
   },
   {
      "Recv-Q" : "1",
      "Local Address" : "::1:53816",
      "State" : "CLOSE_WAIT",
      "Send-Q" : "0",
      "PID/Program name" : "-",
      "Foreign Address" : "::1:631",
      "Proto" : "tcp6"
   }
]

Empty fields don't break the field assignment:

./netstat2json.pl 1000.*systemd/notify
[
   {
      "Proto" : "unix",
      "I-Node" : "33378",
      "RefCnt" : "2",
      "Path" : "/run/user/1000/systemd/notify",
      "PID/Program name" : "-",
      "Type" : "DGRAM",
      "Flags" : "[ ]",
      "State" : ""
   }
]

Note! This modified version runs netstat with the -nap argument in order to get the PID/Program name field.

If run without superuser (root) privileges, you may get this output on STDERR:

(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)

You can avoid it by

  • running netstat2json.pl 2>/dev/null
  • running this as root, via sudo
  • editing line #6 of the script to change "netstat -nap|" to "netstat -na|"

The convert_to_json.pl script to convert STDIN to JSON

The convert_to_json.pl perl script works strictly as requested; run it as netstat -an | grep CLOSE | ./convert_to_json.pl 1,3 name,other:

#!/usr/bin/perl -w

use strict;
use JSON::XS;
my $coder = JSON::XS->new->ascii->pretty->allow_nonref;

my (@fields,@pos,@out);

map {
    push @pos,1*$_-1
} split ",",shift @ARGV;      

map { 
    push @fields,$_
} split ",",shift @ARGV;

die "Number of fields doesn't match number of positions" if $#fields != $#pos;

while (<>) {
    my @line=split(/\s+/);
    my %entry;
    for my $i (0..$#fields) {
         $entry{$fields[$i]}=$line[$pos[$i]];
    };
    push @out,\%entry;
}
print $coder->encode(\@out);

Answer 1 (score: 4)

Here is my ruby version:

#! /usr/bin/env ruby
#
# Converts stdin columns to a JSON array of hashes
#
# Installation : Save as convert_to_json, make it executable and put it somewhere in PATH. Ruby must be installed
#
# Examples :
#
# netstat -a | grep CLOSE_WAIT | convert_to_json 1,3 name,other
# ls -l | convert_to_json
# ls -l | convert_to_json 6,7,8,9
# ls -l | convert_to_json 6,7,8,9 month,day,time,name
# convert_to_json 1,2 time,value ";" < some_file.csv
#
#
# http://stackoverflow.com/questions/40246134/convert-arbitrary-output-to-json-by-column-in-the-terminal

require 'json'

script_name = File.basename(__FILE__)
syntax = "Syntax : command_which_outputs_columns | #{script_name} column1_id,column2_id,...,columnN_id column1_name,column2_name,...,columnN_name delimiter"


if $stdin.tty? or $stdin.closed? then
  $stderr.puts syntax
else
  if ARGV[2]
    delimiter = ARGV[2]
    $stderr.puts "#{script_name} : Using #{delimiter} as delimiter"
  else
    delimiter = /\s+/
  end

  column_ids = (ARGV[0] || "").split(',').map{|column_id| column_id.to_i-1}
  column_names = (ARGV[1] || "").split(',')

  results = []
  $stdin.each do |stdin_line|
    if column_ids.empty?
      values = stdin_line.strip.split(delimiter)
    else
      values = stdin_line.strip.split(delimiter).values_at(*column_ids)
    end
    line_hash=Hash.new
    values.each_with_index.each{|value,i|
      colum_name = column_names[i] || "column#{(column_ids[i] || i)+1}"
      line_hash[colum_name]=value
    }
    results<<line_hash
  end
  puts JSON.pretty_generate(results)
end

It works as shown below:

netstat -a | grep CLOSE_WAIT | convert_to_json 1,3 name,other
[
  {
    "name": "tcp",
    "other": "0"
  },
  {
    "name": "tcp6",
    "other": "0"
  }
]

As a bonus, you can

  • omit the indices: every column will be converted to JSON
  • omit the names: the columns will be called column1, column2, ...
  • select a missing column: its value will be null
  • define the delimiter as a third parameter. It defaults to whitespace

Other examples:

netstat -a | grep CLOSE_WAIT | ./convert_to_json
# [
#   {
#     "column1": "tcp",
#     "column2": "1",
#     "column3": "0",
#     "column4": "10.0.2.15:51074",
#     "column5": "123.45.101.207:https",
#     "column6": "CLOSE_WAIT"
#   },
#   {
#     "column1": "tcp6",
#     "column2": "1",
#     "column3": "0",
#     "column4": "ip6-localhost:50293",
#     "column5": "ip6-localhost:ipp",
#     "column6": "CLOSE_WAIT"
#   }
# ]

netstat -a | grep CLOSE_WAIT | ./convert_to_json 1,3
# [
#   {
#     "column1": "tcp",
#     "column3": "0"
#   },
#   {
#     "column1": "tcp6",
#     "column3": "0"
#   }
# ]

ls -l | tail -n3 | convert_to_json 6,7,8,9 month,day,time,name
# [
#   {
#     "month": "Oct",
#     "day": "27",
#     "time": "10:35",
#     "name": "test.dot"
#   },
#   {
#     "month": "Nov",
#     "day": "2",
#     "time": "14:27",
#     "name": "uniq.rb"
#   },
#   {
#     "month": "Nov",
#     "day": "2",
#     "time": "14:27",
#     "name": "utf8_nokogiri.rb"
#   }
# ]

# NOTE: ls -l uses the 8th column for year, not time, for older files :
ls --full-time -t /usr/share/doc | tail -n3 | ./convert_to_json 6,7,9 yyyymmdd,time,name
[
  {
    "yyyymmdd": "2013-10-21",
    "time": "15:15:20.000000000",
    "name": "libbz2-dev"
  },
  {
    "yyyymmdd": "2013-10-10",
    "time": "16:27:32.000000000",
    "name": "zsh"
  },
  {
    "yyyymmdd": "2013-10-03",
    "time": "18:52:45.000000000",
    "name": "manpages-dev"
  }
]

ls -l | tail -n3 | convert_to_json 9,12
# [
#   {
#     "column9": "test.dot",
#     "column12": null
#   },
#   {
#     "column9": "uniq.rb",
#     "column12": null
#   },
#   {
#     "column9": "utf8_nokogiri.rb",
#     "column12": null
#   }
# ]

convert_to_json 1,2 time,value ";" < some_file.csv
# convert_to_json : Using ; as delimiter
# [
#   {
#     "time": "1",
#     "value": "3"
#   },
#   {
#     "time": "2",
#     "value": "5"
#   }
# ]

Answer 2 (score: 3)

I found a great list of tools for working with command-line output; one of the tools listed is sqawk, which converts arbitrary data to JSON and lets you filter it with SQL-like queries!

Converting ps output to JSON:

ps | sqawk -output json,indent=1 'select PID,TTY,TIME,CMD from a' trim=left header=1

Output:

[{
    "PID"  : "3947",
    "TTY"  : "pts/2",
    "TIME" : "00:00:07",
    "CMD"  : "zsh"
},{
    "PID"  : "15951",
    "TTY"  : "pts/2",
    "TIME" : "00:00:00",
    "CMD"
}]
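If sqawk isn't available, the same select-columns-with-SQL idea can be sketched with Python's built-in sqlite3 module (the table name `a` matches sqawk's default; the sample rows here are made up):

```python
import json
import sqlite3

# Whitespace-separated rows, as `ps` (minus its header line) would emit them.
rows = [line.split(None, 3) for line in [
    "3947 pts/2 00:00:07 zsh",
    "15951 pts/2 00:00:00 ps",
]]

# Load them into an in-memory table, then query it with real SQL.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE a (PID TEXT, TTY TEXT, TIME TEXT, CMD TEXT)")
db.executemany("INSERT INTO a VALUES (?, ?, ?, ?)", rows)

out = [dict(zip(("PID", "TTY", "TIME", "CMD"), r))
       for r in db.execute("SELECT PID, TTY, TIME, CMD FROM a")]
print(json.dumps(out, indent=1))
```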

Answer 3 (score: 2)

A very basic proof of concept, not entirely rigorous or fully featured, but it may give you an idea of how to do most of it with awk, since you don't seem to have that yet!

netstat -a | grep CLOSE_WAIT | awk 'BEGIN{print "["} {print " {\"name\": \"",$1,"\", \"other\": \"",$2,"\"}"} END{print "]"}' OFS=""
[
 {"name": "tcp4", "other": "31"}
 {"name": "tcp4", "other": "31"}
 {"name": "tcp4", "other": "31"}
 {"name": "tcp4", "other": "31"}
 {"name": "tcp4", "other": "0"}
 {"name": "tcp4", "other": "31"}
]

I know it doesn't put commas at the ends of the lines, and I know it doesn't take parameters - but both of those are solvable.

One idea for the parameters is to pass them in with -v, like awk -v fields="1:4:7" -v headings="name:other:fred" '{...}', then split() them in the BEGIN section and iterate over them in the main loop. That would look something like this:

echo hi | awk -v fields="1:3:5" -v headings="HeadA:HeadB:HeadC" 'BEGIN{split(headings,h,":"); split(fields,f,":")} {for(i in h)print h[i],f[i];}'
HeadA 1
HeadB 3
HeadC 5

Answer 4 (score: 2)

Here is my python version:

#!/usr/bin/env python3

import json
import re

def all_columns_to_json (column_dict, columns_line):    
    json_object = {}    

    for column_index, column_value in enumerate(columns_line):
        if column_index in column_dict:
            column_name = column_dict[column_index]
        else:
            column_name = str(column_index)

        json_object[column_name] = column_value

    return json_object


def filter_columns_in_dict_to_json(column_dict, columns_line):
    '''Parse columns_line, make sure every element in column_dict
       exists there, filter elements that are not in column_dict from 
       columns_line, and convert it to a dict.
    '''
    json_object = {}    

    for column_index, column_name in column_dict.items():
        try:
            json_object[column_name] = columns_line[column_index]
        except IndexError as err:
            # columns_line doesn't have column_index.

            raise ValueError('Invalid table line ({}) : no {} element.'.format(columns_line,
                                                                               column_index)) from err     

    return json_object

def columns_line_to_json (column_dict, columns_line, should_filter_colunms):
    '''Parse a list of values to a json object with special names.
    '''

    if should_filter_colunms:
        return filter_columns_in_dict_to_json(column_dict, columns_line)
    else:
        return all_columns_to_json(column_dict, columns_line)

def regex_from_delims_list(delims_list):
    '''Get a regex compiled pattern from a delims list'''    

    one_characters_delims = ''
    final_pattern = ''

    for delim in delims_list:
        delim_and_maybe_min_max = delim.split(':')

        escaped_delim = re.escape(delim_and_maybe_min_max[0])

        # Check if this is a delim without a min count.
        if len(delim_and_maybe_min_max) == 1:
            final_pattern += "%s{1,}|" % (escaped_delim)
        elif len(delim_and_maybe_min_max) == 2:
            min_and_maybe_max = delim_and_maybe_min_max[1].split('-')

            current_pattern = escaped_delim

            # Add count to the regex (only min or max too)
            if len(min_and_maybe_max) == 2:
                current_pattern += '{%d,%d}' % (int(min_and_maybe_max[0]),
                                                int(min_and_maybe_max[1]))
            else:
                current_pattern += '{%d,}' % (int(min_and_maybe_max[0]))

            final_pattern += current_pattern + '|'
        else:
            raise ValueError("Invalid ':' count in the delimiter argument")

    # Remove the trailing OR ('|').
    final_pattern = final_pattern[:-1]

    return re.compile(final_pattern)


def main(args):
    column_dict = {}    

    # Split the user's argument by a comma, and parse each column
    # separately.
    for column_and_name in args.columns_and_names.split(','):
        # Split the name from the columns.
        column_and_name = column_and_name.split('=')
        if len(column_and_name) > 2:
            raise ValueError("Invalid column: {}".format(str(column_and_name)))

        # If there is no name, set it to the column index.
        if len(column_and_name) == 1:
            column_and_name.append (str(column_and_name[0]))

        # Try to convert the column index if it isn't '*'
        if column_and_name[0] != '*':
            try:
                column_and_name[0] = int(column_and_name[0])
            except ValueError as err:
                raise ValueError('Invalid column index: {} (not an integer)'.format(column_and_name[0])) from err

        # Add this column definition. 
        column_dict[column_and_name[0]] = column_and_name[1]


    # Check if column_dict has the '*' member.
    # If it does, we will print all of the columns (even ones that
    # are not in column_dict)
    should_filter_colunms = ('*' not in column_dict)

    # We have checked it, no need for it now.
    if not should_filter_colunms:
        del column_dict['*']

    # Parse the delim list into a regex pattern.
    strip_regex_pattern = regex_from_delims_list(args.delimiters)

    json_objects_list = []    

    for fd in args.infiles:
        for line in fd:
            # Convert bytes object to string.
            if isinstance(line, bytes): 
                line = line.decode('utf-8')

            # Strip the \n in the end of the line.
            line = line.rstrip('\n')            

            # Split the line by the delims.
            splitted_line = re.split(strip_regex_pattern, line)

            json_objects_list.append (columns_line_to_json (column_dict, splitted_line, should_filter_colunms))

    print(json.dumps (json_objects_list))


def comma_list(string):
    '''Convert a comma list '1,2,3,4' to a list
    [1,2,3,4] with escaping of , by a one \\ char'''

    # Split the string by commas after non-\ chars.
    splitted_string = re.split('(?!\\\).,', re.escape(string))

    replaced_string = []    

    # Replace '\,' with ',' and '\\' with '\'.
    for string in splitted_string:
        string = string.replace ('\\\\', '\\')
        string = string.replace ('\\\\,', ',')

        replaced_string.append (string)    

    return replaced_string

if __name__ == '__main__':
    import argparse    
    from sys import stdin

    parser = argparse.ArgumentParser()
    parser.add_argument('columns_and_names', help='The columns and its names to print out (format: n=name)', default='*')
    parser.add_argument('--delim', '-d', type=comma_list, 
                        help='A list of input columns delimiters. Format: delim[:min[-max]]. Where `min` and `max` are the numbers of times `delim` should repeat. As default min=1 and max is not set. Enter "\," for the delimiter "," and "\\\\"" for "\\"',
                        default=(' ', '\t'), 
                        metavar='delim[:min-max]')
    parser.add_argument('infiles', type=argparse.FileType('rb'), default=(stdin,), metavar='file', nargs='*')

    main(parser.parse_args())

(For more usage examples, see https://github.com/Reflexe/convert_table_to_json)

I tried to look for a similar program, but I couldn't find anything, so I had to write it myself (I think it's a very useful tool).

For example, to use it with netstat:

$ netstat -a | grep ESTABLISHED |  ./convert_to_json.py  '2=name,3=other'

Answer 5 (score: 0)

BASH is probably not the best platform for this. Nevertheless, here is a half-baked solution; you'll need a few more tricks.

#!/bin/bash

function procline {
    IFS=' ' list=($1)
    echo -n "{ first_column: \"${list[0]}\","
    echo " second_column: \"${list[1]}\" },"
}

tr -s " " | eval \
    'while IFS= read -r line; do procline "$line"; done'

Some explanation:

  • tr squeezes the repeated spaces
  • the while IFS ... loop passes the result line by line to the procline function
  • the procline function first splits the line, then builds something JSON-like.

My opinion is that the output of tr should be passed to another script written in some other language, e.g. Python, PHP-CLI, etc., if not the whole thing. It looks easy to process:

tcp4 0 0 192.168.99.1.56358 192.168.99.100.32111 CLOSE_WAIT
tcp4 31 0 192.168.100.179.56129 server-54-192-20.https CLOSE_WAIT
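Indeed, once tr has normalized the spacing, finishing the job in Python is nearly a one-liner. A minimal sketch, taking columns 1 and 3 as in the question's example, fed with the two lines above:

```python
import json

# Output of `netstat ... | tr -s " "`: single-space-separated fields.
lines = [
    "tcp4 0 0 192.168.99.1.56358 192.168.99.100.32111 CLOSE_WAIT",
    "tcp4 31 0 192.168.100.179.56129 server-54-192-20.https CLOSE_WAIT",
]

# Columns 1 and 3 become the "name" and "other" keys.
out = [{"name": f[0], "other": f[2]} for f in (l.split(" ") for l in lines)]
print(json.dumps(out))
```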