Converting data from a simple JSON format to DSV format

Time: 2014-12-07 15:55:26

Tags: bash shell unix awk sed

I have a file on Unix with sample data like this:

{"ID":"123", "Region":"Asia", "Location":"India"}
{"ID":"234", "Region":"APAC", "Location":"Australia"}
{"ID":"345", "Region":"Americas", "Location":"Mexio"}
{"ID":"456", "Region":"Americas", "Location":"Canada"}
{"ID":"567", "Region":"APAC", "Location":"Japan"}

The desired output is:

ID|Region|Location
123|Asia|India
234|APAC|Australia
345|Americas|Mexico
456|Americas|Canada
567|APAC|Japan

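I tried a few sed commands. I was able to remove the following characters: '{', '}', '"', ':'

The output file has two problems:

  1. All lines of the input end up on a single line in the output.
  2. I still need to add a pipe ('|') as the delimiter.

Any pointers are highly appreciated.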

6 answers:

Answer 0 (score: 1)

Using awk:

awk -F'"' -v OFS="|" 'BEGIN{print "ID|Region|Location"}{print $4,$8,$12}' file

Example:

$ cat file
{"ID":"123", "Region":"Asia", "Location":"India"}
{"ID":"234", "Region":"APAC", "Location":"Australia"}
{"ID":"345", "Region":"Americas", "Location":"Mexio"}
{"ID":"456", "Region":"Americas", "Location":"Canada"}
{"ID":"567", "Region":"APAC", "Location":"Japan"}
$ awk -F'"' -v OFS="|" 'BEGIN{print "ID|Region|Location"}{print $4,$8,$12}' file
ID|Region|Location
123|Asia|India
234|APAC|Australia
345|Americas|Mexio
456|Americas|Canada
567|APAC|Japan

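Explanation

  • -F'"' sets the double quote (") as the field separator. With this separator the three values land in fields 4, 8 and 12.
  • OFS="|" sets the pipe (|) as the output field separator.
  • awk runs the statements inside the BEGIN block first, before any input is read. Here it prints the header line.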
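To see why the values sit in fields 4, 8 and 12, it can help to print every field awk produces when a line is split on the double quote. The following is only an illustrative sketch; the variable names `line` and `fields` are made up for this example.

```shell
# Sample line taken from the question.
line='{"ID":"123", "Region":"Asia", "Location":"India"}'

# Split on the double quote and print each field with its index.
fields=$(printf '%s\n' "$line" |
    awk -F'"' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }')

printf '%s\n' "$fields"
```

The even-numbered fields alternate between key names (2, 6, 10) and values (4, 8, 12). This only holds as long as the values themselves contain no double quotes.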

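Answer 1 (score: 1)

This sed one-liner does what you want. It captures the field values with parenthesized groups and then places them in the output using \1, \2 and \3.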

s/^{"ID":"\([^"]*\)", "Region":"\([^"]*\)", "Location":"\([^"]*\)"}$/\1|\2|\3/

Invoke it like this:

$ sed -f one-liner.sed input.txt

Or you can invoke it from a Bash script that also generates the header:

echo 'ID|Region|Location'
sed -e 's/^{"ID":"\([^"]*\)", "Region":"\([^"]*\)", "Location":"\([^"]*\)"}$/\1|\2|\3/' "$input"
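One thing to be aware of with this approach: a line that does not match the pattern exactly is printed unchanged. A possible variation, sketched below, uses `-n` together with the `p` flag so that only converted lines are printed. The function name `json_to_dsv` and the sample input are assumptions made for this illustration.

```shell
# Hypothetical helper: read JSON-like lines on stdin, write pipe-delimited rows.
json_to_dsv() {
    echo 'ID|Region|Location'
    # -n suppresses automatic printing; the trailing p prints only the
    # lines on which the substitution succeeded.
    sed -n 's/^{"ID":"\([^"]*\)", "Region":"\([^"]*\)", "Location":"\([^"]*\)"}$/\1|\2|\3/p'
}

result=$(json_to_dsv <<'EOF'
{"ID":"123", "Region":"Asia", "Location":"India"}
this line is not in the expected format
{"ID":"234", "Region":"APAC", "Location":"Australia"}
EOF
)

printf '%s\n' "$result"
```

Whether silently dropping malformed lines is preferable to passing them through depends on the data; the sketch only shows that the choice exists.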

Answer 2 (score: 1)

I recommend the tool jq (http://stedolan.github.io/jq/); jq is a lightweight and flexible command-line JSON processor.

jq -r '"\(.ID)|\(.Region)|\(.Location)"' < infile

123|Asia|India
234|APAC|Australia
345|Americas|Mexio
456|Americas|Canada
567|APAC|Japan

Explanation

  • -r is short for --raw-output: strings are written as plain text rather than as quoted JSON strings.

Answer 3 (score: 0)

This is a JSON file, so it is best to use a JSON parser. Here is a Perl implementation.

#!/usr/bin/perl

use strict;
use warnings;
use JSON;

open my $fh, '<', 'path/to/your/file' or die "Cannot open file: $!";

#keys of your structure
my @key = qw(ID Region Location);

print join ("|", @key), "\n";

#iterate over your file, decode it and print in order of your key structure
while (my $json = <$fh>) {
    my $text = decode_json($json); 
    print join ("|", map { $$text{$_} } @key ),"\n";
}

Output:

ID|Region|Location
123|Asia|India
234|APAC|Australia
345|Americas|Mexio
456|Americas|Canada
567|APAC|Japan

Answer 4 (score: 0)

Using sed as follows:

Command line:

echo "my_string" |
sed -e 's#[,:"{}]##g' -e 's#ID##g' -e "s#Region##g"  -e 's#Location##g' \
    -e '1 s#^.*$#ID Region Location\n&#'  -e 's# #|#g'

sed -e 's#[,:"{}]##g' -e 's#ID##g' -e "s#Region##g" -e 's#Location##g' \
    -e '1 s#^.*$#ID Region Location\n&#'  -e 's# #|#g' my_file

I tried it in a terminal as follows:

echo '{"ID":"123", "Region":"Asia", "Location":"India"}
{"ID":"234", "Region":"APAC", "Location":"Australia"}
{"ID":"345", "Region":"Americas", "Location":"Mexio"}
{"ID":"456", "Region":"Americas", "Location":"Canada"}
{"ID":"567", "Region":"APAC", "Location":"Japan"}' |
sed -e 's#[,:"{}]##g' -e 's#ID##g' -e "s#Region##g" -e 's#Location##g' \
    -e '1 s#^.*$#ID Region Location\n&#'  -e 's# #|#g'

Output:

ID|Region|Location
123|Asia|India
234|APAC|Australia
345|Americas|Mexio
456|Americas|Canada
567|APAC|Japan

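Answer 5 (score: 0)

Thank you very much for the replies; the pointers and solutions helped a lot. For some mysterious reason I could not get any of the sed commands to work, so I devised my own solution. It is not elegant, but it works. Here is the script I wrote to solve the problem.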

#!/bin/bash

# Source file path.
infile=/home/exfile.txt

# Remove the temp files if they already exist (-f: no error if they do not).
rm -f ./efile.txt ./xfile.txt ./yfile.txt ./zfile.txt

# Remove the curly braces from the input file.
cut -d "{" -f2 "$infile" | cut -d "}" -f1 >> ./efile.txt

# setting input file name to different value.
infile=./efile.txt

# Remove double quotes from the file.
while IFS= read -r line
do
    echo "$line" | sed 's/"//g' >> ./xfile.txt

done < "$infile"

# creating another temp file.
infile2=./xfile.txt


# Replace each colon with a comma.
while IFS= read -r line
do
    echo "$line" | sed 's/:/,/g' >> ./yfile.txt
done < "$infile2"

# set input file path to new temp file.
infile3=yfile.txt

# initialize variables to hold header column values.
t1=0
t3=0
t5=0


# Read the first line to extract the header row, then exit the loop.
once=1
while IFS=',' read -r f1 f2 f3 f4 f5 f6
do
    echo "$f1 $f2 $f3 $f4 $f5 $f6"
    t1=$f1
    t3=$f3
    t5=$f5

    if [ "$once" -eq 1 ]; then
        break
    fi
done < "$infile3"

# Read each of the line from input file. Write only the value to another output file.
while IFS=',' read -r f1 f2 f3 f4 f5 f6
do
    echo "$f2|$f4|$f6" >> ./zfile.txt

done < "$infile3"

# insert the header column row into the file generated in the step above.
frstline="$t1|$t3|$t5"
sed -i '1i ID|Region|Location' ./zfile.txt
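For comparison, the same sequence of steps (strip the braces, drop the quotes, turn colons into commas, keep every second comma-separated field) can be sketched as a single pipeline without temporary files. This is only a sketch, assuming that no value contains a comma, colon, quote or brace; the function name `convert` is made up for the example.

```shell
# Hypothetical helper: read JSON-like lines on stdin, write pipe-delimited rows.
convert() {
    echo 'ID|Region|Location'
    # 1. delete braces and double quotes, 2. turn ':' into ',',
    # 3. split on ',' and keep only the values (fields 2, 4 and 6).
    tr -d '{}"' | tr ':' ',' |
    while IFS=',' read -r f1 f2 f3 f4 f5 f6; do
        printf '%s|%s|%s\n' "$f2" "$f4" "$f6"
    done
}

out=$(printf '%s\n' \
    '{"ID":"123", "Region":"Asia", "Location":"India"}' \
    '{"ID":"567", "Region":"APAC", "Location":"Japan"}' | convert)

printf '%s\n' "$out"
```

The header is written literally here because the key fields read from the data (`f3`, `f5`) keep the space that follows each comma in the input.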