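Shell script to extract data from a text file

Time: 2015-12-04 20:32:36

Tags: shell csv awk extract

I wrote a shell script that is supposed to extract the data for certain field names and put it into a CSV file.

A sample input file might contain the following lines: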

                  user_name: null@gmail.com
                      EMAIL: null@gmail.com
                 FIRST_NAME: jonathan
                  LAST_NAME: doestein
              CREATION_DATE: 2013-08-01 01:08:52
        REGISTRATION_STATUS: Y
                     VENDOR: vendorname

This block repeats n times.

Here is an excerpt of the script I have written so far:

#!/bin/sh

echo "Please enter input file name."
read input_variable
echo "You entered: $input_variable"

echo "Please enter a name of the new output file."
read output_file
touch $output_file
echo "The output file name is going to be $output_file"

echo "Extracting files..."  ;

awk '$1 ~ /^(user_name:|EMAIL:|FIRST_NAME:|LAST_NAME:|CREATION_DATE:|REGISTRATION_STATUS:)$/{printf "%s,",$2} $1 ~ /REGISTRATION_STATUS:/{print $2}' $input_variable >> $output_file.ib ;

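However, although the data is written to my output file (which needs a .csv extension to be viewable in a GUI), when I open the file in a GUI such as OpenOffice Calc, many records are concatenated on the same line, while other lines start on a new line as they should.

For example, one line might look like this: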

noway@gmail.com,noreally51,noway,username,username...x40 or so

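username,username,username ... meaning that it lists about 40-50 usernames on a single line, and only then moves to the next line and prints the information.

I would also like to add the column names to the output file:

VENDOR,user_name,FIRST_NAME,LAST_NAME,CREATION_DATE,REGISTRATION_STATUS

I can't figure out how to do this.

Thank you for your time and all your support!

I edited my script as follows: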

#!/bin/sh

echo "Please enter input file name."
read input_variable
echo "You entered: $input_variable"

echo "Please enter a name of the new output file."
touch output_file
read $output_file
echo "The output file name is going to be $output_file"

echo "Processing data extraction..." ;

awk -F": " n=25 -v 'NR<=n {h[NR-1]=$1} {a[NR%n-1]=$2} $1~/VENDOR/ && !hp{for(k=0;k<n;k++) printf "%s ", h[k] $input_variable && print "";hp=1} $1~/VENDOR/{for(k=0;k<n;k++) printf "%s ", a[k] && print ""}' data | column -t $input_variable ;

echo "Done."

This at least prints data to $output_file. However, the data in $output_file looks like this:

??ࡱ?;?? ????...????Root Entry????...????

@karakfa

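Here is the content of the script I have. I noticed that the first line of the script in your answer changed, so I modified my script to the following: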

#!/bin/sh

echo "Please enter input file name."
read input_variable
echo "You entered: $input_variable"

echo "Please enter a name of the new output file."
touch output_file
read $output_file
echo "The output file name is going to be ${output_file}"

echo "Processing data extraction..." ;

cat $input_variable | awk -F": " -v OFS="," -v n=25
  'NR<=n{sub(/^ */,"",$1);h[NR-1]=$1}
        {a[(NR-1)%n]=$2}
$1~/VENDOR/ && !hp{line=h[0];
                  for(k=1;k<n;k++) line=line OFS h[k];
                  print line;hp=1
                 }
      $1~/VENDOR/{line=a[0];
                  for(k=1;k<n;k++) line=line OFS a[k];
                  print line}' $input_variable ;
echo "Done."

The output is:

Please enter input file name.
inputfile.txt
You entered: allgmail.com_accounts.txt
Please enter a name of the new output file.
outputfile.csv
The output file name is going to be 
Processing data extraction...
awk: no program given

./scriptname: line 23: NR<=n{sub(/^ */,"",$1);h[NR-1]=$1} 
          {a[(NR-1)%n]=$2} 
  $1~/VENDOR/ && !hp{line=h[0]; 
                    for(k=1;k<n;k++) line=line OFS h[k];
                    print line;hp=1
                   }  
        $1~/VENDOR/{line=a[0];
                    for(k=1;k<n;k++) line=line OFS a[k];
                    print line}: No such file or directory
Done.

I haven't found any articles about the 'awk: no program given' error. Do you know what I'm doing wrong?

I noticed that it says 'line 23', so here is line 23:

 print line}' $input_variable ;

Then I noticed that the last line of the error also says the following:

print line}: No such file or directory

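This happens with or without 'cat $input_variable |' before awk. Normally awk works fine on my operating system, which is Mac OS X 10.11.1 (15B42). Is #!/bin/sh incorrect?

I look forward to your thoughts. Thanks!

2 answers:

Answer 0 (score: 2)

If all of your fields are always present, you can try the following awk script. The number of fields is set as a variable (7 in this case), and "VENDOR", the last field, is used as the end-of-record indicator.

Update: I hadn't noticed the CSV output requirement.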

$ awk -F": " -v OFS="," -v n=7 \
    'NR<=n{sub(/^ */,"",$1);h[NR-1]=$1}
          {a[(NR-1)%n]=$2}
 $1~/VENDOR/ && !hp{line=h[0];
                    for(k=1;k<n;k++) line=line OFS h[k];
                    print line;hp=1
                   }
        $1~/VENDOR/{line=a[0];
                    for(k=1;k<n;k++) line=line OFS a[k];
                    print line}' inputfilename


user_name,EMAIL,FIRST_NAME,LAST_NAME,CREATION_DATE,REGISTRATION_STATUS,VENDOR
null@gmail.com,null@gmail.com,jonathan,doestein,2013-08-01 01:08:52,Y,vendorname

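It builds the header from the first n lines, prints the header once when it is complete, and prints each record when the last field is seen.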
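The logic described above can be tried on a small sample. The following is a minimal sketch, assuming every record has exactly 7 lines and there are no blank lines between records. The file names sample.txt and sample.csv, and the values of the second record, are made up for illustration.

```shell
# Build a hypothetical two-record sample (the second record is invented).
cat > sample.txt <<'EOF'
                  user_name: null@gmail.com
                      EMAIL: null@gmail.com
                 FIRST_NAME: jonathan
                  LAST_NAME: doestein
              CREATION_DATE: 2013-08-01 01:08:52
        REGISTRATION_STATUS: Y
                     VENDOR: vendorname
                  user_name: jane@example.com
                      EMAIL: jane@example.com
                 FIRST_NAME: jane
                  LAST_NAME: roe
              CREATION_DATE: 2014-02-03 10:11:12
        REGISTRATION_STATUS: N
                     VENDOR: othervendor
EOF

# n is the number of lines per record; VENDOR marks the end of a record.
awk -F": " -v OFS="," -v n=7 '
  NR <= n       { sub(/^ */, "", $1); h[NR-1] = $1 }  # collect header names once
                { a[(NR-1) % n] = $2 }                # store value by position in record
  $1 ~ /VENDOR/ {
      if (!hp) {                                      # print the header a single time
          line = h[0]
          for (k = 1; k < n; k++) line = line OFS h[k]
          print line; hp = 1
      }
      line = a[0]
      for (k = 1; k < n; k++) line = line OFS a[k]
      print line
  }' sample.txt > sample.csv

cat sample.csv
```

This should print the header line followed by one CSV row per record. When placing such a command in a script file, the opening quote of the awk program has to be on the same line as the awk command, or the preceding line has to end with a backslash. Otherwise the shell runs awk with no program and then tries to execute the quoted text as a separate command.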

To move the last field to the front, you can change the code to

line=h[n-1];
for(k=0;k<n-1;k++) line=line OFS h[k];

in both places (changing the array name from "h" to "a" in the second instance).
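Put together, the reordered variant might look like the sketch below. It assumes the same 7-line record layout. The helper function row() and the output file name vendor_first.csv are hypothetical names introduced only for this illustration, and the loop starts at k=0 so that the first field (user_name) is kept.

```shell
# One record fed on stdin; the values come from the sample in the question.
printf '%s\n' \
  '   user_name: null@gmail.com' \
  '       EMAIL: null@gmail.com' \
  '  FIRST_NAME: jonathan' \
  '   LAST_NAME: doestein' \
  'CREATION_DATE: 2013-08-01 01:08:52' \
  'REGISTRATION_STATUS: Y' \
  '      VENDOR: vendorname' |
awk -F": " -v OFS="," -v n=7 '
  # row() is a hypothetical helper: last element first, then elements 0..n-2.
  function row(arr,   k, line) {
      line = arr[n-1]
      for (k = 0; k < n-1; k++) line = line OFS arr[k]
      return line
  }
  NR <= n       { sub(/^ */, "", $1); h[NR-1] = $1 }
                { a[(NR-1) % n] = $2 }
  $1 ~ /VENDOR/ { if (!hp) { print row(h); hp = 1 }
                  print row(a) }
' > vendor_first.csv

cat vendor_first.csv
```

Using one function for both the header and the data rows avoids having to keep two copies of the loop in sync.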

Answer 1 (score: 1)

Why not use echo before awk?

echo VENDOR,user_name,FIRST_NAME,LAST_NAME,CREATION_DATE,REGISTRATION_STATUS > file
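A minimal sketch of this suggestion follows: write the header with echo, then let awk append the rows. Here the awk part looks values up by field name instead of by position, which differs from the positional approach in the other answer and is only one possible way to produce the requested column order. The file names records.txt and records.csv are hypothetical.

```shell
# Hypothetical file names, for illustration only.
input=records.txt
output=records.csv

# One record taken from the sample in the question.
cat > "$input" <<'EOF'
                  user_name: null@gmail.com
                      EMAIL: null@gmail.com
                 FIRST_NAME: jonathan
                  LAST_NAME: doestein
              CREATION_DATE: 2013-08-01 01:08:52
        REGISTRATION_STATUS: Y
                     VENDOR: vendorname
EOF

# Write the header first; ">" truncates any existing output file.
echo "VENDOR,user_name,FIRST_NAME,LAST_NAME,CREATION_DATE,REGISTRATION_STATUS" > "$output"

# Then append one CSV row per record with ">>".
awk -F": " -v OFS="," '
  { sub(/^ */, "", $1); v[$1] = $2 }      # remember each value under its field name
  $1 == "VENDOR" {                        # VENDOR closes a record
      print v["VENDOR"], v["user_name"], v["FIRST_NAME"], v["LAST_NAME"], v["CREATION_DATE"], v["REGISTRATION_STATUS"]
  }' "$input" >> "$output"

cat "$output"
```

Quoting "$input" and "$output" keeps the script working when a file name contains spaces.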