While loop over each sed match

Time: 2014-04-28 17:02:58

Tags: sed while-loop

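I am trying to parse email files stored on my local workstation. Each file contains a list of hardware orders. Some files contain several hardware lists, each in a block that starts with Processor: and ends with ExtraIp:. If an email contains only one block, my current script works fine. The problem occurs when an email file contains more than one block.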

Example of a problem email:

Processor: Intel Xeon E3-1270 V2 3.5GHZ, Quad Core
RAM: 16GB DDR3 SDRAM
HD1: 2 x SATA Hardware RAID 1 (7,200 rpm)
(+1TB 7200 RPM SATA hard drive)
SSD: No SSD Drive
HD2: SATA Backup Drive
(+1 TB SATA (7,200 rpm))
HD3: No Additional Storage Array
ExtraIp: Public IP Addresses

Processor: Intel Xeon E3-1220 V2 3.1GHZ, Quad Core
RAM: 8GB DDR3 SDRAM
HD1: 2 x SATA Hardware RAID 1 (7,200 rpm)
(+1TB 7200 RPM SATA hard drive)
SSD: No SSD Drive
HD2: No Backup Drive
HD3: No Additional Storage Array
ExtraIp: Public IP Addresses

My script:

#!/bin/bash
find ./email -print0 | while read -d $'\0' file
do
#### Sed and while loop here, with modification to the below lines to read data from the while loop instead of directly from each file ####
#### Example sed command: sed -n "/Processor:/,/ExtraIp:/p" $file ####

    order_date=$(echo $file | awk '{print $11}')
    grep "Processor:" "$file" | cut -d : -f2 | cut -d , -f1 | while read cpu_type
    do
            if [ "$cpu_type" != "" ]; then
                    echo $order_date
                    echo $cpu_type
                    ram_size=$(grep "RAM:" "$file" | cut -d : -f2)
                    if [ "$ram_size" != "" ]; then
                            echo $ram_size
                    fi
                    hd1_type=$(grep "HD1:" "$file" | cut -d : -f2)
                    if [ "$hd1_type" != "" ]; then
                            echo $hd1_type
                    fi
                    hd1_size=$(grep -A1 "HD1:" "$file" | tail -n1)
                    if [ "$hd1_size" != "" ]; then
                            echo $hd1_size
                    fi
                    ssd_type=$(grep "SSD:" "$file" | cut -d : -f2)
                    ssd_type1=$(grep "SSD:" "$file" | cut -d : -f2 | awk '{print $1}')
                    if [ "$ssd_type" != "" ]; then
                            echo $ssd_type
                    fi
                    if [[ "$ssd_type1" != "No"  &&  "$ssd_type1" != "" ]]; then
                            ssd_size=$(grep -A1 "SSD:" "$file" | tail -n1)
                            echo $ssd_size
                    else
                            ssd_size="No SSD"
                            echo $ssd_size
                    fi
                    hd2_type=$(grep "HD2:" "$file" | cut -d : -f2)
                    hd2_type1=$(grep "HD2:" "$file" | cut -d : -f2 | awk '{print $1}')
                    if [ "$hd2_type" != "" ]; then
                            echo $hd2_type
                    fi
                    if [[ "$hd2_type1" != "No"  &&  "$hd2_type1" != "" ]]; then
                            hd2_size=$(grep -A1 "HD2:" "$file" | tail -n1)
                            echo $hd2_size
                    else
                            hd2_size="No HD2"
                            echo $hd2_size
                    fi
                    hd3_type=$(grep "HD3:" "$file" | cut -d : -f2)
                    hd3_type1=$(grep "HD3:" "$file" | cut -d : -f2 | awk '{print $1}')
                    if [ "$hd3_type" != "" ]; then
                            echo $hd3_type
                    fi
                    if [[ "$hd3_type1" != "No"  &&  "$hd3_type1" != "" ]]; then
                            hd3_size=$(grep -A1 "HD3:" "$file" | tail -n1)
                            echo $hd3_size
                    else
                            hd3_size="No HD3"
                            echo $hd3_size
                    fi
            echo "$order_date,$cpu_type,$ram_size,$hd1_type,$hd1_size,$hd2_type,$hd2_size,$hd3_type,$hd3_size" >> order_list.csv
            fi
    done
done

Expected output:

If the email contains only one block of text, I get the correct output:

2014-04-01,Intel Xeon E3-1270 V2 3.5GHZ, 16GB DDR3 SDRAM, 2 x SATA Hardware RAID 1 (7,200 rpm),(+1TB 7200 RPM SATA hard drive), SATA Backup Drive,(+1 TB SATA (7,200 rpm)), No Additional Storage Array,No HD3

If the email contains multiple blocks of text, I get the following output:

2014-04-01,Intel Xeon E3-1270 V2 3.5GHZ, 16GB DDR3 SDRAM
8GB DDR3 SDRAM, 2 x SATA Hardware RAID 1 (7,200 rpm)
2 x SATA Hardware RAID 1 (7,200 rpm),    (+1TB 7200 RPM SATA hard drive), SATA Backup Drive
No Backup Drive,    HD3: No Additional Storage Array, No Additional Storage Array
No Additional Storage Array,    ExtraIp: Public IP Addresses
2014-04-01,Intel Xeon E3-1220 V2 3.1GHZ, 16GB DDR3 SDRAM
8GB DDR3 SDRAM, 2 x SATA Hardware RAID 1 (7,200 rpm)
2 x SATA Hardware RAID 1 (7,200 rpm),    (+1TB 7200 RPM SATA hard drive), SATA Backup Drive
No Backup Drive,    HD3: No Additional Storage Array, No Additional Storage Array
No Additional Storage Array,    ExtraIp: Public IP Addresses

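In the second output, every CSV value (RAM and drives) repeats data from both text blocks. My plan is to add another while loop, fed by the sed command, at the commented placeholder in my script above, and then change each command to read its data from that while loop instead of from the file.

Example sed command to use: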

sed -n "/Processor:/,/ExtraIp:/p" $file

1 Answer:

Answer 0 (score: 1)

Your parsing script uses grep to extract each field, and when $file contains the same field twice, grep extracts both occurrences.
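
The effect is easy to reproduce on its own. The following is a minimal sketch, assuming a trimmed two-block sample written to a temporary file (the `sample` file and the `ram_lines` variable are illustrative, not part of the original script):

```shell
#!/bin/bash
# Hypothetical sample: two hardware blocks, trimmed to a few fields each.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Processor: Intel Xeon E3-1270 V2 3.5GHZ, Quad Core
RAM: 16GB DDR3 SDRAM
ExtraIp: Public IP Addresses

Processor: Intel Xeon E3-1220 V2 3.1GHZ, Quad Core
RAM: 8GB DDR3 SDRAM
ExtraIp: Public IP Addresses
EOF

# Same extraction as in the question: grep scans the whole file,
# so it returns the RAM line of every block, not just the current one.
ram_size=$(grep "RAM:" "$sample" | cut -d : -f2)
printf '%s\n' "$ram_size"

# Count how many values ended up in the single variable.
ram_lines=$(printf '%s\n' "$ram_size" | wc -l)
rm -f "$sample"
```

Here `ram_size` ends up holding two lines, one per block, which is why the CSV rows in the question break across lines.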

You would be better off refactoring to do all of the parsing in Awk. I'm not going to finish it for you, but this should be a good start.

awk 'BEGIN { n = split("Processor:RAM:HD1:SSD:HD2:HD3", f, /:/) }
    /^Processor:/ { delete a }  # forget any previous record
    /^(Processor|RAM|HD[123]|SSD):/ { i=$1; sub(/:/,"",i);
        $1=""; sub(/^ /,""); a[i]=$0 }
    i ~ /^(HD[123]|SSD)$/ && $1 == "No" { a[i] = "No " i; i=""; next }
    i ~ /^(HD[123]|SSD)$/ && !k { k=i; next }  # remember key for two-line entry
    k { a[k] = a[k] "," $0; k=i="" }
    /^ExtraIp: / {s=""; for (i=1; i<=n; i++) {
        printf("%s%s", s, a[f[i]]); s="," } printf "\n" }' "$file"
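
As a rough illustration of how this could slot into the loop from the question, here is a sketch that wraps the same Awk logic in a shell function and runs it on the sample email. The function name `parse_orders`, the `date` Awk variable, and passing the order date as an argument are assumptions made for the example; how the date is derived from the file name is left to the caller. The sketch keeps the field count returned by `split` in `n` so that it does not rely on `length()` accepting an array.

```shell
#!/bin/bash
# parse_orders is a hypothetical helper: prints one CSV row per hardware block.
#   $1  order date to prefix each row with (assumed known to the caller)
#   $2  email file to parse
parse_orders() {
    awk -v date="$1" '
    BEGIN { n = split("Processor:RAM:HD1:SSD:HD2:HD3", f, /:/) }
    /^Processor:/ { delete a }                      # new block: forget the old one
    /^(Processor|RAM|HD[123]|SSD):/ { i = $1; sub(/:/, "", i)
        $1 = ""; sub(/^ /, ""); a[i] = $0 }
    i ~ /^(HD[123]|SSD)$/ && $1 == "No" { a[i] = "No " i; i = ""; next }
    i ~ /^(HD[123]|SSD)$/ && !k { k = i; next }     # expect a size line next
    k { a[k] = a[k] "," $0; k = i = "" }
    /^ExtraIp: / { printf "%s", date                # block complete: emit the row
        for (j = 1; j <= n; j++) printf ",%s", a[f[j]]
        printf "\n" }
    ' "$2"
}

# Sample email from the question, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Processor: Intel Xeon E3-1270 V2 3.5GHZ, Quad Core
RAM: 16GB DDR3 SDRAM
HD1: 2 x SATA Hardware RAID 1 (7,200 rpm)
(+1TB 7200 RPM SATA hard drive)
SSD: No SSD Drive
HD2: SATA Backup Drive
(+1 TB SATA (7,200 rpm))
HD3: No Additional Storage Array
ExtraIp: Public IP Addresses

Processor: Intel Xeon E3-1220 V2 3.1GHZ, Quad Core
RAM: 8GB DDR3 SDRAM
HD1: 2 x SATA Hardware RAID 1 (7,200 rpm)
(+1TB 7200 RPM SATA hard drive)
SSD: No SSD Drive
HD2: No Backup Drive
HD3: No Additional Storage Array
ExtraIp: Public IP Addresses
EOF

rows=$(parse_orders "2014-04-01" "$sample")
printf '%s\n' "$rows"
rm -f "$sample"
```

This prints one row per block, so the second block no longer leaks into the first. Note that some values themselves contain commas ("3.5GHZ, Quad Core", "7,200 rpm"), so a real CSV consumer would probably need quoting or a different delimiter; the sketch does not attempt that.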