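I am trying to parse email files stored on my local workstation. Each file contains a list of hardware orders. Some files contain several hardware lists, each in a block that starts with Processor: and ends with ExtraIp:. My current script works correctly when an email contains only one block; the problem appears when an email file contains multiple blocks.
Example of a problem email: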
Processor: Intel Xeon E3-1270 V2 3.5GHZ, Quad Core
RAM: 16GB DDR3 SDRAM
HD1: 2 x SATA Hardware RAID 1 (7,200 rpm)
(+1TB 7200 RPM SATA hard drive)
SSD: No SSD Drive
HD2: SATA Backup Drive
(+1 TB SATA (7,200 rpm))
HD3: No Additional Storage Array
ExtraIp: Public IP Addresses
Processor: Intel Xeon E3-1220 V2 3.1GHZ, Quad Core
RAM: 8GB DDR3 SDRAM
HD1: 2 x SATA Hardware RAID 1 (7,200 rpm)
(+1TB 7200 RPM SATA hard drive)
SSD: No SSD Drive
HD2: No Backup Drive
HD3: No Additional Storage Array
ExtraIp: Public IP Addresses
My script:
#!/bin/bash
find ./email -print0 | while read -d $'\0' file
do
#### Sed and while loop here, with modification to the below lines to read data from the while loop instead of directly from each file ####
#### Example sed command: sed -n "/Processor:/,/ExtraIp:/p" $file ####
order_date=$(echo $file | awk '{print $11}')
grep "Processor:" "$file" | cut -d : -f2 | cut -d , -f1 | while read cpu_type
do
if [ "$cpu_type" != "" ]; then
echo $order_date
echo $cpu_type
ram_size=$(grep "RAM:" "$file" | cut -d : -f2)
if [ "$ram_size" != "" ]; then
echo $ram_size
fi
hd1_type=$(grep "HD1:" "$file" | cut -d : -f2)
if [ "$hd1_type" != "" ]; then
echo $hd1_type
fi
hd1_size=$(grep -A1 "HD1:" "$file" | tail -n1)
if [ "$hd1_size" != "" ]; then
echo $hd1_size
fi
ssd_type=$(grep "SSD:" "$file" | cut -d : -f2)
ssd_type1=$(grep "SSD:" "$file" | cut -d : -f2 | awk '{print $1}')
if [ "$ssd_type" != "" ]; then
echo $ssd_type
fi
if [[ "$ssd_type1" != "No" && "$ssd_type1" != "" ]]; then
ssd_size=$(grep -A1 "SSD:" "$file" | tail -n1)
echo $ssd_size
else
ssd_size="No SSD"
echo $ssd_size
fi
hd2_type=$(grep "HD2:" "$file" | cut -d : -f2)
hd2_type1=$(grep "HD2:" "$file" | cut -d : -f2 | awk '{print $1}')
if [ "$hd2_type" != "" ]; then
echo $hd2_type
fi
if [[ "$hd2_type1" != "No" && "$hd2_type1" != "" ]]; then
hd2_size=$(grep -A1 "HD2:" "$file" | tail -n1)
echo $hd2_size
else
hd2_size="No HD2"
echo $hd2_size
fi
hd3_type=$(grep "HD3:" "$file" | cut -d : -f2)
hd3_type1=$(grep "HD3:" "$file" | cut -d : -f2 | awk '{print $1}')
if [ "$hd3_type" != "" ]; then
echo $hd3_type
fi
if [[ "$hd3_type1" != "No" && "$hd3_type1" != "" ]]; then
hd3_size=$(grep -A1 "HD3:" "$file" | tail -n1)
echo $hd3_size
else
hd3_size="No HD3"
echo $hd3_size
fi
echo "$order_date,$cpu_type,$ram_size,$hd1_type,$hd1_size,$hd2_type,$hd2_size,$hd3_type,$hd3_size" >> order_list.csv
fi
done
done
Expected output:
If the email contains only one block of text, I get the correct output:
2014-04-01,Intel Xeon E3-1270 V2 3.5GHZ, 16GB DDR3 SDRAM, 2 x SATA Hardware RAID 1 (7,200 rpm),(+1TB 7200 RPM SATA hard drive), SATA Backup Drive,(+1 TB SATA (7,200 rpm)), No Additional Storage Array,No HD3
If the email contains multiple blocks of text, I get the following output:
2014-04-01,Intel Xeon E3-1270 V2 3.5GHZ, 16GB DDR3 SDRAM
8GB DDR3 SDRAM, 2 x SATA Hardware RAID 1 (7,200 rpm)
2 x SATA Hardware RAID 1 (7,200 rpm), (+1TB 7200 RPM SATA hard drive), SATA Backup Drive
No Backup Drive, HD3: No Additional Storage Array, No Additional Storage Array
No Additional Storage Array, ExtraIp: Public IP Addresses
2014-04-01,Intel Xeon E3-1220 V2 3.1GHZ, 16GB DDR3 SDRAM
8GB DDR3 SDRAM, 2 x SATA Hardware RAID 1 (7,200 rpm)
2 x SATA Hardware RAID 1 (7,200 rpm), (+1TB 7200 RPM SATA hard drive), SATA Backup Drive
No Backup Drive, HD3: No Additional Storage Array, No Additional Storage Array
No Additional Storage Array, ExtraIp: Public IP Addresses
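In the second output, every CSV value (RAM and drives) contains the data from both text blocks. My plan is to add another while loop fed by a sed command (in the commented placeholder in my script above) and then modify each command to read its data from that while loop instead of directly from the file.
The example sed command: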
sed -n "/Processor:/,/ExtraIp:/p" $file
Answer 0 (score: 1)
Your parsing script uses grep to extract a field, and when $file contains the same field twice, grep extracts both occurrences.
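The effect is easy to reproduce in isolation. The following is a minimal sketch, assuming a throwaway temporary file that holds an abbreviated two-block sample from the question; the variable names sample and ram_count are illustrative and do not appear in the original script.

```shell
#!/bin/bash
# Build a throwaway two-block sample (abbreviated from the question's email).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Processor: Intel Xeon E3-1270 V2 3.5GHZ, Quad Core
RAM: 16GB DDR3 SDRAM
ExtraIp: Public IP Addresses
Processor: Intel Xeon E3-1220 V2 3.1GHZ, Quad Core
RAM: 8GB DDR3 SDRAM
ExtraIp: Public IP Addresses
EOF

# The same pipeline the script uses to fill ram_size.
ram_size=$(grep "RAM:" "$sample" | cut -d : -f2)

# Quoting the variable keeps the embedded newline visible:
# the variable holds one value per block, not a single value.
printf '%s\n' "$ram_size"

# Count the lines held in the variable (tr strips padding some wc builds add).
ram_count=$(printf '%s\n' "$ram_size" | wc -l | tr -d ' ')
echo "values captured: $ram_count"

rm -f "$sample"
```

Because every grep in the script searches the whole file, each variable ends up holding the values from all blocks, which is why the CSV rows in the question are interleaved.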
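You would be better off refactoring so that all of the parsing is done in Awk. I am not going to finish it for you, but this should be a good start.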
awk 'BEGIN { split("Processor:RAM:HD1:SSD:HD2:HD3", f, /:/) }
/^Processor:/ { delete a } # forget any previous record
/^(Processor|RAM|HD[123]|SSD):/ { i=$1; sub(/:/,"",i);
$1=""; sub(/^ /,""); a[i]=$0 }
i ~ /^(HD[123]|SSD)$/ && $1 == "No" { a[i] = "No " i; i=""; next }
i ~ /^(HD[123]|SSD)$/ && !k { k=i; next } # remember key for two-line entry
k { a[k] = a[k] "," $0; k=i="" }
/^ExtraIp: / {s=""; for (i=1; i<=length(f); i++) {
printf("%s%s", s, a[f[i]]); s="," } printf "\n" }' "$file"
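To try this against the sample email, the Awk program can be wrapped in a shell function. This is only a sketch under stated assumptions: the function name parse_orders and its optional second argument (a prefix such as the order date) are hypothetical additions, the date 2014-04-01 is taken from the question's output, and the count returned by split is used in place of length(f), since length on an array is not supported by every awk implementation.

```shell
#!/bin/bash
# parse_orders FILE [PREFIX]
# Print one CSV line per Processor..ExtraIp block found in FILE.
# PREFIX (for example the order date) is prepended when given.
# The function name and PREFIX argument are illustrative, not from the answer.
parse_orders() {
    awk -v prefix="$2" '
    BEGIN { n = split("Processor:RAM:HD1:SSD:HD2:HD3", f, /:/) }
    /^Processor:/ { delete a }                 # start of a new block
    /^(Processor|RAM|HD[123]|SSD):/ {          # keyed line: store the value
        i = $1; sub(/:/, "", i)
        $1 = ""; sub(/^ /, ""); a[i] = $0
    }
    i ~ /^(HD[123]|SSD)$/ && $1 == "No" { a[i] = "No " i; i = ""; next }
    i ~ /^(HD[123]|SSD)$/ && !k { k = i; next }  # expect a size line next
    k { a[k] = a[k] "," $0; k = i = "" }         # append the size line
    /^ExtraIp: / {                               # end of block: emit the row
        s = ""
        if (prefix != "") { printf "%s", prefix; s = "," }
        for (j = 1; j <= n; j++) { printf "%s%s", s, a[f[j]]; s = "," }
        printf "\n"
    }' "$1"
}

# Usage on the two-block sample from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Processor: Intel Xeon E3-1270 V2 3.5GHZ, Quad Core
RAM: 16GB DDR3 SDRAM
HD1: 2 x SATA Hardware RAID 1 (7,200 rpm)
(+1TB 7200 RPM SATA hard drive)
SSD: No SSD Drive
HD2: SATA Backup Drive
(+1 TB SATA (7,200 rpm))
HD3: No Additional Storage Array
ExtraIp: Public IP Addresses
Processor: Intel Xeon E3-1220 V2 3.1GHZ, Quad Core
RAM: 8GB DDR3 SDRAM
HD1: 2 x SATA Hardware RAID 1 (7,200 rpm)
(+1TB 7200 RPM SATA hard drive)
SSD: No SSD Drive
HD2: No Backup Drive
HD3: No Additional Storage Array
ExtraIp: Public IP Addresses
EOF

csv=$(parse_orders "$sample" "2014-04-01")
printf '%s\n' "$csv"
rm -f "$sample"
```

Each block now produces exactly one row. Two differences from the original script remain and would need handling before the rows are usable as CSV: the processor field still carries ", Quad Core" (the script removed it with cut -d , -f1), and values such as (7,200 rpm) contain commas of their own. The sketch also assumes that any HD or SSD entry not starting with "No" is followed by a size line, as in the sample.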