I have an input text file:
EL.EEX.FRANCE.DELMONTHS.JAN2016.SPOT.VOL 15JAN2016
EL.EEX.GERMANY.DELMONTHS.JAN2016.SPOT.L 15JAN2016
EL.EEX.GERMANY.DELMONTHS.JAN2016.SPOT.H 15JAN2016
EL.EEX.GERMANY.DELMONTHS.JAN2016.SPOT.S 15JAN2016
EL.EEX.ITALY.DELMONTHS.JAN2016.FWD 15JAN2016
EL.EEX.ITALY.DELMONTHS.JAN2016.FWD 15JAN2016
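Grouping the sample data by everything up to the last dot (.), we need one representative sample (the complete line) for each unique type, without the date. So the output would be: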
EL.EEX.FRANCE.DELMONTHS.JAN2016.SPOT.VOL
EL.EEX.GERMANY.DELMONTHS.JAN2016.SPOT.L
EL.EEX.ITALY.DELMONTHS.JAN2016.FWD
(The order of the lines in the output does not matter.)
The program below works, but it generates many intermediate temporary files. How can we do this in shell without them?
#input file name and path assumed in current directory
file="./osc.txt"
resultfilepath="./OSCoutput.txt"
tmpfilepath="./OSCtempoutput.txt"
tmp1filepath="./OSCtemp1output.txt"
tmp2filepath="./OSCtemp2output.txt"
rm $resultfilepath
rm $tmpfilepath
#using awk to filter only series data without dates
awk -F' ' '{print $1}' $file >> $tmpfilepath
#getting all the unique dataclass_names at column 1
DATACLASSNAME=(`cut -f 1 -d'.' $tmpfilepath | sort | uniq`)
for i in "${DATACLASSNAME[@]}"; do
    rm $tmp1filepath
    #we are filtering the file with that dataclass
    awk -F'.' -v awk_dataclassname="$i" '$1==awk_dataclassname' $tmpfilepath >> $tmp1filepath
    #also we are calculating the number of delimiters in each filtered record and sorting it
    COLCOUNT=(`awk -F'.' '{print NF}' $tmp1filepath | uniq | sort`)
    for j in "${COLCOUNT[@]}"; do
        rm $tmp2filepath
        #in the filtered data we are taking series of a particular dimension length and dumping data
        awk -F '.' -v awk_colcount="$j" '(NF==awk_colcount){print}' $tmp1filepath >> $tmp2filepath
        #reducing column no by 1
        newj=$(echo $((j - 1)))
        #removing last column (generally observation dimension) by cut column
        GREPSAMPLE=(`cut -f -$newj -d'.' $tmp2filepath | uniq`)
        SAMPLELENGTH=(`wc -l $tmp2filepath`)
        #we are now taking unique series sample
        for k in "${GREPSAMPLE[@]}"; do
            #doing grep of unique sample but taking the whole line
            echo `grep $k $tmp1filepath | head -1` >> $resultfilepath
        done
    done
done
cat $resultfilepath
echo "processing finish"
Answer 0 (score: 7)
The entire process can be done with a single awk invocation.
awk '{
    key = $0;
    sub("\\.[^.]*$", "", key);     # Let key be everything up to the last dot
    if (!seen[key]) { print $1 }   # If key has not been seen, print 1st col
    seen[key] = 1;                 # Mark the key as seen
}' "$file" > "$resultfilepath"
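As a quick check, the same idea can be run against the sample input from the question. This is only a sketch under a few assumptions that are not part of the answer above: the input goes into a scratch file created with mktemp, the result is captured in a variable named result instead of being written to a file, the key is taken from $1 rather than $0 (so a dot in the date column could not affect it), and the test and the marking are folded into the common `!seen[key]++` idiom.

```shell
#!/bin/sh
# Scratch input file holding the sample rows from the question
# (the mktemp path is an assumption made for this demo).
file=$(mktemp)
cat > "$file" <<'EOF'
EL.EEX.FRANCE.DELMONTHS.JAN2016.SPOT.VOL 15JAN2016
EL.EEX.GERMANY.DELMONTHS.JAN2016.SPOT.L 15JAN2016
EL.EEX.GERMANY.DELMONTHS.JAN2016.SPOT.H 15JAN2016
EL.EEX.GERMANY.DELMONTHS.JAN2016.SPOT.S 15JAN2016
EL.EEX.ITALY.DELMONTHS.JAN2016.FWD 15JAN2016
EL.EEX.ITALY.DELMONTHS.JAN2016.FWD 15JAN2016
EOF

result=$(awk '{
    key = $1                        # series name without the date
    sub(/\.[^.]*$/, "", key)        # drop the last dot and what follows it
    if (!seen[key]++) print $1      # print only the first line for each key
}' "$file")

rm -f "$file"
printf '%s\n' "$result"
```

On this input the script prints one line each for FRANCE, GERMANY and ITALY, matching the expected output listed in the question.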
In general, if your script involves a lot of awking and grepping, it can most likely be written as a single awk script.
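To illustrate that point on a small scale, a chain such as grep, then awk, then head -1 can usually be collapsed into one awk program: the pattern does the filtering, the action does the printing, and exit stops at the first match. The rows and the GERMANY pattern below are picked purely for illustration, and the variable names input, piped and single are hypothetical.

```shell
#!/bin/sh
# Illustrative rows, reused from the question's sample input.
input='EL.EEX.FRANCE.DELMONTHS.JAN2016.SPOT.VOL 15JAN2016
EL.EEX.GERMANY.DELMONTHS.JAN2016.SPOT.L 15JAN2016
EL.EEX.GERMANY.DELMONTHS.JAN2016.SPOT.H 15JAN2016'

# Three processes: filter lines, take the first column, keep the first match.
piped=$(printf '%s\n' "$input" | grep 'GERMANY' | awk '{ print $1 }' | head -1)

# One process: the pattern filters, the action prints, exit stops after the first match.
single=$(printf '%s\n' "$input" | awk '/GERMANY/ { print $1; exit }')

printf '%s\n%s\n' "$piped" "$single"
```

Both commands select the same line; the second one simply avoids the extra processes and pipes.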