Question

I have an input text file:

EL.EEX.FRANCE.DELMONTHS.JAN2016.SPOT.VOL      15JAN2016
EL.EEX.GERMANY.DELMONTHS.JAN2016.SPOT.L       15JAN2016 
EL.EEX.GERMANY.DELMONTHS.JAN2016.SPOT.H       15JAN2016
EL.EEX.GERMANY.DELMONTHS.JAN2016.SPOT.S       15JAN2016 
EL.EEX.ITALY.DELMONTHS.JAN2016.FWD            15JAN2016 
EL.EEX.ITALY.DELMONTHS.JAN2016.FWD            15JAN2016

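Treating everything in the series name up to its last dot (.) as its type, I need one representative sample (the full line, without the date) for each unique type. So the output would be: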

EL.EEX.FRANCE.DELMONTHS.JAN2016.SPOT.VOL
EL.EEX.GERMANY.DELMONTHS.JAN2016.SPOT.L
EL.EEX.ITALY.DELMONTHS.JAN2016.FWD

(The order of the lines in the output does not matter.)
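To make the grouping rule concrete, here is how I read it: assuming the series name is the first whitespace-separated field, a line's type is that name with its last dot-separated component removed. A minimal bash sketch (the function name `print_keys` is hypothetical) prints that key for each input line using the `${var%.*}` parameter expansion:

```shell
#!/usr/bin/env bash
# Print the grouping key of each input line: the first whitespace-separated
# field (the series name) with its last dot-separated component removed.
print_keys() {
    local series _date
    while read -r series _date; do
        printf '%s\n' "${series%.*}"   # %.* strips the shortest suffix matching ".*"
    done
}

print_keys <<'EOF'
EL.EEX.FRANCE.DELMONTHS.JAN2016.SPOT.VOL      15JAN2016
EL.EEX.GERMANY.DELMONTHS.JAN2016.SPOT.L       15JAN2016
EL.EEX.ITALY.DELMONTHS.JAN2016.FWD            15JAN2016
EOF
```

Lines that share a key (for example the three GERMANY `SPOT.*` lines) belong to the same type, so only one of them should appear in the output.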

The script below works, but it creates several intermediate temporary files. How can I do this in the shell without them?

#input file name and path assumed in current directory
file="./osc.txt"
resultfilepath="./OSCoutput.txt"
tmpfilepath="./OSCtempoutput.txt"
tmp1filepath="./OSCtemp1output.txt"
tmp2filepath="./OSCtemp2output.txt"


rm -f "$resultfilepath"
rm -f "$tmpfilepath"
#using awk to filter only series data without dates
awk -F' ' '{print $1}' "$file" >> "$tmpfilepath"

#getting all the unique dataclass_names at column 1
DATACLASSNAME=($(cut -f 1 -d'.' "$tmpfilepath" | sort -u))
for i in "${DATACLASSNAME[@]}"; do
    rm -f "$tmp1filepath"
    #we are filtering the file with that dataclass
    awk -F'.' -v awk_dataclassname="$i" '$1==awk_dataclassname' "$tmpfilepath" >> "$tmp1filepath"
    #also we are calculating the number of dot-separated fields in each filtered record
    #(sort before de-duplicating: uniq only removes adjacent duplicates)
    COLCOUNT=($(awk -F'.' '{print NF}' "$tmp1filepath" | sort -u))
    for j in "${COLCOUNT[@]}"; do
        rm -f "$tmp2filepath"
        #in the filtered data we are taking series of a particular dimension length and dumping data
        awk -F '.' -v awk_colcount="$j" '(NF==awk_colcount){print}' "$tmp1filepath" >> "$tmp2filepath"
        #reducing column no by 1
        newj=$((j - 1))
        #removing last column(generally observation dimension) by cut column
        GREPSAMPLE=($(cut -f -"$newj" -d'.' "$tmp2filepath" | sort -u))
        SAMPLELENGTH=$(wc -l < "$tmp2filepath")
        #we are now taking unique series sample
        for k in "${GREPSAMPLE[@]}"; do
            #fixed-string grep of the unique sample plus its trailing dot, taking the whole line
            grep -F -- "$k." "$tmp2filepath" | head -n 1 >> "$resultfilepath"
        done
    done
done
cat "$resultfilepath"
echo "processing finish"
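For comparison, a sketch of a version with no temporary files at all, in pure bash. It assumes bash 4 or later (for associative arrays) and that the series name is the first field of each line; the function name `one_per_key` is hypothetical. Each series is keyed by its name minus the last dot-separated component, and only the first series seen for each key is printed:

```shell
#!/usr/bin/env bash
# Keep one representative series per key, reading stdin, no temp files.
# Assumes bash >= 4 (associative arrays).
one_per_key() {
    local -A seen=()                # keys already printed
    local series _date key
    while read -r series _date; do
        [[ -z $series ]] && continue        # skip blank lines
        key="@${series%.*}"                 # drop last component; "@" keeps the subscript non-empty
        if [[ -z ${seen[$key]+x} ]]; then   # first time this key appears
            seen[$key]=1
            printf '%s\n' "$series"
        fi
    done
}

one_per_key <<'EOF'
EL.EEX.FRANCE.DELMONTHS.JAN2016.SPOT.VOL      15JAN2016
EL.EEX.GERMANY.DELMONTHS.JAN2016.SPOT.L       15JAN2016
EL.EEX.GERMANY.DELMONTHS.JAN2016.SPOT.H       15JAN2016
EL.EEX.ITALY.DELMONTHS.JAN2016.FWD            15JAN2016
EL.EEX.ITALY.DELMONTHS.JAN2016.FWD            15JAN2016
EOF
```

With the script's variables, `one_per_key < "$file" > "$resultfilepath"` would stand in for all the loops. A bash `while read` loop is slow on large files, which is one reason to prefer a single awk call.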

Answer 1

The whole job can be done with a single awk call:

awk '{
    key = $1;                       # Series name only, so a dot in the date cannot interfere
    sub(/\.[^.]*$/, "", key);       # Let key be everything up to the last dot

    if (!seen[key]) { print $1 }    # If key has not been seen, print 1st col
    seen[key] = 1;                  # Mark the key as seen
}' "$file" > "$resultfilepath"
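The same idea is often written with the common awk `!seen[key]++` idiom, where the pattern is true only the first time a key appears. A minimal sketch, wrapped in a hypothetical shell function `first_per_prefix` so it can read either files or stdin:

```shell
#!/usr/bin/env bash
# Print the first series name for each key (name minus its last component).
first_per_prefix() {
    awk '
        { key = $1; sub(/\.[^.]*$/, "", key) }   # series name minus its last component
        NF && !seen[key]++ { print $1 }          # first series per key; blank lines skipped
    ' "$@"
}

first_per_prefix <<'EOF'
EL.EEX.GERMANY.DELMONTHS.JAN2016.SPOT.L       15JAN2016
EL.EEX.GERMANY.DELMONTHS.JAN2016.SPOT.H       15JAN2016
EL.EEX.ITALY.DELMONTHS.JAN2016.FWD            15JAN2016
EOF
```

Usage with the question's variables would be `first_per_prefix "$file" > "$resultfilepath"`. `seen[key]++` returns the old value (0 on first sight) and then increments it, so the test and the "mark as seen" step happen in one expression.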

In general, if your script does a lot of awk-ing and grep-ing, chances are the whole thing can be written as a single awk script.
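As an illustration of that advice, the question's first stage (`awk '{print $1}'` into a temp file, then `cut -f 1 -d'.' | sort | uniq` on it) collapses into one awk pass. This is a sketch; the function name `dataclass_names` is hypothetical, and names come out in first-seen order rather than sorted (pipe through `sort` if order matters):

```shell
#!/usr/bin/env bash
# List the distinct dataclass names (text before the first dot of field 1).
dataclass_names() {
    awk '
        NF { split($1, part, ".") }                 # part[1] is the text before the first dot
        NF && !seen[part[1]]++ { print part[1] }    # print each name the first time it appears
    ' "$@"
}

dataclass_names <<'EOF'
EL.EEX.FRANCE.DELMONTHS.JAN2016.SPOT.VOL      15JAN2016
EL.EEX.ITALY.DELMONTHS.JAN2016.FWD            15JAN2016
EOF
```

A single-character separator such as `"."` is taken literally by `split`, not as the regex "any character".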
