I have a .seg file that stores the cluster data produced by speaker diarization of an audio file. The file contains the following data:
;; cluster S0 [ score:FS = -32.694324625945725 ] [ score:FT = -33.32942628147711 ] [ score:MS = -32.847416329096404 ] [ score:MT = -33.45196981196905 ]
ElonN 1 0 758 F S U S0
;; cluster S1 [ score:FS = -33.14490351155562 ] [ score:FT = -33.420111126893076 ] [ score:MS = -32.29039025858266 ] [ score:MT = -32.85038927851203 ]
ElonN 1 758 308 M S U S1
ElonN 1 1110 700 M S U S1
ElonN 1 1887 2794 M S U S1
ElonN 1 4849 1190 M S U S1
;; cluster S10 [ score:FS = -34.466969784129404 ] [ score:FT = -34.951981832991414 ] [ score:MS = -34.83408030011385 ] [ score:MT = -35.17326803680231 ]
ElonN 1 6731 352 F S U S10
;; cluster S11 [ score:FS = -33.57333115273301 ] [ score:FT = -33.93961876513661 ] [ score:MS = -32.6529742867516 ] [ score:MT = -33.397218081762475 ]
ElonN 1 7459 2542 M S U S11
;; cluster S16 [ score:FS = -33.29482735979043 ] [ score:FT = -33.687616298740195 ] [ score:MS = -32.189984103971135 ] [ score:MT = -33.13899965310298 ]
ElonN 1 10001 3051 M S U S16
ElonN 1 13086 912 M S U S16
;; cluster S9 [ score:FS = -33.4457701986847 ] [ score:FT = -34.70059869569136 ] [ score:MS = -33.958162156208914 ] [ score:MT = -34.79598011488008 ]
ElonN 1 6039 692 F S U S9
I need to extract the start time (column 3), the speaking duration (column 4) and the last column (the speaker name).
For example, in the segment
ElonN 1 6039 692 F S U S9
6039 is the start time of the segment, 692 is its duration, and S9 is the speaker name.
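To make the column mapping concrete, here is a small bash sketch (not part of the original question) that splits the example segment line with the `read` builtin. `_` is used as a throwaway name for the columns that are not needed, and the variable names `start`, `duration` and `speaker` are only illustrative.

```shell
# Split one .seg segment line on whitespace.
# Columns 1, 2, 5, 6 and 7 go to the throwaway variable "_".
line="ElonN 1 6039 692 F S U S9"
read -r _ _ start duration _ _ _ speaker <<< "$line"
echo "start=$start duration=$duration speaker=$speaker"
```

This prints `start=6039 duration=692 speaker=S9`. The here-string (`<<<`) is a bash feature, so it will not work in a strictly POSIX `sh`.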
The following shell script, which I wrote, extracts the entire segment lines and stores them in a file:
echo "Enter audio file name. (File must be of .wav format)"
read -r fileName
echo "Enter path of the audio file"
read -r path
echo "Enter folder name"
read -r outputfolder
mkdir -p "$outputfolder"
echo "Processing $fileName"
./ilp_diarization2.sh "$path/$fileName.wav" 120 "$outputfolder"
# Keep the segment lines, which contain the file name
grep "$fileName.*S" "$outputfolder/$fileName/$fileName.g.3.seg" > a
cat a
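One way to get only the three wanted columns directly instead of whole lines is sketched below. It assumes, as the `grep` pattern above already does, that the first column of each segment line holds the audio file name (`ElonN` in the sample). The helper name `extract_segments` is hypothetical. Because it compares the first field exactly, the `;; cluster` header lines are skipped automatically.

```shell
# Hypothetical helper: print "start duration speaker" for every segment line
# of a .seg file whose first field equals the given show (file) name.
extract_segments() {
    local seg_file=$1 show=$2
    awk -v show="$show" '$1 == show { print $3, $4, $NF }' "$seg_file"
}

# Self-contained demo on a few lines copied from the question's data.
demo_seg=$(mktemp)
cat > "$demo_seg" <<'EOF'
;; cluster S1 [ score:FS = -33.14490351155562 ] [ score:FT = -33.420111126893076 ] [ score:MS = -32.29039025858266 ] [ score:MT = -32.85038927851203 ]
ElonN 1 758 308 M S U S1
ElonN 1 1110 700 M S U S1
;; cluster S9 [ score:FS = -33.4457701986847 ] [ score:FT = -34.70059869569136 ] [ score:MS = -33.958162156208914 ] [ score:MT = -34.79598011488008 ]
ElonN 1 6039 692 F S U S9
EOF
result=$(extract_segments "$demo_seg" ElonN)
rm -f "$demo_seg"
printf '%s\n' "$result"
```

In the script above, the `grep ... > a` line could then become `extract_segments "$outputfolder/$fileName/$fileName.g.3.seg" "$fileName" > a`.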
Answer (score: 2)
You can use awk, for example:
var=$(awk '{ print $3" "$4" "$NF }' filename)
or
awk '{ print $3" "$4" "$NF }' filename > outputfile
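If each segment's values are needed separately rather than as one string in `var`, awk's output can be read line by line. The bash sketch below (arrays, process substitution and here-strings are bash features) uses a `sample` variable in place of a real .seg file, and the `!/^;;/` pattern, an addition not in the answer above, skips the `;; cluster` header lines when awk is run on the raw .seg file instead of on the grep output.

```shell
# Sample input: lines copied from the question's .seg data.
sample=';; cluster S16 [ score:FS = -33.29482735979043 ] [ score:FT = -33.687616298740195 ] [ score:MS = -32.189984103971135 ] [ score:MT = -33.13899965310298 ]
ElonN 1 10001 3051 M S U S16
ElonN 1 13086 912 M S U S16'

starts=() durations=() speakers=()
# Process substitution (instead of a pipe) keeps the loop in the current shell,
# so the arrays are still populated after "done".
while read -r start duration speaker; do
    starts+=("$start")
    durations+=("$duration")
    speakers+=("$speaker")
done < <(awk '!/^;;/ { print $3, $4, $NF }' <<< "$sample")

for i in "${!starts[@]}"; do
    echo "segment $i: speaker ${speakers[i]} starts at ${starts[i]} and lasts ${durations[i]}"
done
```

With a real file, replace `<<< "$sample"` with the path of the .seg file as awk's last argument. Piping awk into `while` would also work for printing, but the loop would then run in a subshell and the arrays would be empty afterwards.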
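$n (for example $3 or $4) refers to the n-th whitespace-separated field on the line (whitespace is awk's default field separator), and $NF is the last field.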