Question

I have a text file with this structure:

....

"/home/letizia/Documents/SpanishSegmentation/Recordings/segmented/mfc/F001.0.rec"
COGE
LAS
HOJAS
Y
LAS
QUEMAS
TODAS
EN
EL
FUEGO
"/home/letizia/Documents/SpanishSegmentation/Recordings/segmented/mfc/F002.0.rec"
LA
LIGA
DE
PAZ
SE
REUNIÓ
PARA
TRATAR
EL
TEMA
....

I want to select "F001.0" and "F002.0".

I am using:

     ID="F"
     if [[ "$LINE" == *Recordings* ]]
     then        
     SEGMENT=`echo $LINE | grep -o $ID.* | cut -d'.' -f1-2`
     fi

But it doesn't work. Where is the error?

Thank you very much.

Answer 1

Try using sed:

sed -n 's@^".*/Recordings/.*/\(.*\)"$@\1@p' file.txt

Quick walkthrough:

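-n: don't print anything unless explicitly requested (the trailing p).
s@: substitute the part up to the next @ with the part between that @ and the one after it.
^".*/Recordings/.*/\(.*\)"$: matches lines that start and end with a double quote and contain /Recordings/, consuming everything up to the last slash.
\1: replaces the matched string with the last part (which we captured in the parentheses).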
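
Note that, as written, the capture group takes everything after the last slash, so the command prints F001.0.rec rather than F001.0. A minimal sketch of a variant that also drops the extension, assuming every recording path ends in .rec (the function name extract_ids and the temporary sample file are illustrative, not part of the original answer):

```shell
# extract_ids FILE: print the basename, without .rec, of every quoted
# path line that contains /Recordings/. Assumes each path ends in .rec".
extract_ids() {
    sed -n 's@^".*/Recordings/.*/\(.*\)\.rec"$@\1@p' "$1"
}

# Hypothetical sample input with the same layout as the question's file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
"/home/letizia/Documents/SpanishSegmentation/Recordings/segmented/mfc/F001.0.rec"
COGE
LAS
"/home/letizia/Documents/SpanishSegmentation/Recordings/segmented/mfc/F002.0.rec"
LA
LIGA
EOF

extract_ids "$sample"   # prints F001.0 and F002.0, one per line
rm -f "$sample"
```

The only change is the added \.rec before the closing quote, which moves the extension outside the captured group.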

Answer 2

You need a while loop:

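id="F"
# Read the file line by line, keeping whitespace and backslashes intact
while IFS= read -r line; do
    if [[ "$line" =~ /Recordings/ ]]; then
        # Keep from the first "F" onward, then the first two dot-separated fields
        segment=$(echo "$line" | grep -o "$id.*" | cut -d '.' -f1-2)
        echo "$segment"
    fi
done < file.txt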

Result:

F001.0
F002.0
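
The same loop can be written without starting grep and cut for every line, using only shell parameter expansion. This is a sketch under the assumption that each path line ends in .rec" as in the sample; the function name extract_segments is made up for illustration:

```shell
# extract_segments FILE: print the recording ID for each path line.
# Assumes each path line ends in .rec" as in the question's sample.
extract_segments() {
    while IFS= read -r line; do
        case $line in
            *Recordings*)
                name=${line##*/}    # strip through the last slash: F001.0.rec"
                name=${name%\"}     # strip the closing double quote
                name=${name%.rec}   # strip the extension
                printf '%s\n' "$name"
                ;;
        esac
    done < "$1"
}

# Usage (file.txt as in the loop above):
# extract_segments file.txt
```

Because it relies only on case and the ## and % expansions, this version should also run under a plain POSIX sh.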

However, a better approach is to use sed:

sed -n '/Recordings/s#.*/\(F[^\.]*\.[^\.]*\).*#\1#p' file.txt
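
Here the /Recordings/ address restricts the substitution to the path lines, and the group captures F followed by two dot-separated fields. If the prefix should stay configurable, like the ID variable in the question, the command could be wrapped roughly as follows. The helper name extract_by_prefix is hypothetical, and the sketch assumes the prefix contains neither # nor regex metacharacters:

```shell
# extract_by_prefix PREFIX FILE (hypothetical helper)
# Prints PREFIX plus the next two dot-separated fields for each line
# containing "Recordings". PREFIX is inserted into the regex unescaped,
# so it must be a plain string such as F.
extract_by_prefix() {
    prefix=$1
    sed -n "/Recordings/s#.*/\(${prefix}[^.]*\.[^.]*\).*#\1#p" "$2"
}

# Usage (file.txt as in the command above):
# extract_by_prefix F file.txt
```

The double quotes let the shell expand ${prefix} while leaving the sed backslash sequences untouched.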

Selecting specific words from a file in a shell script
