Question

I have a file in the following format:

B: that


I: White


I: House


B: the
I: emergency


I: rooms


B: trauma
I: centers

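What I need to do is read the file line by line from the top. If a line starts with B:, remove the B:. If it starts with I:, remove the I: and join the rest onto the previous line (which has already been processed by the same rules).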

Expected output:

that White House
the emergency rooms
trauma centers

What I tried:

while IFS= read -r line
do
    string=$line

    if echo "$string" | grep -q '^ *B:'    # if the line starts with " B: "
    then
        newstring=${string:4}    # cut the first 4 characters, including B: and the space
    fi

    if echo "$string" | grep -q '^ *I:'    # if the line starts with " I: "
    then
        newstring=${string:4}    # cut the first 4 characters, including I: and the space
    fi
done < file.txt

What I don't know is how to put the result back into the line (in the file) and how to join that line to the previously processed one.
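For comparison, the read-loop approach from the question can be completed without awk or sed. This is a minimal sketch, not the asker's code: it assumes the tags are exactly B: and I:, that leading blanks and empty lines may appear, and that writing the joined records to stdout is acceptable. The name join_records is hypothetical. Instead of cutting a fixed 4 characters, it strips leading whitespace and the two-character tag, so it works with or without the leading space.

```bash
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper name: join_records).
# Assumes records start with a "B:" line, continuation lines start with "I:",
# and that leading blanks and empty lines may appear anywhere.
join_records() {
    local line word out=""
    # IFS= and -r keep the line as-is; the || test handles a missing final newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # Remove leading whitespace so " B: that" and "B: that" behave alike.
        line=${line#"${line%%[![:space:]]*}"}
        # Drop the two-character tag, then the blanks that follow it.
        word=${line#??}
        word=${word#"${word%%[![:space:]]*}"}
        case $line in
            B:*)  # New record: print the one built so far, then start over.
                if [ -n "$out" ]; then printf '%s\n' "$out"; fi
                out=$word ;;
            I:*)  # Continuation: append to the current record with one space.
                out=${out:+$out }$word ;;
            *)    # Anything else (e.g. an empty line) is skipped.
                ;;
        esac
    done
    # Print the final record.
    if [ -n "$out" ]; then printf '%s\n' "$out"; fi
}

# Demo on the sample input from the question.
join_records <<'EOF'
B: that


I: White


I: House


B: the
I: emergency


I: rooms


B: trauma
I: centers
EOF
```

To update the file itself, write the output to a temporary file and move it over the original, for example: join_records < file.txt > file.tmp && mv file.tmp file.txt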

Answer 1

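awk -F': *' '
    /^ *B:/ { if (line != "") print line; line = $2 }   # new record: flush the previous one
    /^ *I:/ { line = line " " $2 }                      # continuation: append to the current one
    END     { if (line != "") print line }              # flush the last record
' Your_file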

Answer 2

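Use awk to print the second field of the I: and B: records. The variable first controls when a newline is output.

/B:/ matches the B: pattern, which marks the start of a record. If the record is not the first one, a newline is printed first; then the data ($2) is printed.

If the pattern found is I:, the data $2 (the second field, i.e. the word after I:) is printed on the same line.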

awk 'BEGIN { first = 1 }
     /B:/  { if (first) first = 0; else print ""; printf("%s", $2) }   # start a record
     /I:/  { printf(" %s", $2) }                                     # continue it
     END   { print "" }' filename

Answer 3

This might work for you (GNU sed):

sed -r ':a;$!N;s/\n$//;s/\n\s*I://;ta;s/B://g;s/^\s*//;P;D' file

Or:

sed -e ':a' -e '$!N' -e 's/\n$//' -e 's/\n\s*I://' -e 'ta' -e 's/B://g' -e 's/^\s*//' -e 'P' -e 'D' file
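The one-liner above is dense, so here is the same GNU sed program laid out one command per line, with comments describing what each step does. Wrapping it in a shell function named join_sed is only for illustration (the name is hypothetical), and GNU sed is assumed, as in the answer (for -r, \s and \n in regexes).

```bash
# Hypothetical wrapper name: join_sed. Requires GNU sed.
join_sed() {
    sed -r '
        # Loop label.
        :a
        # Unless this is the last line, append the next line after a newline.
        $!N
        # The appended line was empty: remove the newline again.
        s/\n$//
        # The appended line is an "I:" line: remove the newline and the tag,
        # which joins the word (with its leading space) onto the record.
        s/\n\s*I://
        # If either substitution succeeded, loop back and append another line.
        ta
        # Otherwise the record is complete (a B: line was appended, or input ended):
        # strip the B: tags
        s/B://g
        # and any whitespace at the start of the pattern space.
        s/^\s*//
        # Print up to the first newline, which is the finished record.
        P
        # Delete it and rerun the script on the remainder, without reading new input.
        D
    '
}
```

Usage: join_sed < file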

Answer 4

awk '/^B/ {printf "\n%s",$2} /^I/ {printf " %s",$2}' file

that White House
the emergency rooms
trauma centers

A slightly shorter version:

awk '/./ {printf /^B/?"\n%s":" %s",$2}' file
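Both one-liners above print a newline before the first record and leave the last record without one. The sketch below is a variant that prints the separator only between records and terminates the last one. It assumes, as the answer does, that a space follows each tag so that awk's default field splitting puts the word in $2; the function name join_bi is hypothetical.

```bash
# Hypothetical wrapper name: join_bi. Reads the B:/I: format on stdin.
join_bi() {
    awk '/^ *B:/ { printf "%s%s", (n++ ? "\n" : ""), $2 }  # newline only between records
         /^ *I:/ { printf " %s", $2 }                       # continuation: same line
         END     { if (n) print "" }'                       # terminate the last record
}
```

Usage: join_bi < file. The counter n is 0 (false) on the first B: line, so no newline is printed before the first record.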

Answer 5

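There is an interesting solution that uses awk's automatic record splitting on an RS pattern. Note that it is somewhat sensitive to variations in the input format: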

<infile awk 1 RS='(^|\n)B: ' | awk 1 RS='\n+I: ' ORS=' ' | grep -v '^ *$'

Output:

that White House
the emergency rooms
trauma centers

This works with at least GNU awk and Mike's awk (mawk).

How to concatenate words in a text file
