Question

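I have a data file that contains text. What is the best way in Bash to read this file and pipe the output into a newly created pipe-separated file? The delimiters are tricky for me...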

The file in question can have one or more text data fields, for example:

First Name: Bill Last Name: Gates
Color: Blue
Start: 12/11/19 End:12/12/20

So the pipe-separated file should look like this:

Bill|Gates|Blue|12/11/19|12/12/20

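I'm having trouble with my script's parsing mechanism. Previously I have been using this sed example, which turns each "," into | and strips the leading and trailing " from each line of a CSV file. I am adapting it.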

sed -e 's/","/|/g' -e 's/^"//' -e 's/"$//' "$file"
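A minimal sketch of what that CSV command does, wrapped in a function so it can be tried on a sample line. The function name csv_to_psv is hypothetical, and the sketch assumes every field is wrapped in double quotes and no field itself contains the sequence ",".

```shell
#!/bin/bash
# Hypothetical helper wrapping the CSV-to-PSV sed command from the question.
# Assumes each field is double-quoted and contains no "," sequence inside.
csv_to_psv() {
  # 1) every "," between two fields becomes |
  # 2) drop the opening quote at the start of the line
  # 3) drop the closing quote at the end of the line
  sed -e 's/","/|/g' -e 's/^"//' -e 's/"$//' "$@"
}

printf '%s\n' '"Bill","Gates","Blue"' | csv_to_psv
# prints: Bill|Gates|Blue
```

With no file argument, sed reads standard input, so the function works both on a file and in a pipeline.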

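Assuming each value that needs to be separated is preceded by a ":", and that we know the word that comes before the next value to be separated, what would be the best approach? Is sed the way to go? I'm concerned about the case where the last word of a value could be a label name.

i.e. First Name: Last Last Name: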

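The input should always have the same form, although it may be a little more complex. However, the data fields should always carry standard labels.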

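EDIT: I understand. I don't have any particular underlying data to suggest; it's more open-ended. I'm simply looking to convert a text file whose labels are always the same into a PSV file.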

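The data to be separated should always come after a colon (:).

I don't have the full wording of the labels, because it's verbose. Let's assume the simple example above.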
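One way to address the concern about a value that equals the first word of a label ("First Name: Last Last Name:") is to search for each full label, including its colon, in a known order. The sketch below does that in awk. The label list, the function name to_psv, and the assumptions that every label appears in the listed order and that the file holds a single record are all hypothetical.

```shell
#!/bin/bash
# Sketch: turn one labeled record into a pipe-separated line.
# ASSUMPTIONS: the label list below is hypothetical, labels appear in this
# order, and the whole input is a single record.
to_psv() {
  awk '
  BEGIN {
    # Labels are matched together with their trailing colon, in order.
    n = split("First Name:;Last Name:;Color:;Start:;End:", labels, ";")
  }
  { text = text " " $0 }   # join all input lines into one string
  END {
    rest = text
    out = ""
    for (i = 1; i <= n; i++) {
      val = ""
      p = index(rest, labels[i])
      if (p > 0) {
        # Keep only what follows this label...
        rest = substr(rest, p + length(labels[i]))
        val = rest
        # ...and cut the value where the next label starts.
        if (i < n) {
          q = index(rest, labels[i + 1])
          if (q > 0) val = substr(rest, 1, q - 1)
        }
        gsub(/^[ \t]+|[ \t]+$/, "", val)   # trim surrounding blanks
      }
      out = (i == 1) ? val : out "|" val   # a missing label gives an empty field
    }
    print out
  }' "$@"
}

printf 'First Name: Bill Last Name: Gates\nColor: Blue\nStart: 12/11/19 End:12/12/20\n' | to_psv
# prints: Bill|Gates|Blue|12/11/19|12/12/20
```

Because "Last Name:" is searched as a whole, a first name of "Last" is not mistaken for the start of the next label.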

EDIT with the selected answer:

#!/bin/bash

awk  '
BEGIN{
  FS="[: ]"
  OFS="|"
}
match($0,/First.*Last Name: /){
  first_name=substr($0,RSTART,RLENGTH)
  gsub(/First Name: |Last.*/,"",first_name)
  last_name=substr($0,RSTART+RLENGTH)
  next
}
match($0,/^Color:/){
  color=$NF
  next
}
match($0,/Start.*End:/){
  start=substr($0,RSTART,RLENGTH)
  gsub(/Start: | End:/,"",start)
  end=substr($0,RSTART+RLENGTH)
  print first_name,last_name,color,start,end
}
'  data.txt > data_pipe_separated.txt

Output:

Bill |Gates|Blue|12/11/19|12/12/20
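The stray space in "Bill |Gates" comes from the pattern gsub(/First Name: |Last.*/,"",first_name) in the script above: "Last.*" removes everything from "Last" onward but leaves the space in front of it, whereas Answer 1 uses " Last.*" (with a leading space). The sketch below applies both patterns to the first sample line and brackets the result so the trailing space is visible; it runs gsub on the whole line rather than on the matched substring, which gives the same first name.

```shell
#!/bin/bash
# Apply each gsub pattern to the first sample line and bracket the result
# so that a trailing space is visible.
line='First Name: Bill Last Name: Gates'

# Pattern from the edited script: "Last.*" leaves the space before "Last".
question=$(printf '%s\n' "$line" | awk '{ s = $0; gsub(/First Name: |Last.*/, "", s); print "[" s "]" }')

# Pattern from Answer 1: " Last.*" removes that space as well.
answer=$(printf '%s\n' "$line" | awk '{ s = $0; gsub(/First Name: | Last.*/, "", s); print "[" s "]" }')

echo "$question"   # [Bill ]
echo "$answer"     # [Bill]
```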

Answer 1

Could you please try the following, written and tested with the samples shown.

awk  '
BEGIN{
  FS="[: ]"
  OFS="|"
}
match($0,/First.*Last Name: /){
  first_name=substr($0,RSTART,RLENGTH)
  gsub(/First Name: | Last.*/,"",first_name)
  last_name=substr($0,RSTART+RLENGTH)
  next
}
match($0,/^Color:/){
  color=$NF
  next
}
match($0,/Start.*End:/){
  start=substr($0,RSTART,RLENGTH)
  gsub(/Start: | End:/,"",start)
  end=substr($0,RSTART+RLENGTH)
  print first_name,last_name,color,start,end
}
'  Input_file

Explanation: a detailed explanation of the above code follows.

awk  '                                             ##Start the awk program.
BEGIN{                                             ##BEGIN block: runs once, before any input is read.
  FS="[: ]"                                        ##Set the field separator to a colon OR a space.
  OFS="|"                                          ##Set the output field separator to | (pipe).
}                                                  ##Close the BEGIN block.
match($0,/First.*Last Name: /){                    ##If the line matches from "First" up to "Last Name: ", do the following.
  first_name=substr($0,RSTART,RLENGTH)             ##Take the matched part: the substring starting at RSTART, RLENGTH characters long.
  gsub(/First Name: | Last.*/,"",first_name)       ##Delete "First Name: " and " Last..." from it, leaving only the first name.
  last_name=substr($0,RSTART+RLENGTH)              ##Everything after the match, up to the end of the line, is the last name.
  next                                             ##Skip the remaining rules and move on to the next line.
}                                                  ##Close this block.
match($0,/^Color:/){                               ##If the line starts with "Color:", do the following.
  color=$NF                                        ##Save the last field of the line as color.
  next                                             ##Skip the remaining rules and move on to the next line.
}                                                  ##Close this block.
match($0,/Start.*End:/){                           ##If the line matches from "Start" up to "End:", do the following.
  start=substr($0,RSTART,RLENGTH)                  ##Take the matched part of the line.
  gsub(/Start: | End:/,"",start)                   ##Delete "Start: " and " End:" from it, leaving only the start date.
  end=substr($0,RSTART+RLENGTH)                    ##Everything after the match, up to the end of the line, is the end date.
  print first_name,last_name,color,start,end       ##Print the five values, joined by OFS (|).
}                                                  ##Close this block.
' Input_file                                       ##Name of the input file.
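Answer 1 reads the color with color=$NF, which relies on how FS="[: ]" splits the line: the colon and the following space are two separators with an empty field between them. A quick way to check this is to print NF and $NF; the helper name fields is hypothetical. The second call also shows a limitation: with a multi-word value, only the last word is kept.

```shell
#!/bin/bash
# Show how FS="[: ]" splits a line: number of fields and the last field ($NF).
fields() {
  printf '%s\n' "$1" | awk 'BEGIN { FS = "[: ]" } { print NF ":" $NF }'
}

fields 'Color: Blue'         # 3:Blue  (fields: "Color", "", "Blue")
fields 'Color: Light Blue'   # 4:Blue  ("Light" is lost)
```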

Answer 2

Using sed:

$ sed -nz 's/[^:]*: *\(\S*\)/\1|/gp;s/\n//'  input_file
Bill|Gates|Blue|12/11/19|12/12/20|

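The pattern /[^:]*: *\(\S*\)/ matches zero or more non-colon characters [^:]*, followed by a colon and zero or more spaces : *, followed by zero or more non-whitespace characters, captured as \(\S*\). The replacement \1| is the content of the captured group followed by a pipe character. Because -z makes sed read the whole file as a single record, [^:]* also matches the newlines between the input lines, which is why all the values end up on one line. The final s/\n// runs after the p flag has already printed the result, so it does not change the output, and the trailing | remains.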
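A possible variant of the sed approach, sketched under the assumptions that the file holds one record and that no value contains a space: join the lines with paste first, apply the same substitution using [^ ]* instead of the GNU-specific \S and -z, then drop the trailing |. The function name to_psv_sed is hypothetical.

```shell
#!/bin/bash
# Sketch: join all lines with spaces, then apply the label-stripping substitution.
# ASSUMPTIONS: one record per file, values contain no spaces.
to_psv_sed() {
  # paste reads the named file, or standard input when no file is given ("-").
  paste -s -d ' ' "${1:--}" |
    sed -e 's/[^:]*: *\([^ ]*\)/\1|/g' -e 's/|[[:space:]]*$//'
}

printf 'First Name: Bill Last Name: Gates\nColor: Blue\nStart: 12/11/19 End:12/12/20\n' | to_psv_sed
# prints: Bill|Gates|Blue|12/11/19|12/12/20
```

In a basic regular expression a plain | is a literal character, so the second expression removes only the final pipe (plus any trailing blanks).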

Read a text file in the Bash shell and pipe it to a pipe-separated file
