Question

I have a dataset in a single column, and I would like to split it into any number of new columns each time a certain string is found (in this case 'male_position').

>cat test.file

male_position
0.00
0.00
1.05
1.05
1.05
1.05
3.1
5.11
12.74
30.33
40.37
40.37
male_position
0.00
1.05 
2.2
4.0
4.0
8.2
25.2
30.1
male_position
1.0
5.0

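I would like the script to start a new tab-separated column each time 'male_position' is encountered, and simply print every line/data point below it (added to that column) until the next occurrence of 'male_position':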

script.awk test.file > output

0.00  0.00  1.0
0.00  1.05  5.0
1.05  2.2
1.05  4.0
1.05  4.0
1.05  8.2
3.1  25.2
5.11 30.1
12.74
30.33
40.37
40.37

Any ideas?

Update: I tried to adapt the code from this post (Linux split a column into two different columns in a same CSV file):

cat script.awk

BEGIN {
   line = 0; #Initialize at zero
}
/male_position/ { #every time we hit the delimiter
   line = 0; #reset line to zero
}
!/male_position/{ #otherwise
   a[line] = a[line]" "$0; # Add the new input line to the output line
   line++; # increase the counter by one
}
END {
   for (i in a )
      print a[i] # print the output
}

...which results in:

$ awk -f script.awk test.file
 1.05 2.2
 1.05 4.0
 1.05 4.0
 1.05 8.2
 3.1 25.2
 5.11 30.1
 12.74
 30.33
 40.37
 40.37
 0.00 0.00 1.0
 0.00 1.05  5.0

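Update 2

I can reproduce the expected output with the test.file case. Running the script (script.awk, see above) on Linux against that test file seems to work. However, that simple example file only has a decreasing number of data points between delimiters (male_position). When a later block has more data points than an earlier one, the output seems to fail...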

cat test.file2

male_position
0.00
0.00
1.05
1.05
1.05
1.05
3.1
5.11
12.74
male_position
0
5
10
male_position
0
1
2
3
5

awk -f script.awk test.file2

0.00 0 0
0.00 5 1
1.05 10 2
1.05 3
1.05 5
1.05 
3.1
5.11
12.74

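There is no 'padding' for the rows after the last observation of a given column, so the values of a column that has more values than the preceding column end up aligned with the preceding column (3 and 5 are in column 2 when they should be in column 3).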

Answer 1

Here is a csplit+paste solution

$ csplit --suppress-matched -zs test.file2 /male_position/ {*}
$ ls
test.file2  xx00  xx01  xx02
$ paste xx*
0.00    0   0
0.00    5   1
1.05    10  2
1.05        3
1.05        5
1.05        
3.1     
5.11        
12.74

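From man csplit

csplit - split a file into sections determined by context lines

-z, --elide-empty-files   remove empty output files

-s, --quiet, --silent   do not print counts of output file sizes

--suppress-matched   suppress the lines matching PATTERN

/male_position/ is the regular expression used to split the input file
{*} repeats the split as many times as possible
use the -f and -n options to change the default output file names
paste xx* pastes the files column-wise; TAB is the default delimiter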
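
As a minimal sketch of the -f and -n options mentioned above, the split can be done in a scratch directory so the pieces do not land next to the input file. This assumes GNU coreutils csplit (for --suppress-matched); the directory variable, the col_ prefix and the sample values are illustrative assumptions, not part of the original answer.

```shell
# Assumption: GNU coreutils csplit (needed for --suppress-matched).
workdir=$(mktemp -d)    # scratch directory for the split pieces

# Small stand-in for test.file2 (hypothetical sample values).
printf '%s\n' male_position 0.00 0.00 1.05 male_position 0 male_position 0 1 \
    > "$workdir/input.txt"

# -f sets the output file prefix, -n the number of digits in the suffix,
# so the pieces are named col_000, col_001, ... instead of xx00, xx01, ...
csplit --suppress-matched -zs -f "$workdir/col_" -n 3 \
    "$workdir/input.txt" /male_position/ '{*}'

# The shell expands col_* in sorted order, so columns keep their input order.
result=$(paste "$workdir"/col_*)
printf '%s\n' "$result"

rm -r "$workdir"        # remove the pieces once they have been pasted
```

paste leaves an empty field wherever a piece runs out of lines, which is what keeps later, longer columns in their own position.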

Answer 2

The following awk may help you.

awk '/male_position/{count++;val=0;next} {array[++val,count]=$0;if(val>max)max=val} END{for(i=1;i<=max;i++){for(j=1;j<=count;j++){printf("%s%s",array[i,j],j==count?ORS:OFS)}}}' OFS="\t"   Input_file

Adding a non-one-liner form of the solution as well.

awk '
/male_position/{         # a delimiter line starts a new column
  count++;
  val=0;
  next}
{
  array[++val,count]=$0  # store the value by (row, column)
  if(val>max) max=val    # track the longest column, including the last one
}
END{
  for(i=1;i<=max;i++){
      for(j=1;j<=count;j++){   printf("%s%s",array[i,j],j==count?ORS:OFS)   }}
}
' OFS="\t"   Input_file
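
To see the padding behaviour on a small input, here is a rough sketch of the same row/column array idea wrapped in a shell function that reads standard input. The function name split_columns, its optional marker argument and the sample values are assumptions made for illustration; they do not come from the original answer.

```shell
# split_columns is a hypothetical helper name used only for this sketch.
# Usage: split_columns [marker_regex] < input   (marker defaults to male_position)
split_columns() {
    awk -v marker="${1:-male_position}" -v OFS='\t' '
    $0 ~ marker { ncols++; row = 0; next }   # a marker line starts a new column
    ncols == 0  { next }                     # skip anything before the first marker
    {
        cell[++row, ncols] = $0              # store the value by (row, column)
        if (row > nrows) nrows = row         # remember the longest column
    }
    END {
        # Missing cells are empty strings, so short columns are padded
        # and later columns stay in their own position.
        for (r = 1; r <= nrows; r++)
            for (c = 1; c <= ncols; c++)
                printf "%s%s", cell[r, c], (c == ncols ? ORS : OFS)
    }'
}

# Three blocks of lengths 3, 1 and 2 (made-up sample values).
result=$(printf '%s\n' male_position 1 2 3 male_position a male_position x y \
    | split_columns)
printf '%s\n' "$result"
```

In the second output row the middle field is empty, so y stays in column 3 rather than sliding into column 2.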
