Question

我有一个制表符分隔的文件，其中至少有16列（但可能更多），其中第一列是唯一标识符;和＆gt; 10,000行（示例中仅显示6x6），如下所示：

ID  VAR1  VAR2  VAR3  VAR4  VAR5
1    1    1     1     1     1
2    -9   -9    -9    -9    -9
3    3    3     3     3     3
4    4    4     4     -9    4
5    5    5     5     5     5
6    6    -9    6     6     6

如果其中一个值已经是“-9”，我需要将VAR1-5的所有值更改为“-9”

因此，所需的输出将是：

ID  VAR1  VAR2  VAR3  VAR4  VAR5
1    1    1     1     1     1
2    -9   -9    -9    -9    -9
3    3    3     3     3     3
4    -9   -9    -9    -9    -9
5    5    5     5     5     5
6    -9   -9    -9    -9    -9

到目前为止，我已经尝试过这样做：

awk -F'\t' '
BEGIN{OFS="\t"}
{for(i=2;i<=NF;i++){if ($i=="-9"){for(j=2;j<=NF;j++){$j="-9"};continue}}};1
' < file1.tab

哪个有效，但在应用于实际数据集时速度很慢。有更快的方法吗？也许是grep和sed组合的东西？

Answer 1

这是一个不会对列数进行硬编码的变体。

awk -F '\t' '/(^|\t)-9(\t|$)/ {
    printf $1; for(i=2; i<=NF; ++i) printf "\t-9"; printf "\n"
    next }
  1' file1 file2

这里的主要优化是Awk立即扫描整行并立即触发正则表达式，而不需要遍历所有字段，除非它已经知道存在匹配。

因为我们知道除了第一个字段之外我们将丢弃所有字段，所以不需要让Awk替换字段以便它们可以打印它们。只需生成我们想要打印的输出并继续前进，而无需触及Awk的内部线条表示。这也应该购买几个周期，尽管这是一个非常小的性能改进。

Answer 2

关注awk可能对您有帮助，我已使用您提供的示例对其进行了测试。

awk 'FNR==1{print;next} /(^|\t)-9(\t|$)/{print $1,"-9   -9    -9    -9    -9";next} 1' OFS="    "   Input_file

如果OP在Input_file中有超过5个字段，那么以下可能会有所帮助，逻辑与三元先生的解决方案相同，我在遍历字段但是尽管打印-9我正在分配字段的值到-9。

awk 'FNR==1{print;next} /(^|\t)-9(\t|$)/{for(i=2;i<=NF;i++){$i=-9};} 1' OFS="\t\t"   Input_file

输出如下。

ID  VAR1  VAR2  VAR3  VAR4  VAR5
1    1    1     1     1     1
2    -9   -9    -9    -9    -9
3    3    3     3     3     3
4    -9   -9    -9    -9    -9
5    5    5     5     5     5
6    -9   -9    -9    -9    -9

说明： 现在也向上面的代码添加说明。

awk '
FNR==1{                ##Checking condition here if line number is 1 then do following:
  print;               ##Printing the current line then which will be very first line of Input_file.
  next                 ##next is awk out of the box keyword which will skip all further statements for program.
}
/(^|\t)-9(\t|$)/{        ##Checking here if -9 is coming in a line either with spaces or without spaces, if yes then do following:
  print $1,"-9   -9    -9    -9    -9";  ##printing the first field of current line along with 5 -9 values as per OPs request to do so.
  next                 ##next will skip all further statements.
}
1                      ##awk works on method of condition then action, so I am making condition TRUE here by mentioning 1 here and not mentioning action here so by default print of the current line will happen.
' OFS="    " Input_file   ##Setting OFS(output field separator) value to spaces and mentioning the Input_file name here.

Answer 3

sed -r '/-9/s/[^ ]+/-9/2g' input.txt

<强>输出

ID  VAR1  VAR2  VAR3  VAR4  VAR5
1    1    1     1     1     1
2    -9   -9    -9    -9    -9
3    3    3     3     3     3
4    -9    -9     -9     -9    -9
5    5    5     5     5     5
6    -9    -9    -9     -9     -9

Answer 4

使用 GNU awk

的更多方法

<强>一衬垫：

awk '/(^|[ \t]+)-9([ \t]+|$)/{for(i=2; i<=NF; i++)$0=gensub (/[^[:blank:]]+/,-9,i)}1' infile

更好的可读性：

awk '/(^|[ \t]+)-9([ \t]+|$)/{
       for(i=2; i<=NF; i++)
            $0=gensub (/[^[:blank:]]+/,-9,i)
     }1
    ' infile

测试结果：

<强> 输入：

$ cat infile
ID  VAR1  VAR2  VAR3  VAR4  VAR5
1    1    1     1     1     1
2    -9   -9    -9    -9    -9
3    3    3     3     3     3
4    4    4     4     -9    4
5    5    5     5     5     5
6    6    -9    6     6     6

<强> 输出：

（因为-间距已移位）

$ awk '/(^|[ \t]+)-9([ \t]+|$)/{for(i=2; i<=NF; i++)$0 = gensub (/[^[:blank:]]+/, -9 , i)}1' infile  
ID  VAR1  VAR2  VAR3  VAR4  VAR5
1    1    1     1     1     1
2    -9   -9    -9    -9    -9
3    3    3     3     3     3
4    -9    -9     -9     -9    -9
5    5    5     5     5     5
6    -9    -9    -9     -9     -9

如果您希望输出看起来更好，请尝试以下方法:(不推荐）

awk '/(^|[ \t]+)-9([ \t]+|$)/{for(i=2; i<=NF; i++){ if($i==-9)continue; $0 = gensub (/[^[:blank:]]+/, "\b-9" , i)}}1' infile  
ID  VAR1  VAR2  VAR3  VAR4  VAR5
1    1    1     1     1     1
2    -9   -9    -9    -9    -9
3    3    3     3     3     3
4   -9   -9    -9     -9   -9
5    5    5     5     5     5
6   -9    -9   -9    -9    -9

上述更具可读性的版本：

awk '/(^|[ \t]+)-9([ \t]+|$)/{
          for(i=2; i<=NF; i++)
          { 
            if($i==-9)continue; 
            $0 = gensub(/[^[:blank:]]+/, "\b-9" , i)
          }
     }1
    ' infile

Answer 5

awk 'BEGIN{IFS=OFS="    "}/-9/{for(i=2;i<=NF;i++){$i=-9}}1' filename

awk / sed：如果任何字段与模式匹配，则替换所有字段

5 个答案: