Question

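I have a huge file, and in the output some columns have no value. I need to fill those columns with 0 for further analysis. I can separate the columns with either spaces or tabs; at the moment the file is tab-separated.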


Answer 1

This is really a job for a CSV parser, but if it has to be a regex, and you never have tabs inside quoted CSV entries, you can search for

(^|\t)(?=\t|$)

and replace with (backreference 1 followed by a literal 0)

${1}0

So, in Perl:

($ResultString = $subject) =~ 
s/(    # Match either...
   ^   # the start of the line (preferably)
   |   # or
   \t  # a tab character
  )    # remember the match in backreference no. 1
  (?=  # Then assert that the next character is either
   \t  # a(nother) tab character
   |   # or
   $   # the end of the line
  )    # End of lookahead assertion
/${1}0/xg;

This will change

1   2       4           7   8
    2   3       5   6   7

into

1   2   0   4   0   0   7   8   
0   2   3   0   5   6   7   0
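For readers who want to try the same search and replace outside Perl, here is a minimal sketch in Python. The names `EMPTY_FIELD` and `fill_empty_fields` are made up for illustration, and the sketch assumes each line is processed on its own, without its trailing newline.

```python
import re

# Same pattern as above: an empty field sits right after the start of the
# line or a tab, and right before a tab or the end of the line.
EMPTY_FIELD = re.compile(r"(^|\t)(?=\t|$)")

def fill_empty_fields(line, filler="0"):
    """Return `line` with every empty tab-separated field set to `filler`."""
    # \g<1> puts back whatever group 1 matched (a tab, or nothing at the
    # start of the line); the explicit form avoids reading "\10" as group 10.
    return EMPTY_FIELD.sub(r"\g<1>" + filler, line)

for line in ["1\t2\t\t4\t\t\t7\t8", "\t2\t3\t\t5\t6\t7\t"]:
    print(fill_empty_fields(line))
```

Handling one line at a time sidesteps differences in how `^` and `$` behave around a final newline in multi-line mode.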

Answer 2

For a tab-delimited file, this AWK snippet does the trick:

BEGIN { FS = "\t"; OFS = "\t" }   # read and write tab-separated fields
{
    for (i = 1; i <= NF; i++) {
        if (!$i) { $i = 0 }       # an empty (or zero-valued) field becomes 0
    }
    print $0
}
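The same field-by-field loop can be sketched with a real CSV parser, which is what Answer 1 hints at. This is only an illustration using Python's standard `csv` module; `fill_empty_columns` and the in-memory sample are assumptions, and unlike the `!$i` test it replaces only fields that are exactly empty.

```python
import csv
import io

def fill_empty_columns(src, dst, filler="0"):
    """Copy tab-separated rows from src to dst, filling empty fields."""
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    for row in reader:
        # Replace only fields that are exactly the empty string.
        writer.writerow([field if field != "" else filler for field in row])

# Hypothetical sample data; a real run would pass open file objects instead.
sample = io.StringIO("1\t2\t\t4\t\t\t7\t8\n\t2\t3\t\t5\t6\t7\t\n")
result = io.StringIO()
fill_empty_columns(sample, result)
print(result.getvalue(), end="")
```

Because the parser understands quoting, a tab inside a quoted entry would stay part of its field rather than being treated as a delimiter.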

Answer 3

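I deleted my answer after re-reading the original post. There are no tabs as data, only as delimiters. If a field has no data, a double delimiter is what keeps the columns aligned; it cannot be any other way. So a single delimiter on its own separates two empty fields: "" = 1 empty field, "\t" = 2 empty fields. I get it now.

Tim Pietzcker had the correct answer all along. +1 for him. It could also be written as s/ (?:^|(?<=\t)) (?=\t|$) /0/xg; but it is the same thing.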
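A small Python sketch can check both points made here: the field counts for "" and "\t", and that the lookbehind variant gives the same result as the capturing version. The helper names are invented for this illustration, and the comparison assumes Python 3.7 or later, where an empty match may be followed by a non-empty match at the same position.

```python
import re

# An empty line is one empty field; a lone tab separates two empty fields.
print(len("".split("\t")))    # 1
print(len("\t".split("\t")))  # 2

CAPTURING = re.compile(r"(^|\t)(?=\t|$)")
LOOKBEHIND = re.compile(r"(?:^|(?<=\t))(?=\t|$)")

def fill_capturing(line):
    # Re-insert the captured tab (if any), then add the zero.
    return CAPTURING.sub(r"\g<1>0", line)

def fill_lookbehind(line):
    # The match is always zero-width, so only the zero is inserted.
    return LOOKBEHIND.sub("0", line)

for line in ["", "\t", "\t\t\t", "\t2\t3\t\t5\t6\t7\t"]:
    assert fill_capturing(line) == fill_lookbehind(line)
```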

Answer 4

Here is a sed solution. Note that some versions of sed don't like \t.

sed 's/^\t/0\t/;:a;s/\t\t/\t0\t/g;ta;s/\t$/\t0/' inputfile

or

sed -e 's/^\t/0\t/' -e ':a' -e 's/\t\t/\t0\t/g' -e 'ta' -e 's/\t$/\t0/' inputfile

Explanation:

s/^\t/0\t/    # insert a zero before a tab that begins a line
:a            # top of the loop
    s/\t\t/\t0\t/g    # insert a zero between a pair of tabs
ta            # if a substitution was made, branch to the top of the loop
s/\t$/\t0/    # insert a zero after a tab that ends a line
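Why the loop is needed may be easier to see in a short Python sketch that mirrors the three sed steps. `fill_like_sed` is a made-up name, and `str.replace` stands in for sed's global substitution, since both replace non-overlapping matches from left to right.

```python
def fill_like_sed(line, filler="0"):
    """Mirror the three sed steps on a single line (no trailing newline)."""
    # s/^\t/0\t/ : a zero before a tab that begins the line
    if line.startswith("\t"):
        line = filler + line
    # :a; s/\t\t/\t0\t/g; ta : repeat until no adjacent tabs remain
    while "\t\t" in line:
        line = line.replace("\t\t", "\t" + filler + "\t")
    # s/\t$/\t0/ : a zero after a tab that ends the line
    if line.endswith("\t"):
        line += filler
    return line

# One global pass is not enough for three tabs in a row: the middle tab is
# consumed by the first match, so one pair of adjacent tabs is left over.
print(repr("a\t\t\tb".replace("\t\t", "\t0\t")))
print(repr(fill_like_sed("a\t\t\tb")))
```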

Answer 5

If, and only if, your data contains nothing but numbers and you have a well-defined field separator FS, you can use the following trick:

awk 'BEGIN{FS=OFS="\t"}{for(i=1;i<=NF;++i) $i+=0}1' file

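By adding zero, we convert each string to a number. An empty string is converted to the number zero. You can define the field separator however you like.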

However, this can be a bit slow, because every time you assign to a field $i, awk has to recompute $0.

A faster way is the solution by Dennis Williamson.
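To show why the add-zero trick is only safe for purely numeric data, here is a rough Python analogue. It is a sketch under assumptions: `add_zero` and `fill_numeric` are hypothetical names, and the `%.6g` formatting imitates what is, as far as I know, awk's default number-to-string conversion, so long decimals come out shortened.

```python
def add_zero(field):
    """Rough analogue of awk's `$i += 0` for one field."""
    if field == "":
        return "0"
    # Python raises ValueError on non-numeric text; awk would quietly
    # turn such a field into 0, which is why the data must be numeric.
    value = float(field)
    if value.is_integer():
        return str(int(value))
    # Assumption: mimic awk's default CONVFMT of "%.6g".
    return format(value, ".6g")

def fill_numeric(line):
    if line == "":
        return line  # awk sees no fields (NF == 0) on an empty line
    return "\t".join(add_zero(f) for f in line.split("\t"))

print(fill_numeric("\t2\t3.50\t\t5"))
```

Note that a value such as 3.50 is rewritten as 3.5, so the trick changes more than just the empty fields.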

Fill empty columns with 0 in a space/tab-delimited file
