Question

Is it possible to use sed/awk to match the last k occurrences of a pattern in a line?

For simplicity, say I only want to match the last 3 commas in each line (note that the two lines have different total numbers of commas):

10, 5, "Sally went to the store, and then , 299, ABD, F, 10
10, 6, If this is the case, and also this happened, then, 299, A, F, 9

I want to match only the commas from 299 through the end of the line, in both lines.

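Motivation: I am trying to convert a CSV file that has stray commas in one of its fields to tab-delimited. Since the number of proper columns is fixed, my idea is to replace the first few commas with tabs up to the troublesome field (which is straightforward), and then work backwards from the end of the line to replace the rest. That should convert all the real delimiter commas to tabs while leaving the commas inside the problematic field intact.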

There is probably a smarter way to do this, but I think it is a good sed/awk teaching point either way.

Answer 1

Another sed alternative: replace the last 3 commas with tabs.

$ rev file | sed 's/,/\t/;s/,/\t/;s/,/\t/' | rev

10, 5, "Sally went to the store, and then , 299  ABD     F       10

With GNU sed, you can simply write

$ sed 's/,/\t/g5' file

10, 5, "Sally went to the store, and then , 299  ABD     F       10

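This replaces every occurrence from the 5th onward. The starting index is fixed, so it only hits exactly the last 3 commas on lines with 7 commas, such as the first sample line; the second sample line has 8.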
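Since a fixed index such as g5 assumes every line has the same number of commas, one option is to count the commas per line first. The following is only a sketch in POSIX awk; the function name last_k_commas_to_tabs is made up for illustration and is not part of the answer above.

```shell
# Hypothetical helper: replace the last k commas on each line of stdin with tabs.
# Usage: last_k_commas_to_tabs 3 < file
last_k_commas_to_tabs() {
    awk -v k="$1" '{
        n = gsub(/,/, "&")      # count the commas; "&" puts each one back unchanged
        out = ""
        rest = $0
        seen = 0
        while ((p = index(rest, ",")) > 0) {
            seen++
            # only the last k of the n commas become tabs
            sep = (seen > n - k) ? "\t" : ","
            out = out substr(rest, 1, p - 1) sep
            rest = substr(rest, p + 1)
        }
        print out rest
    }'
}
```

If a line has k commas or fewer, all of them are replaced.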

Answer 2

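A regex that matches each of the last three commas individually would need a negative lookahead, which sed does not support. Instead, you can use the following sed regex to match the last three fields together with the comma directly in front of each: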

,[^,]*,[^,]*,[^,]*$

$ matches the end of the line.

[^,] matches any character other than ,.

Groups let you reuse the field values in the sed replacement:

sed -r 's/,([^,]*),([^,]*),([^,]*)$/\t\1\t\2\t\3/'
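The -r flag and \t in the replacement are GNU sed conventions. Below is a sketch of the same substitution that passes a literal tab instead, assuming a sed that accepts -E for extended regexes (GNU and BSD sed both do); the function name last3_commas_to_tabs is hypothetical.

```shell
# Hypothetical wrapper around the grouped substitution shown above.
# Usage: last3_commas_to_tabs < file
last3_commas_to_tabs() {
    tab=$(printf '\t')    # a literal tab, so the replacement does not rely on \t
    sed -E "s/,([^,]*),([^,]*),([^,]*)\$/${tab}\\1${tab}\\2${tab}\\3/"
}
```

Lines with fewer than three commas do not match the pattern and pass through unchanged.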

For awk, see How to print last two columns using awk.

"There is probably a smarter way to do this"

If every comma you want to keep is followed by a space and the unwanted commas are not, then

sed 's/,\([^ ]\)/.\1/g'

This turns a, b, 12,3, c into a, b, 12.3, c.

Answer 3

You can use Perl to add the missing double quote on each line:

perl -aF, -ne '$F[-5] .= q("); print join ",", @F' < input > output

Or, to turn the commas into tabs:

 perl -aF'/,\s/' -ne 'splice @F, 2, -4, join ", ", @F[ 2 .. $#F - 4 ]; print join "\t", @F' < input > output

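-n reads the input line by line.
-a splits each input line into the @F array on the pattern given by -F.
The first solution appends the missing quote to the fifth field from the right; the second replaces the items from the third field through the fifth from the right with those elements joined by ", ", then joins the resulting array with tabs.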

Answer 4

To fix the CSV, I would do this:

echo '10, 5, "Sally went to the store, and then , 299, ABD, F, 10' |
  perl -lne '
    @F = split /, /;             # field separator is comma and space
    @start = splice @F, 0, 2;    # first 2 fields
    @end = splice @F, -4, 4;     # last 4 fields
    $string = join ", ", @F;     # the stuff in the middle
    $string =~ s/"/""/g;         # any double quotes get doubled
    print join(",", @start, "\"$string\"", @end);
  '

Output

10,5,"""Sally went to the store, and then ",299,ABD,F,10

Answer 5

Hi, I think this does the job

echo 'a,b,c,d,e,f' | awk -F',' '{i=3; for (--i;i>=0;i--) {printf "%s\t", $(NF-i) } print ""}'

which returns

d    e    f

But you need to make sure each line has at least 3 fields.
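One way to avoid the short-line problem is to clamp the starting field. This is a sketch only, with a made-up function name (last_k_fields); it also drops the trailing tab that the printf loop leaves behind.

```shell
# Hypothetical helper: print the last k comma-separated fields, tab-joined.
# Lines with fewer than k fields are printed in full instead of failing.
# Usage: last_k_fields 3 < file
last_k_fields() {
    awk -F',' -v k="$1" '{
        start = (NF > k) ? NF - k + 1 : 1    # never index below field 1
        line = $start
        for (i = start + 1; i <= NF; i++)
            line = line "\t" $i
        print line
    }'
}
```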

Answer 6

This does what you asked for, using GNU awk for the 3rd arg to match():

$ cat tst.awk
{
    gsub(/\t/," ")
    match($0,/^(([^,]+,){2})(.*)((,[^,]+){3})$/,a)
    gsub(/,/,"\t",a[1])
    gsub(/,/,"\t",a[4])
    print a[1] a[3] a[4]
}

$ awk -f tst.awk file
10       5       "Sally went to the store, and then , 299        ABD     F       10
10       6       If this is the case, and also this happened, then, 299  A       F       9

But I'm not convinced that what you're asking for is a good approach, so YMMV.

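In any case, note the first gsub(), which makes sure the input line contains no tabs. That is crucial if you are converting some of the commas to tabs in order to use tab as the output field separator!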
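For awks without the 3-argument match(), the same idea can be sketched with split(). This assumes the first m and last k commas are the real delimiters, as in the question; the function name csv_edges_to_tabs and its parameters are hypothetical.

```shell
# Hypothetical helper: turn the first m and the last k commas of each line into tabs.
# Usage: csv_edges_to_tabs 2 3 < file
csv_edges_to_tabs() {
    awk -v m="$1" -v k="$2" '{
        gsub(/\t/, " ")             # existing tabs would be mistaken for separators
        n = split($0, f, ",")       # n pieces means n - 1 commas
        out = f[1]
        for (i = 2; i <= n; i++) {
            c = i - 1               # number of the comma in front of piece i
            sep = (c <= m || c > n - 1 - k) ? "\t" : ","
            out = out sep f[i]
        }
        print out
    }'
}
```

Like the gawk script, this leaves the space after each replaced comma in place.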

Match the last K occurrences of a pattern in a line
