Question

I have a file containing the following text:

subject:asdfghj
subject:qwertym
subject:bigger1
subject:sage911
subject:mothers
object:cfvvmkme
object:rjo4j2f2
object:e4r234dd
object:uft5ed8f
object:rf33dfd1

I would like to get the following result using awk or sed (a one-liner would be a bonus! [A Perl one-liner is also acceptable]):

subject:asdfghj,object:cfvvmkme
subject:qwertym,object:rjo4j2f2
subject:bigger1,object:e4r234dd
subject:sage911,object:uft5ed8f
subject:mothers,object:rf33dfd1

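I want the lines matching 'subject' and the lines matching 'object' to be combined pairwise, in the order in which each is listed, separated by a comma. Could I see an example of doing this with awk, sed or perl? (Preferably as a one-liner, if possible?)

I should add that I have tried doing this with awk, but I'm still learning: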

awk '{if ($0 ~ /subject/) pat1=$1; if ($0 ~ /object/) pat2=$2} {print $0,pat2}'

But it doesn't do what I expected! So I know my syntax is wrong. If I could see a really useful example, I could learn from it.
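Why the attempt above fails: without -F, awk splits fields on whitespace, so $1 is the whole line and $2 is always empty, and the unconditional print runs for every input line. A minimal working sketch of the same idea, assuming subject lines start with "subject:", object lines start with "object:", and each object pairs with the earliest unpaired subject, is to queue the subject lines in an array and print one pair per object line. The function name merge_awk is a hypothetical helper, not part of the question.

```shell
# Hypothetical helper: pairs the i-th "subject:" line with the i-th "object:" line.
# Reads the files given as arguments, or stdin if none are given.
merge_awk() {
    awk '
        /^subject:/ { subj[++n] = $0; next }    # queue every subject line
        /^object:/  { print subj[++i] "," $0 }  # pair with the next queued subject
    ' "$@"
}
```

Usage: `merge_awk file`. Because the subjects are kept in a queue, this also works if subject and object lines are interleaved, as long as each object comes after its subject.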

Answer 1

Not perl or awk, but easier:

$ pr -2ts, file
subject:asdfghj,object:cfvvmkme
subject:qwertym,object:rjo4j2f2
subject:bigger1,object:e4r234dd
subject:sage911,object:uft5ed8f
subject:mothers,object:rf33dfd1

Explanation

-2  print in 2 columns

t   omit the page headers (file name, date, page number, etc.)

s,  use a comma as the column separator

Answer 2

I would do something like this in perl:

#!/usr/bin/perl

use strict;
use warnings;

my @subjects;
while ( <DATA> ) { 
    m/^subject:(\w+)/ and push @subjects, $1; 
    m/^object:(\w+)/ and print "subject:",shift @subjects,",object:", $1,"\n";
}


__DATA__
subject:asdfghj
subject:qwertym
subject:bigger1
subject:sage911
subject:mothers
object:cfvvmkme
object:rjo4j2f2
object:e4r234dd
object:uft5ed8f
object:rf33dfd1

Reduced to a one-liner, this would be:

perl -ne '/^(subject:\w+)/ and push @s, $1; /^object/ and print shift(@s), ",", $_' file

Answer 3

grep, paste and process substitution

$ paste -d , <(grep 'subject' infile) <(grep 'object' infile)
subject:asdfghj,object:cfvvmkme
subject:qwertym,object:rjo4j2f2
subject:bigger1,object:e4r234dd
subject:sage911,object:uft5ed8f
subject:mothers,object:rf33dfd1

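Thanks to process substitution (<( )), the output of grep 'subject' infile and grep 'object' infile is treated like files, and paste then joins them line by line, using a comma as the delimiter (specified by -d ,).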
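Process substitution is a bash/ksh/zsh feature and is not available in a plain POSIX sh. A minimal sketch of the same grep + paste approach using temporary files instead, assuming mktemp is available (it is on common Linux and BSD systems but is not specified by POSIX); merge_paste is a hypothetical helper name, and the patterns are anchored with ^ as a small safety measure:

```shell
# Hypothetical helper: same pairing as paste -d , <(grep ...) <(grep ...),
# but with temporary files instead of process substitution.
merge_paste() {
    file=$1
    subj=$(mktemp) || return 1
    obj=$(mktemp)  || { rm -f "$subj"; return 1; }
    grep '^subject' "$file" > "$subj"   # all subject lines, in order
    grep '^object'  "$file" > "$obj"    # all object lines, in order
    paste -d , "$subj" "$obj"           # join line i of each file with a comma
    rm -f "$subj" "$obj"
}
```

Usage: `merge_paste infile`.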

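sed

The idea is to read all subject lines and store them in the hold space; then, for each object line, fetch the hold space, take the matching subject, and put the remaining subject lines back into the hold space.

First, the unreadable one-liner: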

$ sed -rn '/^subject/H;/^object/{G;s/\n+/,/;s/^(.*),([^\n]*)(\n|$)/\2,\1\n/;P;s/^[^\n]*\n//;h}' infile
subject:asdfghj,object:cfvvmkme
subject:qwertym,object:rjo4j2f2
subject:bigger1,object:e4r234dd
subject:sage911,object:uft5ed8f
subject:mothers,object:rf33dfd1

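-r enables extended regular expressions (so parentheses, + and | don't need to be escaped), and -n turns off the default printing of every line.

Expanded, more readable, and explained: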

/^subject/H         # Append subject lines to hold space
/^object/ {         # For each object line
    G               # Append hold space to pattern space
    s/\n+/,/        # Replace first group of newlines with a comma

    # Swap object (before comma) and subject (after comma)
    s/^(.*),([^\n]*)(\n|$)/\2,\1\n/

    P               # Print up to first newline
    s/^[^\n]*\n//   # Remove first line (can't use D because there is another command)
    h               # Copy pattern space to hold space
}
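To run the expanded script, it can be saved to a file and passed to sed with -f. A minimal sketch, assuming GNU sed; the file name merge.sed and the helper merge_sed are hypothetical:

```shell
# Write the expanded script (without comments) to a file.
cat > merge.sed <<'EOF'
/^subject/H
/^object/ {
    G
    s/\n+/,/
    s/^(.*),([^\n]*)(\n|$)/\2,\1\n/
    P
    s/^[^\n]*\n//
    h
}
EOF

# -r: extended regexps; -n: no automatic printing; -f: read the script from a file.
# Reads the files given as arguments, or stdin if none are given.
merge_sed() {
    sed -rn -f merge.sed "$@"
}
```

Usage: `merge_sed infile`.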

Notes:

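The first time the hold space is fetched, it starts with a newline (H adds one before each appended line), so the newline replacement must handle one or more newlines, hence \n+: the first time there are two newlines, every later time just one.
To anchor the end of the subject part in the swap, we use (\n|$): a newline or the end of the pattern space. This makes the swap also work for the last line, where there is no newline at the end of the pattern space.
This works with GNU sed. For BSD sed, as on macOS, some changes are needed:
- The -r option must be replaced by -E.
- There must be an extra semicolon before the closing brace: h;}
- To insert a newline in the replacement string (the swap command), \n must be replaced with either '$'\n'' or '"$(printf '\n')"'.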

Answer 4

Since you specifically asked for a "oneliner", I assume brevity matters more to you than clarity:

$ awk -F: -v OFS=, 'NR>1&&$1!=p{f=1}{p=$1}f{print a[++c],$0;next}{a[NR]=$0}' file
subject:asdfghj,object:cfvvmkme
subject:qwertym,object:rjo4j2f2
subject:bigger1,object:e4r234dd
subject:sage911,object:uft5ed8f
subject:mothers,object:rf33dfd1
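The same awk program, spread out with comments for readability. Like the one-liner, this sketch assumes all subject lines come first, followed by all object lines; merge_awk_terse is a hypothetical helper name.

```shell
# Hypothetical helper wrapping the one-liner above; the logic is unchanged.
merge_awk_terse() {
    awk -F: -v OFS=, '
        NR > 1 && $1 != p { f = 1 }     # prefix changed: the object block starts
        { p = $1 }                      # remember the current prefix
        f { print a[++c], $0; next }    # object line: print it after the c-th stored line
        { a[NR] = $0 }                  # still in the subject block: store the line
    ' "$@"
}
```

Storing subjects under a[NR] only lines up with the counter c because the subject block starts on line 1; a leading header line would shift the pairing.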

Merge/print lines matching patterns with awk or sed (oneliner?)
