Question

An example, test.txt:

This is bad, real bad!
<?xml version="1.0" encoding="UTF-8" ?>
<wsdl:definitions targetNamespace="http://tips.cf"
xmlns:impl="http://tips.cf" xmlns:intf="http://tips.cf"
xmlns:apachesoap="http://xml.apache.org/xml-soap"

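I have a regular expression: ^<\?xml.*\?>.

grep matches line by line, so this regular expression matches (the second line).

But I want grep to treat all of these lines as one big line, so that the pattern does not match, because the input does not start with <?xml.

I tried: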

grep -P -z -- '^<\?xml.*\?>' test.txt

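That uses -z, but it still matches the second line.

Is there a way to make grep not match here, or is there another command-line regex tool that can do this?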

Answer 1

If you use \A instead of the ^ anchor, it will not match:

# finds no match
grep -Pz -- '\A<\?xml.*\?>' file

When grep searches a multi-line string, ^ matches at the start of every line, whereas \A matches only at the actual start of the input.
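The difference can be checked with a small sketch like the one below. It assumes GNU grep built with PCRE support (the -P option); the temporary file and the variable names caret and start are illustrative only and not part of the original answer.

```shell
# Recreate the first lines of test.txt in a temporary file (illustrative only).
tmp=$(mktemp)
printf '%s\n' \
  'This is bad, real bad!' \
  '<?xml version="1.0" encoding="UTF-8" ?>' \
  '<wsdl:definitions targetNamespace="http://tips.cf"' > "$tmp"

# -z makes the whole file one record, yet ^ can still match right after a newline.
if grep -qPz -- '^<\?xml.*\?>' "$tmp"; then caret=match; else caret=no-match; fi

# \A matches only at the very start of the record, so the declaration on line 2 is rejected.
if grep -qPz -- '\A<\?xml.*\?>' "$tmp"; then start=match; else start=no-match; fi

printf 'caret anchor: %s\n' "$caret"
printf 'start-of-input anchor: %s\n' "$start"
rm -f "$tmp"
```

On the sample input this should report a match for the ^ version and no match for the \A version, which is the behaviour described in the question and in this answer.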

Answer 2

For patterns that contain newlines (in bash a newline can be written as $'\n'), use grep -Pz.

Try this:

grep -Pz '\AThis.*\n<\?xml.*\?>' test.txt

this:

grep -Pz '<\?xml.*\?>' test.txt

and this:

grep -Pz '^<\?xml.*\?>' test.txt

or:

grep -Pz '\A<\?xml.*\?>' test.txt
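To see that a pattern containing \n really spans the line break, one option is to add -o and convert grep's NUL output terminator back to a newline. This is only a sketch, assuming GNU grep with -P support; the temporary file and the variable out are illustrative.

```shell
# Recreate the first lines of test.txt in a temporary file (illustrative only).
tmp=$(mktemp)
printf '%s\n' \
  'This is bad, real bad!' \
  '<?xml version="1.0" encoding="UTF-8" ?>' \
  '<wsdl:definitions targetNamespace="http://tips.cf"' > "$tmp"

# -o prints only the matched text; with -z each match is terminated by NUL,
# so tr turns that terminator into a newline for display.
out=$(grep -Pzo -- '\AThis.*\n<\?xml.*\?>' "$tmp" | tr '\000' '\n')

printf '%s\n' "$out"
rm -f "$tmp"
```

Because . does not match a newline by default in PCRE, the explicit \n is what lets the match continue from the first line onto the second.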

Answer 3

You can join the lines into one big line with xargs before applying the regular expression:

# no match returns
cat test.txt | xargs | grep '^<?xml.*?>'

Check for more usage on xargs
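One caveat worth checking before relying on this: xargs with no command runs echo, which joins the words with single spaces, and xargs also interprets quote characters in its input. The sketch below uses the first two lines of test.txt and hypothetical variable names (joined, result) to show what grep actually receives.

```shell
# Join two sample lines the same way the pipeline above does.
joined=$(printf '%s\n' \
  'This is bad, real bad!' \
  '<?xml version="1.0" encoding="UTF-8" ?>' | xargs)

# Inspect the joined line: the double quotes around 1.0 and UTF-8 are consumed by xargs.
printf '%s\n' "$joined"

# The anchored pattern does not match, because the single line starts with "This".
if printf '%s\n' "$joined" | grep -q '^<?xml.*?>'; then result=match; else result=no-match; fi
printf 'anchored pattern: %s\n' "$result"
```

If the quotes or the exact whitespace matter for the pattern, this approach may not be suitable, since the text grep sees is no longer identical to the file.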

Answer 4

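Your question is unclear, but if you want grep to treat all lines of test.txt as a single line, you can replace every newline with a space before passing the text to grep, like this:

grep "pattern" <( tr '\n' ' ' < test.txt)

Your original file is left untouched; the newlines are converted to spaces on the fly.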
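The <( ... ) process substitution is a bash feature; in a plain POSIX shell the same idea can be written as a pipe. A minimal sketch, with a temporary file standing in for test.txt and illustrative variable names:

```shell
# Recreate the first two lines of test.txt in a temporary file (illustrative only).
tmp=$(mktemp)
printf '%s\n' \
  'This is bad, real bad!' \
  '<?xml version="1.0" encoding="UTF-8" ?>' > "$tmp"

# Anchored: the joined line starts with "This", so nothing matches.
if tr '\n' ' ' < "$tmp" | grep -q '^<?xml.*?>'; then anchored=match; else anchored=no-match; fi

# Unanchored: the declaration is still found somewhere inside the joined line.
if tr '\n' ' ' < "$tmp" | grep -q '<?xml.*?>'; then unanchored=match; else unanchored=no-match; fi

printf 'anchored: %s, unanchored: %s\n' "$anchored" "$unanchored"
rm -f "$tmp"
```

Once the newlines are gone, ^ can only match at the very start of the text, which is the behaviour the question asks for.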

Answer 5

It is not clear what you want, but maybe this:

awk -v RS='<[?]xml.*[?]>' '$0=RT' file

For example:

$ cat file
This is bad, real bad!
<?xml
        version="1.0"
                encoding="UTF-8" ?>
<wsdl:definitions targetNamespace="http://tips.cf"
xmlns:impl="http://tips.cf" xmlns:intf="http://tips.cf"
xmlns:apachesoap="http://xml.apache.org/xml-soap"

$ awk -v RS='<[?]xml.*[?]>' '$0=RT' file         
<?xml
        version="1.0"
                encoding="UTF-8" ?>

The above uses GNU awk for the multi-character RS and for RT. With other awks it would be:

$ awk '{rec = rec $0 RS} END{ if (match(rec,/<[?]xml.*[?]>/)) print substr(rec,RSTART,RLENGTH)}' file   
<?xml
        version="1.0"
                encoding="UTF-8" ?>

Linux grep: how to treat all lines as one big line for a regex?