Question

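I need to check, from a bash script, whether one file is contained in another, given a multi-line pattern and an input file.

Return value

I want to receive a status (as with the grep command): 0 if any match is found, 1 if no match is found.

Pattern:

multi-line,
the order of the lines matters (the pattern is treated as a single block of lines),
includes characters such as digits, letters, ?, &, *, #, etc.

Explanation

Only the following examples should find a match: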

pattern     file1 file2 file3 file4
222         111   111   222   222
333         222   222   333   333
            333   333         444
            444

The following should not:

pattern     file1 file2 file3 file4 file5 file6 file7
222         111   111   333   *222  111   111   222
333         *222  222   222   *333  222   222   
            333   333*        444   111         333
            444                     333   333

Here is my script:

#!/bin/bash

function writeToFile {
    if [ -w "$1" ] ; then
        echo "$2" >> "$1"
    else
        echo -e "$2" | sudo tee -a "$1" > /dev/null
    fi
}

function writeOnceToFile {
        pcregrep --color -M "$2" "$1"
        #echo $?

        if [ $? -eq 0 ]; then
            echo This file contains text that was added previously
        else
            writeToFile "$1" "$2"
        fi
}

file=file.txt 
#1?1
#2?2
#3?3
#4?4

pattern=`cat pattern.txt`
#2?2
#3?3

writeOnceToFile "$file" "$pattern"

I could run grep on each of the pattern lines, but that fails for this example:

file.txt 
#1?1
#2?2
#=== added line
#3?3
#4?4

pattern.txt
#2?2
#3?3

Or even if you swap lines 2 and 3:

file=file.txt 
#1?1
#3?3
#2?2
#4?4

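In both cases it returns 0 when it should not.

How can I fix this? Note that I would prefer to use natively installed programs (if this can be done without pcregrep). Maybe sed or awk can solve this?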

Answer 1

I would just use diff for this task:

diff pattern <(grep -f file pattern)

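Explanation

diff file1 file2 reports whether the two files differ.
grep -f file pattern takes its search patterns from file and prints the lines of pattern that match one of them.

So what you are doing is checking which lines of pattern are in file, and then comparing that with pattern itself. If they are identical, pattern is part of file!

Test

seq 10 is part of seq 20! Let's check: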

$ diff <(seq 10) <(grep -f <(seq 20) <(seq 10))
$

seq 10 is not inside seq 2 20 (1 is not in the second one):

$ diff -q <(seq 10) <(grep -f <(seq 2 20) <(seq 10))
Files /dev/fd/63 and /dev/fd/62 differ
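
Since the question asks for a grep-like exit status, the diff approach can be wrapped in a small function. This is only a sketch: the name all_lines_present and its argument order are made up for illustration, and adding -F and -x to grep is my assumption (the pattern may contain ?, * and similar characters, so fixed-string, whole-line matching seems safer than regex matching). It reads grep's output from a pipe instead of using process substitution.

```shell
# Hypothetical helper (not from the original answer).
# Usage: all_lines_present PATTERN_FILE TARGET_FILE
# Status 0 when every line of PATTERN_FILE also occurs as a whole line
# somewhere in TARGET_FILE; non-zero otherwise.
all_lines_present() {
    pattern_file=$1
    target_file=$2
    # -F: treat the target's lines as fixed strings, not regexes
    # -x: only whole-line matches count
    # -f: read the strings to search for from the target file
    # diff compares the pattern file with grep's output read from stdin (-)
    grep -Fx -f "$target_file" "$pattern_file" | diff "$pattern_file" - > /dev/null
}
```

Note that this only confirms that each pattern line is present somewhere in the target. It does not check the order of the lines or whether they are adjacent.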

Answer 2

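I have a working version using perl.

I thought I had it working with GNU awk, but I didn't. Setting RS to the empty string splits on blank lines. See the edit history for the broken awk version.

How can I search for a multiline pattern in a file? shows how to use pcregrep, but I can't see a way to make it work when the search pattern may contain regex special characters. The -F fixed-string mode does not work with multi-line mode: it still treats the pattern as a set of lines to be matched individually (not as one multi-line fixed string to match). I see you were already using pcregrep in your attempt.

By the way, I think there is a bug in your code in the non-sudo case: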

function writeToFile {
    if [ -w "$1" ] ; then
        "$2" >> "$1"   # probably you mean  echo "$2" >> "$1"
    else
        echo -e "$2" | sudo tee -a "$1" > /dev/null
    fi
}

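Anyway, my attempts with line-based tools all failed, so it's time to pull out a more serious programming language, one that doesn't force newline conventions on us. Just read both files into variables and use a non-regex search: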

#!/usr/bin/perl -w
# multi_line_match.pl  pattern_file  target_file
# exit(0) if a match is found, else exit(1)

#use IO::File;
use File::Slurp;
my $pat = read_file($ARGV[0]);
my $target = read_file($ARGV[1]);

if ((substr($target, 0, length($pat)) eq $pat) or index($target, "\n".$pat) >= 0) {
    exit(0);
}
exit(1);

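See What is the best way to slurp a file into a string in Perl? for ways to avoid the dependency on File::Slurp (which is not part of the standard perl distribution, nor of a default Ubuntu 15.04 system). I went with File::Slurp partly so that non-perl-geeks can read what the program is doing, compared to: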

my $contents = do { local(@ARGV, $/) = $file; <> };

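I was trying to avoid reading the whole file into memory, and came up with an idea based on http://www.perlmonks.org/?node_id=98208. I think the non-matching case would usually still read the whole file at once. Also, the logic for handling a match at the front of the file was quite complicated, and I didn't want to spend a long time testing to make sure it was correct for all cases. Here is what I had before I gave up: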

#IO::File->input_record_separator($pat);
$/ = $pat;  # pat must include a trailing newline if you want it to match one

my $fh = IO::File->new($ARGV[2], O_RDONLY)
    or die 'Could not open file ', $ARGV[2], ": $!";

$tail = substr($fh->getline, -1);  #fast forward to the first match
#print each occurence in the file
#print IO::File->input_record_separator  while $fh->getline;

#FIXME: something clever here to handle the case where $pat matches at the beginning of the file.
do {
    # fixme: need to check defined($fh->getline)
    if (($tail eq '\n') or ($tail = substr($fh->getline, -1))) {
    exit(0);  # if there's a 2nd line
    }
} while($tail);

exit(1);
$fh->close;

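Another idea is to filter both the pattern and the file being searched through tr '\n' '\r' or something similar, so that each becomes a single line. (\r is probably a safe choice that won't collide with anything already in the file or the pattern.)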
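
A minimal sketch of that tr idea, under some assumptions: the function name block_in_file is hypothetical, neither file contains \r already, and the pattern file ends with a newline. Prepending a newline to both inputs plays the same role as the "\n".$pat test in the perl version, so the block can only match at the start of a line.

```shell
# Hypothetical helper (a sketch of the tr idea, not tested code from the answer).
# Usage: block_in_file PATTERN_FILE TARGET_FILE
# Status 0 when the lines of PATTERN_FILE appear in TARGET_FILE as one
# contiguous block of whole lines; 1 otherwise.
block_in_file() {
    # Prefix a newline, then turn every newline into \r, so the pattern
    # becomes the single string "\rline1\rline2\r".
    pat=$( { printf '\n'; cat "$1"; } | tr '\n' '\r' )
    # Flatten the target the same way and look for the pattern as a
    # fixed string (-F); -q gives only an exit status, like grep -q.
    { printf '\n'; cat "$2"; } | tr '\n' '\r' | grep -qF -- "$pat"
}
```

Both files end up in memory as one long line, so this is probably only reasonable for files of modest size.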

Answer 3

I went through this problem again, and I think awk can handle it better:

awk 'FNR==NR {a[FNR]=$0; next}
     FNR==1 && NR>1 {for (i in a) len++}
     {for (i=last; i<=len; i++) {
         if (a[i]==$0) 
            {last=i; next}
     } status=1}
     END {print status+0}' file pattern

The idea is:

- Read the whole of file into memory, in an array a[line_number] = line.
- Count the elements in the array.
- Loop through the file pattern and check whether the current line occurs in file anywhere between the cursor position and the end of file. If it matches, move the cursor to the position where it was found. If not, set the status to 1; that is, there is a line in pattern that does not occur in file after the previous match.
- Print the status, which will be 0 unless it was set to 1 earlier.

Test

They match:

$ tail f p
==> f <==
222
333
555

==> p <==
222
333
$ awk 'FNR==NR {a[FNR]=$0; next} FNR==1 && NR>1{for (i in a) len++} {for (i=last; i<=len; i++) {if (a[i]==$0) {last=i; next}} status=1} END {print status+0}' f p
0

They don't:

$ tail f p
==> f <==
333
222
555

==> p <==
222
333
$ awk 'FNR==NR {a[FNR]=$0; next} FNR==1 && NR>1{for (i in a) len++} {for (i=last; i<=len; i++) {if (a[i]==$0) {last=i; next}} status=1} END {print status+0}' f p
1

With seq:

$ awk 'FNR==NR {a[FNR]=$0; next} FNR==1 && NR>1{for (i in a) len++} {for (i=last; i<=len; i++) {if (a[i]==$0) {last=i; next}} status=1} END {print status+0}' <(seq 2 20) <(seq 10)
1
$ awk 'FNR==NR {a[FNR]=$0; next} FNR==1 && NR>1{for (i in a) len++} {for (i=last; i<=len; i++) {if (a[i]==$0) {last=i; next}} status=1} END {print status+0}' <(seq 20) <(seq 10)
0
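
The question asks for an exit status rather than printed output, so one possible adaptation is to have awk exit with the status instead of printing it. This is a sketch: the wrapper name lines_in_order is hypothetical, and the awk program is the one above with only the END action changed.

```shell
# Hypothetical wrapper (not from the original answer).
# Usage: lines_in_order FILE PATTERN
# Status 0 when every line of PATTERN is found in FILE at or after the
# position of the previous match; status 1 otherwise.
lines_in_order() {
    awk 'FNR==NR {a[FNR]=$0; next}                 # first file: store lines
         FNR==1 && NR>1 {for (i in a) len++}       # second file starts: count them
         {for (i=last; i<=len; i++) {              # scan from the cursor onward
             if (a[i]==$0)
                {last=i; next}                     # found: move the cursor
         } status=1}                               # not found: remember failure
         END {exit status+0}' "$1" "$2"
}
```

It can then be used directly in a condition, for example: if lines_in_order file.txt pattern.txt; then echo found; fi. As written, the loop only checks that the pattern lines appear in order; it still accepts a file in which other lines sit between them.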
