Question

I'm trying to write a function that reads text from a large file and writes specific blocks of that text to other files.

Example file

@Tag
Scenario 1:
   Do thing 1
   Do thing 2
Scenario 2:
   Do thing 1
   Do thing 3
@Tag2
Scenario 3:
   Do thing 1
   Don't do thing 4

I'm reading this file line by line (currently using IFS) and would like the output to look like this:

File 1

@Tag
Scenario 1:
   Do thing 1
   Do thing 2

File 2

Scenario 2:
   Do thing 1
   Do thing 3

File 3

@Tag2
Scenario 3:
   Do thing 1
   Don't do thing 4

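I have the parts working that read the file and write out the "Scenario" pattern and the lines that follow it, but I can't figure out how to capture the @Tag pattern and write it out when it appears right before the "Scenario" pattern.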

Edit: here is the current relevant part of the script:

function writeToTestFile {
while IFS='' read -r line || [[ -n "$line" ]]; do
    #if line matches the tag pattern of "@" followed by anything, store it
    if [[ $line == *@* || "" ]]; then
        local tagValue=$line

    #if line in file matches "Scenario:" pattern, write to new file
    elif [[ $line == *Scenario:* ]]; then
        fileToWriteTo=$filename$counter$extention
        ((counter++))
        echo "writing to $fileToWriteTo"
        touch $dirToWriteTo/$fileToWriteTo

    else
        #if line does not match "Scenario:" pattern, check for existing file and write to that
        if [[ -e $dirToWriteTo/$fileToWriteTo ]]; then
            echo "   "$line >> $dirToWriteTo/$fileToWriteTo
        fi
    # if file does not exist and line does match pattern, do nothing
    fi

done < "$1"

}

Answer 1

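You can parse the file fairly easily with a function in bash. The key is not to worry about hunting for the tag line. Just look for Scenario and, on each iteration, save a preceding tag line in a variable such as tag. When you find a Scenario, check whether tag is set. If it is, write the tag line held in tag before the Scenario line, then carry on writing the output as normal.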

#!/bin/bash

function writeToTestFile {
    [ -z "$1" ] && {    ## validate input
        printf "%s() error: insufficient input.\n" "$FUNCNAME"
        return 1
    }
    [ -r "$1" ] || {    ## validate file readable
        printf "%s() error: file not readable '%s'\n" "$FUNCNAME" "$1"
        return 1
    }
    local tag=""    ## use local declarations
    local line=""
    local num=""
    local fname=""
    while IFS='' read -r line || [ -n "$line" ]; do
        if [ "${line// */}" = "Scenario" ]; then    ## check Scenario
            num="${line/Scenario /}"                ## parse num
            fname="File_${num%:}.txt"               ## parse fname
            :> "$fname"                             ## truncate fname
            [ -n "$tag" ] && printf "%s\n" "$tag" > "$fname"  ## tagline
            printf "%s\n" "$line" >> "$fname"       ## write Scenario line
        fi  ## write normal lines & update tagline
        [ "${line:0:1}" = " " ] && printf "%s\n" "$line" >> "$fname"
        [ "${line:0:1}" = "@" ] && tag="$line" || tag=
    done < "$1"
    return 0
}

writeToTestFile "$1"

(Note: File_X.txt is truncated before it is written to; adjust as needed. If any lines other than tag lines begin with '@', you can anchor the comparison further with "${line:0:4}" = "@Tag".)
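As a rough sketch of that anchoring, the same prefix test can also be written with a case pattern, which avoids the bash-only substring expansion. The helper name is_tag_line is hypothetical (it is not part of the script above), and the sketch assumes genuine tag lines always begin with the literal text @Tag.

```shell
#!/bin/bash

# Hypothetical helper: succeeds only when the line starts with the literal "@Tag".
is_tag_line() {
    case $1 in
        @Tag*) return 0 ;;   # anchored at the start of the line
        *)     return 1 ;;   # anything else, including other "@..." lines
    esac
}

line="@Tag2"
if is_tag_line "$line"; then
    tag=$line      # remember the tag for the next Scenario
else
    tag=""         # any other line clears the pending tag
fi
```

Inside the read loop, this test would take the place of the "${line:0:1}" = "@" comparison.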

Input file

$ cat tagfile.txt
@Tag
Scenario 1:
   Do thing 1
   Do thing 2
Scenario 2:
   Do thing 1
   Do thing 3
@Tag2
Scenario 3:
   Do thing 1
   Don't do thing 4

Usage/Output

$ bash tags.sh tagfile.txt

Check the output files:

$ cat File_1.txt
@Tag
Scenario 1:
   Do thing 1
   Do thing 2
$ cat File_2.txt
Scenario 2:
   Do thing 1
   Do thing 3
$ cat File_3.txt
@Tag2
Scenario 3:
   Do thing 1
   Don't do thing 4

Look it over and let me know if you have any questions.

Answer 2

I'd use awk:

awk -v MATCH="Scenario 1:" '
        !/^[[:space:]]/  {show=0}
        $0==MATCH        {if (prev ~ /^@/) print prev; show=1}
        show             {print}
                         {prev=$0}
    '  input_file

I've made a few assumptions about the format of where a capture starts and ends; you may need to adjust the first two conditions.

It would be easy to work out a similar solution based on your existing bash script, but it would help to see that script.
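For illustration, here is a minimal sketch of how the same "remember the previous line" idea might look in bash, splitting every Scenario into its own file. The function name split_scenarios and the scenarioN.txt naming are hypothetical, and the sketch assumes tag lines start with "@", Scenario lines start with "Scenario" and contain a colon, and the output directory already exists.

```shell
#!/bin/bash

# Sketch: split_scenarios INPUT OUTDIR
# Writes each Scenario block to OUTDIR/scenarioN.txt (hypothetical naming),
# preceded by the tag line if one came immediately before it.
split_scenarios() {
    local input="$1" outdir="$2"
    local line tag="" out="" counter=1
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            @*)             # tag line: remember it for the next Scenario
                tag=$line
                ;;
            Scenario*:*)    # start of a block: begin a new output file
                out="$outdir/scenario$counter.txt"
                counter=$((counter + 1))
                : > "$out"  # create or truncate the file
                [ -n "$tag" ] && printf '%s\n' "$tag" >> "$out"
                printf '%s\n' "$line" >> "$out"
                tag=""
                ;;
            *)              # body line: append to the current file, if any
                [ -n "$out" ] && printf '%s\n' "$line" >> "$out"
                tag=""
                ;;
        esac
    done < "$input"
}
```

It could be called as split_scenarios tagfile.txt out after creating the out directory. Lines that appear before the first Scenario are dropped, because no output file is open yet.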

Answer 3

Perl version:

#!/usr/bin/perl
my ($i, $t, $fh) = (0, 0);  # file counter, tag-seen flag, output handle (undef until first open)

while (<>) {
  if ((/^Scenario/ && !$t) || ($t = /^@\w+$/)) {
    close($fh) if $fh;
    open($fh, '>', "File".++$i.".txt") or die;
  }
  print $fh $_;
}

close($fh) if $fh;

Usage: ./script.pl < input.txt

Answer 4

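Here is a simple awk solution built around the idea of a buffer that holds the @Tag record immediately preceding a Scenario record. It also derives a filename from each Scenario record for that Scenario's output. Records that do not belong to a Scenario are discarded: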

#! /usr/bin/awk -f
BEGIN {
        buffer = filename = ""
}
/^@Tag/ {
        if (buffer ~ /./ && filename ~ /./)
                print buffer > filename
        buffer = $0
        next
}
/^Scenario [0-9]+:/ {
        filename=$0
        sub(/^Scenario +/, "File ", filename)
        sub(/:[ \t\r]*$/, "", filename)
}
filename ~ /./ {
        if (buffer ~ /./) {
                print buffer > filename
                buffer = ""
        }
        print > filename
}
END {
        if (buffer ~ /./ && filename ~ /./)
                print buffer > filename
}

Write a line to a file, and also write the previous line if it matches a pattern and exists
