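Extract specific lines from multiple text files

Time: 2016-07-19 04:50:56

Tags: java python perl awk sed

I want to print certain lines from multiple text files in a folder, depending on the file name. Consider the following text files, each named with three words separated by underscores: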

Small_Apple_Red.txt
Small_Orange_Yellow.txt
Large_Apple_Green.txt
Large_Orange_Green.txt

How can I achieve the following?

if (first word of file name is "Small") {
   // Print row 3 column 2 of the file (space delimited);
}

if (second word of file name is "Orange") {
   // print row 1 column 4 of the file;
}

Can this be done with awk?

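4 answers:

Answer 0: (score: 0)

Try the following.

Use glob to iterate over the files in the folder.

Then check the file name with a regular expression. Here, Perl's grep is used to extract the wanted lines from each file.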

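use strict;
use warnings;

my $path = "folderpath";    # folder to scan
while (my $file = glob("$path/*"))
{
    # Only process files whose name starts with Small_Apple
    if ($file =~ /\/Small_Apple/)
    {
        open my $fh, "<", $file or die "Cannot open $file: $!";
        # Print only the lines that match the (placeholder) pattern
        print grep { /content what you want/ } <$fh>;
        close $fh;
    }
}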
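
For comparison, the same idea (glob the folder, filter by file name with a regular expression, then keep only the matching lines) could be sketched in Python roughly as follows. The folder path, the Small_Apple prefix and the search pattern are placeholders, just as in the Perl snippet above, and the helper name matching_lines is made up for this illustration.

```python
import glob
import os
import re


def matching_lines(folder, name_pattern, line_pattern):
    """Yield (file name, line) pairs for lines matching line_pattern
    in files whose base name starts with a match for name_pattern."""
    name_re = re.compile(name_pattern)
    line_re = re.compile(line_pattern)
    for path in sorted(glob.glob(os.path.join(folder, "*"))):
        name = os.path.basename(path)
        # Skip directories and files whose name does not match
        if not os.path.isfile(path) or not name_re.match(name):
            continue
        with open(path) as fh:
            for line in fh:
                if line_re.search(line):
                    yield name, line.rstrip("\n")


if __name__ == "__main__":
    # "folderpath" and the search pattern are placeholders.
    for name, line in matching_lines("folderpath", r"Small_Apple", r"content what you want"):
        print(f"{name}: {line}")
```

Returning the file name alongside each line makes it easy to see which file a hit came from when several files match.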

Answer 1: (score: 0)

use strict;
use warnings;

my @file_names = ("Small_Apple_Red.txt",
                  "Small_Orange_Yellow.txt",
                  "Large_Apple_Green.txt",
                  "Large_Orange_Green.txt");

foreach my $file (@file_names) {
    if ( $file =~ /^Small_/ ) {            # "^" anchors the match at the beginning of the string
        print "\n $file has the first word small";
    }
    elsif ( $file =~ /^[^_]+_Orange_/ ) {  # [^_]+ matches the first word, i.e. everything
                                           # up to the first "_"
        print "\n $file has the second word orange";
    }
}

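There is also a special case: when a file is named "Small_Orange", you have to decide which condition matters more. If the second word is more important, swap the contents of the if branch with those of the elsif branch.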
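
If both conditions should apply to a "Small_Orange" file rather than one taking priority, one option is to test them independently instead of chaining them. The following Python sketch illustrates that, assuming file names always have the Word1_Word2_Word3.txt shape from the question; the function name rules_for and the rule labels are hypothetical.

```python
import os


def rules_for(file_name):
    """Return the labels of all rules that apply to a Word1_Word2_Word3.txt name."""
    stem = os.path.splitext(os.path.basename(file_name))[0]
    words = stem.split("_")
    rules = []
    # Two separate ifs (not if/elif), so a Small_Orange file triggers both
    if words[0] == "Small":
        rules.append("first word is Small")
    if len(words) >= 2 and words[1] == "Orange":
        rules.append("second word is Orange")
    return rules


if __name__ == "__main__":
    for name in ["Small_Apple_Red.txt", "Small_Orange_Yellow.txt",
                 "Large_Apple_Green.txt", "Large_Orange_Green.txt"]:
        print(name, "->", rules_for(name))
```

Splitting on the underscore and comparing whole words also avoids accidental matches such as "Orange" appearing as the third word.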

Answer 2: (score: 0)

In awk:

awk 'FILENAME ~ /^Large/ {print $1,$4}
     FILENAME ~ /^Small/ {print $3,$2}' *
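
To pick a single cell rather than whole columns, the row number has to be taken into account as well. As a rough Python sketch of the two rules exactly as stated in the question (row 3, column 2 when the first word is "Small"; row 1, column 4 when the second word is "Orange"), assuming whitespace-delimited fields and 1-based row and column numbers, and using the made-up helper names cell and extract:

```python
import glob
import os


def cell(lines, row, col):
    """Return the whitespace-delimited field at 1-based (row, col), or None if absent."""
    if row > len(lines):
        return None
    fields = lines[row - 1].split()
    return fields[col - 1] if col <= len(fields) else None


def extract(path):
    """Apply the question's two rules to one file and return the selected values."""
    words = os.path.splitext(os.path.basename(path))[0].split("_")
    wanted = []  # (row, col) pairs selected by the file name
    if words[0] == "Small":
        wanted.append((3, 2))
    if len(words) > 1 and words[1] == "Orange":
        wanted.append((1, 4))
    if not wanted:
        return []  # no rule applies, so the file is not even opened
    with open(path) as fh:
        lines = fh.read().splitlines()
    return [cell(lines, row, col) for row, col in wanted]


if __name__ == "__main__":
    for path in sorted(glob.glob("*.txt")):
        for value in extract(path):
            if value is not None:
                print(f"{os.path.basename(path)}: {value}")
```

Missing rows or columns yield None instead of raising an error, so short files are simply skipped in the output.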

Answer 3: (score: 0)

Try this:

use strict;
use warnings;
use Cwd;
use File::Basename;

my $dir = getcwd();    # or take the directory from the user's input
my @txtfiles = glob("$dir/*.txt");

foreach my $each_txt_file (@txtfiles)
{
    open(my $fh, "<", $each_txt_file) || die "Cannot open $each_txt_file: $!";
    my @allLines = <$fh>;
    close($fh);
    (my $removeExt = $each_txt_file) =~ s/\.txt$//;
    # Split the base name (without extension) into its three words
    my ($word1, $word2, $word3) = split /_/, basename($removeExt);
    if ($word1 =~ m/^small$/i)    # case-insensitive match on the first word
    {
        next if @allLines < 3;                       # skip files with fewer than 3 rows
        my @allcolns = split ' ', $allLines[2];      # row 3 (array indices are 0-based)
        print "\n", $allcolns[1] if @allcolns > 1;   # column 2
    }
}